Machine check
In publishing the images of our Moroccan holiday this year, something had gone wrong in generating the album and I didn’t notice until I published what I thought to be the right data. On checking, I noticed the error, and yesterday evening I found what went wrong and would repair it. But here something else stroke me: I was missing something in the logs. First, I thought it had been a power outage, but since there was no other sign of such an event: every electric clock would notify outage by blinking but none did, so Diana must have had a reboot for some reason.
So I checked, and indeed:
$ ana/crash
OpenVMS system dump analyzer
...analyzing an Alpha compressed selective memory dump...
Dump taken on 21-SEP-2013 08:13:04.29 using version V8.3
MACHINECHK, Machine check while in kernel mode
SDA>
Checking the error log, this is what is says around the time of the crash:
**** V3.4 ********************* ENTRY 3791 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha;
OS version V8.3
Event sequence number 47555.
Timestamp of occurrence 21-SEP-2013 08:13:04
Time since reboot 276 Day(s) 20:42:02
Host name DIANA
System Model AlphaServer DS10 617 MHz
Entry Type 2. Machine Check Error
========================
Raw Event Data Dump
========================
Entry# (record in file) 3791.
Entry Body Size: x0000016C
Entry body:
15--< -12 11--<-08 07--<-04 03--<-00 :Byte Order
0000: 00000000 000B0000 0022FFF9 000007B2 *......".........*
0010: 00000000 00000020 2020414E 41494408 *.DIANA .......*
0020: B9C300AD 9A2ADC19 68200002 60030000 *...`.. h..*.....*
0030: 016D011A 00000000 20202020 332E3856 *V8.3 ......m.*
0040: 31534420 72657672 65536168 706C4118 *.AlphaServer DS1*
0050: 00000000 0000007A 484D2037 31362030 *0 617 MHz.......*
0060: 000000A0 00000018 00000000 000000C8 *................*
0070: 00000000 00000000 00000001 00000098 *................*
0080: 00000000 122CCC00 00000000 00000000 *..........,.....*
0090: 00000000 00000000 00000000 00000000 *................*
00A0: 00000000 00000009 00000000 00000001 *................*
00B0: FFFFFFFF 81F3AC00 00000000 00000290 *................*
00C0: 00000000 00000000 0000006E FFC00000 *....n...........*
00D0: 00000000 00008000 00000000 00000000 *................*
00E0: 00000000 00000000 FFFFFEFC 1A300386 *..0.............*
00F0: 00000000 00000000 00000000 00000000 *................*
0100: 00000000 00000000 00000000 00000004 *................*
0110: 00000000 00000000 00000000 00000000 *................*
0120: 00010007 000C0038 00000000 00000000 *........8.......*
0130: 00000000 00000000 00000000 00006005 *.`..............*
0140: 00000000 00000000 00000000 00000000 *................*
0150: 0000010A 00000000 00001244 014C0091 *..L.D...........*
0160: 00000000 00010000 00000008 * ............*
**** V3.4 ********************* ENTRY 3792 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V8.3
Event sequence number 47556.
Timestamp of occurrence 21-SEP-2013 08:13:04
Time since reboot 276 Day(s) 20:42:02
Host name DIANA
System Model AlphaServer DS10 617 MHz
Entry Type 37. Crash Re-Start
Bugcheck Minor class 1. Crash Re-start
Bugcheck Msg MACHINECHK, Machine check while in kernel mode
Process ID x00010000
Process Name NULL
KSP xFFFFFFFF854CFD80
ESP xFFFFFFFF854D1000
SSP xFFFFFFFF854CB000
USP xFFFFFFFF854CB000
R0 x0000000000000000
R1 x0000000000000210
R2 xFFFFFFFF81C3FE50
R3 xFFFFFFFF81C402F0
R4 x0000000000000080
R5 x0000000000000000
R6 xFFFFFFFF82036038
R7 x0000000000000038
R8 x0000000000000000
R9 x0000000000000000
R10 x0000000000000002
R11 xFFFFFFFF81C0B1C8
R12 x000000007BBBBBC8
R13 xFFFFFFFF81C57110
R14 x0000000000000001
R15 x0000000000000000
R16 x0000000000000215
R17 x0000000000000002
R18 xFFFFFFFF82032004
R19 x8203208000000000
R20 x8203228000000000
R21 x8203208000000000
R22 x0000000000000000
R23 xFFFFFFFF81C0A0F8
R24 x0000000000000000
R25 x0000000000000001
This means deep digging, but there is little chance to find out what exactly has gone wrong - after being available for over 270 days - without access to the source code or (better!) listings. However: the system restarted flawlessly, and all works like it should. No wonder I didn't notice; if it were not that I missed something in the logfiles, I wouldn't have noticed for a long time it had happened...
Port 2525 abused?
The router has been set up to block certain traffic, and this time there is definitively something weird going on. I get these message:
Event Time : 2013/09/22 13:53:44, [CSM_AE] [eDonkey] [Block]
Packet info: 192.168.0.200:2525 -> 208.75.123.163:46328, PR tcp
-AP-----------------------------------------------
and
Event Time : 2013/09/22 12:16:31, [CSM_AE] [eDonkey] [Block]
Packet info: 192.168.0.200:2525 -> 87.106.96.232:57343, PR tcp
-AP-----------------------------------------------
Just these two addresses…
Port 2525 is the PMAS port – incoming mail – that has been set to be accepting mail for my domain only, disallowing relay, for obvious reasons. So what causes a torrent-like behaviour on this port?
It look working – though stated “Disabled” it actulally isn’t:
$ tcpip sho serv
Service Port Proto Process Address State
BE-SMTP 2525 TCP PTSMTP 0.0.0.0 Disabled
because PMAS connects directly to this port.
There have been more strange issues with mail, lately.
The logfile does sometimes show messages like:
%%%%%%%%%%% OPCOM 5-SEP-2013 08:42:42.53 %%%%%%%%%%%
Message from user WILLEM on DIANA
%PTSMTP-E-WORKERDIED, worker PTSMTP 0056 (21265CA4) terminated unexpectedly
-PTSMTP-I-WORKEREXIT, worker exiting
-PTSMTP-E-SOCKETFAILED, failed to create socket -
-SYSTEM-W-DEVALLOC, device already allocated to another user
-PTSMTP-I-WORKERCONN, while processing connection from <address>,<port>
for no reason (this may happen if there has been a bulk of messages before this message but here, it was rather quiet), and some days, I find:
%%%%%%%%%%% OPCOM 5-SEP-2013 11:32:35.30 %%%%%%%%%%%
Message from user TCPIP$SMTP on DIANA
%TCPIP-W-SMTP_NOSPAMRLY, Rejected relay to <user@domain> from client IP address 10.0.0.1 as suspected SPAM
but though this address is set up on my second network interface, it has been disabled:
$ tcpip sho int
Packets
Interface IP_Addr Network mask Receive Send MTU
LO0 127.0.0.1 255.0.0.0 330183 330183 4096
WE0 192.168.0.2 255.255.255.0 3620928 2038155 1500
WE1 ? 1 1 1500
$ tcpip sho conf
_Facility: inter
_PARAM1:
Interface: LO0
IP_Addr: 127.0.0.1 NETWRK: 255.0.0.0 BRDCST:
Interface: WE0
IP_Addr: 192.168.0.2 NETWRK: 255.255.255.0 BRDCST: 192.168.0.255
Interface: WE1
IP_Addr: 10.0.0.1 NETWRK: 255.0.0.0 BRDCST: 10.255.255.255
$ tcpip ifconf -a
LO0: flags=100c89<up ,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>
inet 127.0.0.1 netmask ff000000 ipmtu 4096
TN0: flags=80<noarp>
TN1: flags=80</noarp><noarp>
WE0: flags=c43<up ,BROADCAST,RUNNING,MULTICAST,SIMPLEX>
*inet 192.168.0.2 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500
inet 192.168.0.200 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500
WE1: flags=c02<broadcast ,MULTICAST,SIMPLEX>
*inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255 ipmtu 1500</broadcast></up></noarp></up>
I didn’t see this before, it is just the last week or so that this happens. Nothing particular is found in the PMAS log files for these addresses, so I turned to the most reliable source of traffic of incoming and outgoing traffic:the router/firewall log. And there it was:
Sep 22 02:56:59 charon.intra.grootersnet.nl Charon2: [FILTER][Pass][WAN->LAN/RT/VPN, 414:40:37 ][@S:R=13:1, 208.75.123.163:60401->192.168.0.200:2525][TCP][HLen=20, TLen=60, Flag=S, Seq=4070629722, Ack=0, Win=5840]
Sep 22 02:56:59 charon.intra.grootersnet.nl Charon2: [FILTER][Pass][LAN/RT/VPN->WAN, 414:40:37 ][@S:R=13:1, 192.168.0.200:2525->208.75.123.163:60401][TCP][HLen=20, TLen=48, Flag=AS, Seq=3896561846, Ack=4070629723, Win=16824]
Sep 22 02:57:03 charon.intra.grootersnet.nl Charon2: [CSM_AE][Block][eDonkey][@S:R=13:1, 192.168.0.200:2525->208.75.123.163:60401][TCP][HLen=20, TLen=148, Flag=AP, Seq=3896562051, Ack=4070629856, Win=16824]
Sure, from the address 208.75.123.163 there has been a connection to port 2525 – which is PMAS, and there is a first reply almost immediately. But a second reply is blocked by the router, due to it’s signature (I don’t allow torrents and bulk downloads from my server). I just have to guess what they might have sent to the server – by mail – causing PMAS to respond in that way, there is no way to tell: the firewall log doesn’t tell, and PMAS doesn’t point to anything. After latest reboot, the worker logs were not enabled automatically so there is no data on that – yet.
The same applies to the second address: 87.106.96.232. Neither of these can really be tracked down to a particular provider. So I had to ban them from the network.