22-Sep-2013 – SYSMGR in the attic

Machine check
In publishing the images of our Moroccan holiday this year, something had gone wrong in generating the album and I didn’t notice until I published what I thought to be the right data. On checking, I noticed the error, and yesterday evening I found what went wrong and would repair it. But here something else stroke me: I was missing something in the logs. First, I thought it had been a power outage, but since there was no other sign of such an event: every electric clock would notify outage by blinking but none did, so Diana must have had a reboot for some reason.
So I checked, and indeed:$ ana/crash


OpenVMS system dump analyzer

...analyzing an Alpha compressed selective memory dump...
Dump taken on 21-SEP-2013 08:13:04.29 using version V8.3

MACHINECHK, Machine check while in kernel mode

SDA>

Checking the error log, this is what is says around the time of the crash:
**** V3.4 ********************* ENTRY 3791 ********************************


Logging OS                        1. OpenVMS

System Architecture               2. Alpha;

OS version                           V8.3

Event sequence number         47555.

Timestamp of occurrence              21-SEP-2013 08:13:04

Time since reboot                    276 Day(s) 20:42:02

Host name                            DIANA
System Model                         AlphaServer DS10 617 MHz
Entry Type                        2. Machine Check Error
========================

Raw Event Data Dump

========================
Entry# (record in file)        3791.
Entry Body Size:          x0000016C

Entry body:

15--< -12 11--<-08 07--<-04 03--<-00 :Byte Order 0000: 00000000 000B0000 0022FFF9 000007B2 *......".........* 0010: 00000000 00000020 2020414E 41494408 *.DIANA .......* 0020: B9C300AD 9A2ADC19 68200002 60030000 *...`.. h..*.....* 0030: 016D011A 00000000 20202020 332E3856 *V8.3 ......m.* 0040: 31534420 72657672 65536168 706C4118 *.AlphaServer DS1* 0050: 00000000 0000007A 484D2037 31362030 *0 617 MHz.......* 0060: 000000A0 00000018 00000000 000000C8 *................* 0070: 00000000 00000000 00000001 00000098 *................* 0080: 00000000 122CCC00 00000000 00000000 *..........,.....* 0090: 00000000 00000000 00000000 00000000 *................* 00A0: 00000000 00000009 00000000 00000001 *................* 00B0: FFFFFFFF 81F3AC00 00000000 00000290 *................* 00C0: 00000000 00000000 0000006E FFC00000 *....n...........* 00D0: 00000000 00008000 00000000 00000000 *................* 00E0: 00000000 00000000 FFFFFEFC 1A300386 *..0.............* 00F0: 00000000 00000000 00000000 00000000 *................* 0100: 00000000 00000000 00000000 00000004 *................* 0110: 00000000 00000000 00000000 00000000 *................* 0120: 00010007 000C0038 00000000 00000000 *........8.......* 0130: 00000000 00000000 00000000 00006005 *.`..............* 0140: 00000000 00000000 00000000 00000000 *................* 0150: 0000010A 00000000 00001244 014C0091 *..L.D...........* 0160: 00000000 00010000 00000008 * ............* **** V3.4 ********************* ENTRY 3792 ******************************** Logging OS 1. OpenVMS System Architecture 2. Alpha OS version V8.3 Event sequence number 47556. Timestamp of occurrence 21-SEP-2013 08:13:04 Time since reboot 276 Day(s) 20:42:02 Host name DIANA System Model AlphaServer DS10 617 MHz Entry Type 37. Crash Re-Start Bugcheck Minor class 1. Crash Re-start Bugcheck Msg MACHINECHK, Machine check while in kernel mode Process ID x00010000 Process Name NULL KSP xFFFFFFFF854CFD80 ESP xFFFFFFFF854D1000 SSP xFFFFFFFF854CB000 USP xFFFFFFFF854CB000 R0 x0000000000000000 R1 x0000000000000210 R2 xFFFFFFFF81C3FE50 R3 xFFFFFFFF81C402F0 R4 x0000000000000080 R5 x0000000000000000 R6 xFFFFFFFF82036038 R7 x0000000000000038 R8 x0000000000000000 R9 x0000000000000000 R10 x0000000000000002 R11 xFFFFFFFF81C0B1C8 R12 x000000007BBBBBC8 R13 xFFFFFFFF81C57110 R14 x0000000000000001 R15 x0000000000000000 R16 x0000000000000215 R17 x0000000000000002 R18 xFFFFFFFF82032004 R19 x8203208000000000 R20 x8203228000000000 R21 x8203208000000000 R22 x0000000000000000 R23 xFFFFFFFF81C0A0F8 R24 x0000000000000000 R25 x0000000000000001
This means deep digging, but there is little chance to find out what exactly has gone wrong - after being available for over 270 days - without access to the source code or (better!) listings. However: the system restarted flawlessly, and all works like it should. No wonder I didn't notice; if it were not that I missed something in the logfiles, I wouldn't have noticed for a long time it had happened...
Port 2525 abused?
The router has been set up to block certain traffic, and this time there is definitively something weird going on. I get these message:
Event Time : 2013/09/22 13:53:44, [CSM_AE] [eDonkey] [Block] Packet info: 192.168.0.200:2525 -> 208.75.123.163:46328, PR tcp -AP-----------------------------------------------
and
Event Time : 2013/09/22 12:16:31, [CSM_AE] [eDonkey] [Block] Packet info: 192.168.0.200:2525 -> 87.106.96.232:57343, PR tcp -AP-----------------------------------------------
Just these two addresses…
Port 2525 is the PMAS port – incoming mail – that has been set to be accepting mail for my domain only, disallowing relay, for obvious reasons. So what causes a torrent-like behaviour on this port?
It look working – though stated “Disabled” it actulally isn’t:
$ tcpip sho serv


Service             Port  Proto    Process          Address            State

BE-SMTP             2525  TCP      PTSMTP           0.0.0.0            Disabled
because PMAS connects directly to this port.
There have been more strange issues with mail, lately.
The logfile does sometimes show messages like:%%%%%%%%%%%  OPCOM   5-SEP-2013 08:42:42.53  %%%%%%%%%%% Message from user WILLEM on DIANA %PTSMTP-E-WORKERDIED, worker PTSMTP    0056  (21265CA4) terminated unexpectedly -PTSMTP-I-WORKEREXIT, worker exiting -PTSMTP-E-SOCKETFAILED, failed to create socket - -SYSTEM-W-DEVALLOC, device already allocated to another user -PTSMTP-I-WORKERCONN, while processing connection from <address>,<port>
for no reason (this may happen if there has been a bulk of messages before this message but here, it was rather quiet), and some days, I find:
%%%%%%%%%%%  OPCOM   5-SEP-2013 11:32:35.30  %%%%%%%%%%% Message from user TCPIP$SMTP on DIANA %TCPIP-W-SMTP_NOSPAMRLY, Rejected relay to  <user@domain> from client IP address 10.0.0.1 as suspected SPAM
but though this address is set up on my second network interface, it has been disabled:
$ tcpip sho int                                                              Packets Interface   IP_Addr         Network mask          Receive          Send     MTU


 LO0        127.0.0.1       255.0.0.0              330183        330183    4096

 WE0        192.168.0.2     255.255.255.0         3620928       2038155    1500

 WE1        ?                                           1             1    1500

$ tcpip sho conf

_Facility: inter

_PARAM1:
 Interface: LO0

   IP_Addr: 127.0.0.1         NETWRK: 255.0.0.0         BRDCST:
 Interface: WE0

   IP_Addr: 192.168.0.2       NETWRK: 255.255.255.0     BRDCST: 192.168.0.255
 Interface: WE1

   IP_Addr: 10.0.0.1          NETWRK: 255.0.0.0         BRDCST: 10.255.255.255
$ tcpip ifconf -a

LO0: flags=100c89<up ,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>

     inet 127.0.0.1 netmask ff000000 ipmtu 4096
TN0: flags=80<noarp>
TN1: flags=80</noarp><noarp>
WE0: flags=c43<up ,BROADCAST,RUNNING,MULTICAST,SIMPLEX>

    *inet 192.168.0.2 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500

     inet 192.168.0.200 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500

WE1: flags=c02<broadcast ,MULTICAST,SIMPLEX> *inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255 ipmtu 1500</broadcast></up></noarp></up>
I didn’t see this before, it is just the last week or so that this happens. Nothing particular is found in the PMAS log files for these addresses, so I turned to the most reliable source of traffic of incoming and outgoing traffic:the router/firewall log. And there it was: Sep 22 02:56:59 charon.intra.grootersnet.nl Charon2: [FILTER][Pass][WAN->LAN/RT/VPN, 414:40:37 ][@S:R=13:1, 208.75.123.163:60401->192.168.0.200:2525][TCP][HLen=20, TLen=60, Flag=S, Seq=4070629722, Ack=0, Win=5840] Sep 22 02:56:59 charon.intra.grootersnet.nl Charon2: [FILTER][Pass][LAN/RT/VPN->WAN, 414:40:37 ][@S:R=13:1, 192.168.0.200:2525->208.75.123.163:60401][TCP][HLen=20, TLen=48, Flag=AS, Seq=3896561846, Ack=4070629723, Win=16824] Sep 22 02:57:03 charon.intra.grootersnet.nl Charon2: [CSM_AE][Block][eDonkey][@S:R=13:1, 192.168.0.200:2525->208.75.123.163:60401][TCP][HLen=20, TLen=148, Flag=AP, Seq=3896562051, Ack=4070629856, Win=16824]
Sure, from the address 208.75.123.163 there has been a connection to port 2525 – which is PMAS, and there is a first reply almost immediately. But a second reply is blocked by the router, due to it’s signature (I don’t allow torrents and bulk downloads from my server). I just have to guess what they might have sent to the server – by mail – causing PMAS to respond in that way, there is no way to tell: the firewall log doesn’t tell, and PMAS doesn’t point to anything. After latest reboot, the worker logs were not enabled automatically so there is no data on that – yet.
The same applies to the second address: 87.106.96.232. Neither of these can really be tracked down to a particular provider. So I had to ban them from the network.

September 2013
M	T	W	T	F	S	S
« Aug				Oct »
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30