22-Sep-2013

Machine check
In publishing the images of our Moroccan holiday this year, something had gone wrong in generating the album and I didn’t notice until I published what I thought to be the right data. On checking, I noticed the error, and yesterday evening I found what went wrong and would repair it. But here something else stroke me: I was missing something in the logs. First, I thought it had been a power outage, but since there was no other sign of such an event: every electric clock would notify outage by blinking but none did, so Diana must have had a reboot for some reason.
So I checked, and indeed:
$ ana/crash

OpenVMS system dump analyzer
...analyzing an Alpha compressed selective memory dump...

Dump taken on 21-SEP-2013 08:13:04.29 using version V8.3
MACHINECHK, Machine check while in kernel mode

SDA>

Checking the error log, this is what is says around the time of the crash:

**** V3.4  ********************* ENTRY 3791 ********************************

Logging OS                        1. OpenVMS
System Architecture               2. Alpha;
OS version                           V8.3
Event sequence number         47555.
Timestamp of occurrence              21-SEP-2013 08:13:04
Time since reboot                    276 Day(s) 20:42:02
Host name                            DIANA

System Model                         AlphaServer DS10 617 MHz

Entry Type                        2. Machine Check Error

========================
Raw Event Data Dump
========================

Entry# (record in file)        3791.

Entry Body Size:          x0000016C
Entry body:

          15--<  -12  11--<-08  07--<-04  03--<-00   :Byte Order   0000:    00000000  000B0000  0022FFF9  000007B2   *......".........*   0010:    00000000  00000020  2020414E  41494408   *.DIANA   .......*   0020:    B9C300AD  9A2ADC19  68200002  60030000   *...`.. h..*.....*   0030:    016D011A  00000000  20202020  332E3856   *V8.3    ......m.*   0040:    31534420  72657672  65536168  706C4118   *.AlphaServer DS1*   0050:    00000000  0000007A  484D2037  31362030   *0 617 MHz.......*   0060:    000000A0  00000018  00000000  000000C8   *................*   0070:    00000000  00000000  00000001  00000098   *................*   0080:    00000000  122CCC00  00000000  00000000   *..........,.....*   0090:    00000000  00000000  00000000  00000000   *................*   00A0:    00000000  00000009  00000000  00000001   *................*   00B0:    FFFFFFFF  81F3AC00  00000000  00000290   *................*   00C0:    00000000  00000000  0000006E  FFC00000   *....n...........*   00D0:    00000000  00008000  00000000  00000000   *................*   00E0:    00000000  00000000  FFFFFEFC  1A300386   *..0.............*   00F0:    00000000  00000000  00000000  00000000   *................*   0100:    00000000  00000000  00000000  00000004   *................*   0110:    00000000  00000000  00000000  00000000   *................*   0120:    00010007  000C0038  00000000  00000000   *........8.......*   0130:    00000000  00000000  00000000  00006005   *.`..............*   0140:    00000000  00000000  00000000  00000000   *................*   0150:    0000010A  00000000  00001244  014C0091   *..L.D...........*   0160:              00000000  00010000  00000008   *    ............*  **** V3.4  ********************* ENTRY 3792 ********************************  Logging OS                        1. OpenVMS  System Architecture               2. Alpha OS version                           V8.3      Event sequence number         47556. Timestamp of occurrence              21-SEP-2013 08:13:04  Time since reboot                    276 Day(s) 20:42:02  Host name                            DIANA     System Model                         AlphaServer DS10 617 MHz  Entry Type                       37. Crash Re-Start  Bugcheck Minor class              1. Crash Re-start  Bugcheck Msg                         MACHINECHK, Machine check while in kernel mode  Process ID                x00010000  Process Name                         NULL  KSP                       xFFFFFFFF854CFD80  ESP                       xFFFFFFFF854D1000  SSP                       xFFFFFFFF854CB000  USP                       xFFFFFFFF854CB000  R0                        x0000000000000000  R1                        x0000000000000210  R2                        xFFFFFFFF81C3FE50  R3                        xFFFFFFFF81C402F0  R4                        x0000000000000080  R5                        x0000000000000000  R6                        xFFFFFFFF82036038  R7                        x0000000000000038  R8                        x0000000000000000  R9                        x0000000000000000  R10                       x0000000000000002  R11                       xFFFFFFFF81C0B1C8  R12                       x000000007BBBBBC8  R13                       xFFFFFFFF81C57110  R14                       x0000000000000001  R15                       x0000000000000000  R16                       x0000000000000215  R17                       x0000000000000002  R18                       xFFFFFFFF82032004  R19                       x8203208000000000  R20                       x8203228000000000  R21                       x8203208000000000  R22                       x0000000000000000  R23                       xFFFFFFFF81C0A0F8  R24                       x0000000000000000  R25                       x0000000000000001 
This means deep digging, but there is little chance to find out what exactly has gone wrong - after being available for over 270 days - without access to the source code or (better!) listings. However: the system restarted flawlessly, and all works like it should. No wonder I didn't notice; if it were not that I missed something in the logfiles, I wouldn't have noticed for a long time it had happened...
Port 2525 abused?
The router has been set up to block certain traffic, and this time there is definitively something weird going on. I get these message:

Event Time : 2013/09/22 13:53:44, [CSM_AE] [eDonkey] [Block]
Packet info: 192.168.0.200:2525 -> 208.75.123.163:46328, PR tcp
-AP-----------------------------------------------

and
Event Time : 2013/09/22 12:16:31, [CSM_AE] [eDonkey] [Block]
Packet info: 192.168.0.200:2525 -> 87.106.96.232:57343, PR tcp
-AP-----------------------------------------------

Just these two addresses…
Port 2525 is the PMAS port – incoming mail – that has been set to be accepting mail for my domain only, disallowing relay, for obvious reasons. So what causes a torrent-like behaviour on this port?
It look working – though stated “Disabled” it actulally isn’t:
$ tcpip sho serv

Service             Port  Proto    Process          Address            State

BE-SMTP             2525  TCP      PTSMTP           0.0.0.0            Disabled
because PMAS connects directly to this port.
There have been more strange issues with mail, lately.
The logfile does sometimes show messages like:
%%%%%%%%%%%  OPCOM   5-SEP-2013 08:42:42.53  %%%%%%%%%%%
Message from user WILLEM on DIANA
%PTSMTP-E-WORKERDIED, worker PTSMTP    0056  (21265CA4) terminated unexpectedly
-PTSMTP-I-WORKEREXIT, worker exiting
-PTSMTP-E-SOCKETFAILED, failed to create socket -
-SYSTEM-W-DEVALLOC, device already allocated to another user
-PTSMTP-I-WORKERCONN, while processing connection from <address>,<port>

for no reason (this may happen if there has been a bulk of messages before this message but here, it was rather quiet), and some days, I find:
%%%%%%%%%%%  OPCOM   5-SEP-2013 11:32:35.30  %%%%%%%%%%%
Message from user TCPIP$SMTP on DIANA
%TCPIP-W-SMTP_NOSPAMRLY, Rejected relay to  <user@domain> from client IP address 10.0.0.1 as suspected SPAM

but though this address is set up on my second network interface, it has been disabled:
$ tcpip sho int
                      
                      
               Packets
Interface   IP_Addr         Network mask          Receive          Send     MTU

 LO0        127.0.0.1       255.0.0.0              330183        330183    4096
 WE0        192.168.0.2     255.255.255.0         3620928       2038155    1500
 WE1        ?                                           1             1    1500
$ tcpip sho conf
_Facility: inter
_PARAM1:

 Interface: LO0
   IP_Addr: 127.0.0.1         NETWRK: 255.0.0.0         BRDCST:

 Interface: WE0
   IP_Addr: 192.168.0.2       NETWRK: 255.255.255.0     BRDCST: 192.168.0.255

 Interface: WE1
   IP_Addr: 10.0.0.1          NETWRK: 255.0.0.0         BRDCST: 10.255.255.255

$ tcpip ifconf -a
LO0: flags=100c89<up ,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>
     inet 127.0.0.1 netmask ff000000 ipmtu 4096

TN0: flags=80<noarp>

TN1: flags=80</noarp><noarp>

WE0: flags=c43<up ,BROADCAST,RUNNING,MULTICAST,SIMPLEX>
    *inet 192.168.0.2 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500
     inet 192.168.0.200 netmask ffffff00 broadcast 192.168.0.255 ipmtu 1500

WE1: flags=c02<broadcast ,MULTICAST,SIMPLEX>
    *inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255 ipmtu 1500</broadcast></up></noarp></up>

I didn’t see this before, it is just the last week or so that this happens. Nothing particular is found in the PMAS log files for these addresses, so I turned to the most reliable source of traffic of incoming and outgoing traffic:the router/firewall log. And there it was:
Sep 22 02:56:59 charon.intra.grootersnet.nl Charon2: [FILTER][Pass][WAN->LAN/RT/VPN, 414:40:37 ][@S:R=13:1, 208.75.123.163:60401->192.168.0.200:2525][TCP][HLen=20, TLen=60, Flag=S, Seq=4070629722, Ack=0, Win=5840]
Sep 22 02:56:59 charon.intra.grootersnet.nl Charon2: [FILTER][Pass][LAN/RT/VPN->WAN, 414:40:37 ][@S:R=13:1, 192.168.0.200:2525->208.75.123.163:60401][TCP][HLen=20, TLen=48, Flag=AS, Seq=3896561846, Ack=4070629723, Win=16824]
Sep 22 02:57:03 charon.intra.grootersnet.nl Charon2: [CSM_AE][Block][eDonkey][@S:R=13:1, 192.168.0.200:2525->208.75.123.163:60401][TCP][HLen=20, TLen=148, Flag=AP, Seq=3896562051, Ack=4070629856, Win=16824]

Sure, from the address 208.75.123.163 there has been a connection to port 2525 – which is PMAS, and there is a first reply almost immediately. But a second reply is blocked by the router, due to it’s signature (I don’t allow torrents and bulk downloads from my server). I just have to guess what they might have sent to the server – by mail – causing PMAS to respond in that way, there is no way to tell: the firewall log doesn’t tell, and PMAS doesn’t point to anything. After latest reboot, the worker logs were not enabled automatically so there is no data on that – yet.
The same applies to the second address: 87.106.96.232. Neither of these can really be tracked down to a particular provider. So I had to ban them from the network.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.