31-Aug-2010

No web access possible
This morning, I couldn’t reach the site over the web. Even weirder: when accessing the operator and mail sites, I got the wrong certificate, the one of the router-firewall, and so the browser complained it was a different site…
Only after I got home could I try to locate the problem. Accessing the sites from the inside didn’t succeed either: on all machines the site was inaccessible. Even accessing the router internally, using its web interface, didn’t work. Just the header showed up, and the remaining data was not found.
Luckily, telnetting to the router was still possible, and that way I found a weird second DNS server listed, way outside the normal range but still ‘local’, at address 192.168.51.1. With no external management enabled, this is something I need to dig into deeper.
First of all, I rebooted the router, and after that, HTTP and HTTPS traffic was passed to the VMS box as usual. Next, I took a look at the logging I have enabled, and found that the outside connection was dropped for a brief moment, followed a few minutes later by a number of pings to an external address, originating from the server. But there was no information on which process issued them.

The weirdest thing, however, is that it was just the web interface having trouble. Incoming mail was not affected at all, nor was outgoing traffic, it seems.

I’ll have to do some digging. Luckily, I now have the router’s logfiles!

01-Aug-2010

Maintenance
Most of the standard work (collecting and saving logfiles) is now carried out by a DCL procedure. One of the things it does is collect the mail statistics of last month:

PMAS statistics for July
Total messages    : 8495 = 100.0 o/o
DNS Blacklisted   : 1807 =  21.2 o/o (Files: 31)
Relay attempts    : 5557 =  65.4 o/o (Files: 31)
Processed by PMAS : 1131 =  13.3 o/o (Files: 30)
       Discarded :  148 =  13.0 o/o (processed),   1.7 o/o (all)
    Quarantained :  344 =  30.4 o/o (processed),   4.0 o/o (all)
       Delivered :  639 =  56.4 o/o (processed),   7.5 o/o (all)
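
The percentages look truncated rather than rounded to one decimal (1807 of 8495 is 21.27 o/o, shown as 21.2), which would fit integer-only arithmetic. A minimal Python sketch of that calculation, not the DCL procedure itself; the function name pct is made up for illustration:

```python
def pct(count, total):
    """Return count/total as a percentage string with one decimal, truncated.

    Only integer operations are used, so 21.27 becomes "21.2", not "21.3".
    """
    if total <= 0:
        raise ValueError("total must be positive")
    tenths = count * 1000 // total      # percentage in tenths of a percent
    return f"{tenths // 10}.{tenths % 10}"


# Figures taken from the July report
total = 8495
processed = 1131
print(pct(1807, total))        # DNS blacklisted, share of all messages
print(pct(148, processed))     # discarded, share of processed messages
print(pct(148, total))         # discarded, share of all messages
```

Rounding instead of truncating would give slightly different figures in a few rows (13.1 instead of 13.0 for the discarded messages, for example).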

The size (in blocks) of the anti-relay logs shows how many relay attempts there have been. For now, the procedure flags all files that are 4 blocks or more in size. For July, the result is:

PTSMTP_ANTIRELAY.LOG-2010-07-02 is 4 blocks: check file ANTIRELAY.-2010-07-02
PTSMTP_ANTIRELAY.LOG-2010-07-03 is 513 blocks: check file ANTIRELAY.-2010-07-03
PTSMTP_ANTIRELAY.LOG-2010-07-05 is 172 blocks: check file ANTIRELAY.-2010-07-05
PTSMTP_ANTIRELAY.LOG-2010-07-06 is 190 blocks: check file ANTIRELAY.-2010-07-06
PTSMTP_ANTIRELAY.LOG-2010-07-08 is 176 blocks: check file ANTIRELAY.-2010-07-08
PTSMTP_ANTIRELAY.LOG-2010-07-09 is 4 blocks: check file ANTIRELAY.-2010-07-09
PTSMTP_ANTIRELAY.LOG-2010-07-14 is 90 blocks: check file ANTIRELAY.-2010-07-14
PTSMTP_ANTIRELAY.LOG-2010-07-19 is 196 blocks: check file ANTIRELAY.-2010-07-19
PTSMTP_ANTIRELAY.LOG-2010-07-23 is 5 blocks: check file ANTIRELAY.-2010-07-23
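
The size check boils down to a threshold filter. A minimal Python sketch, assuming the sizes are already known in blocks (one disk block on VMS is 512 bytes); the function names and the sample dictionary are illustrative and not taken from the DCL procedure:

```python
BLOCK_SIZE = 512  # bytes per disk block on VMS


def blocks(size_in_bytes):
    """Number of 512-byte blocks needed to hold size_in_bytes (rounded up)."""
    return (size_in_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def flag_large(sizes_in_blocks, threshold=4):
    """Return (name, blocks) pairs for files of at least `threshold` blocks."""
    return sorted(
        (name, size)
        for name, size in sizes_in_blocks.items()
        if size >= threshold
    )


# Three sizes from the July list plus one hypothetical small file
sample = {
    "PTSMTP_ANTIRELAY.LOG-2010-07-02": 4,
    "PTSMTP_ANTIRELAY.LOG-2010-07-03": 513,
    "PTSMTP_ANTIRELAY.LOG-2010-07-23": 5,
    "EXAMPLE_SMALL.LOG": 2,  # hypothetical: below the threshold, not reported
}
for name, size in flag_large(sample):
    print(f"{name} is {size} blocks")
```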

Checking those files will be done with another script, one day. But I’ve already analyzed the bigger ones before.
The PMAS logs, however, were not archived, at least not in the right location. Checking the command procedure, the reason is obvious: the line would never execute…

The operator logs have been archived, except for the last one. That one may have been in process at the time, so the expected file didn’t exist yet (operator.log is renamed when processed). Since that is hard to predict, the job will now start 30 minutes after midnight. (I could of course synchronize with the log scanner, but that requires a lot of extra work, while this works just as well.)
Saving the WASD logfiles processed the wrong files: the ones of June are archived in the July archive.
Corrected the script and re-ran it. There were just a few other things to take care of, but now it does the job.
Cleanup afterwards now works as well.
Wait for the next run to complete: on 01-Sep-2010.
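
A job that runs on the first of the month has to select the files of the month that just ended and label the archive to match. As an illustration only (a Python sketch, not the corrected DCL; the function name is made up), the previous month can be derived from the run date like this:

```python
from datetime import date, timedelta


def previous_month(run_date):
    """Return (year, month) of the month before run_date's month."""
    # Step back one day from the first of the current month
    last_day_prev = run_date.replace(day=1) - timedelta(days=1)
    return last_day_prev.year, last_day_prev.month


year, month = previous_month(date(2010, 8, 1))
print(f"{year:04d}-{month:02d}")  # prints 2010-07
```

Stepping back from the first day of the month also handles the year boundary: a run in January yields December of the previous year.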

Syslogd not started
Because I needed to find something out, I looked through the current SYSLOGD log file and found that the latest entry was from just before the reboot on 17-Jul-2010. Checking the system, there was no syslogd process running, and the service was disabled. So I added the line to start the service to the startup script, started the service and sent a message using Logger. Now the daemon runs.

New licenses loaded
Latest licenses, valid through March 2011, have been loaded. Next: get a new PMAS license before the current one expires on 02-Sep-2010.