03-Apr-2012

Disk full

Before I could check the results of the monthly maintenance job yesterday – done remotely this time – I found out the system didn’t respond at all. I could access the router, and that showed me a number of connections on the different ports, but hardly any traffic. I could start a VPN session to the home network, but all I could do was ping. No other access was possible. Would it again be a problem with the quorum disk, like the one I had some weeks ago? No way to tell….
This evening, it turned out there was nothing wrong with the quorum disk, but the DECwindows sessions that I had left open didn’t show up after I switched on the monitor. I would expect the unlock panel to show up, but there was just the background of the window and the system was not responsive at all.
After I started my other node, it happily joined the cluster, and there seemed to be nothing wrong with the disks. Not with any of them. Now I tried to mount the system disk of the main system – but that failed. Not a single disk could be mounted…. So the system ran, but was unable to react.
Not even CTRL-P on the console worked…. So the only option was to use the reset button.
Next, I rebooted. The process continued as usual – until, in the end, the main part was started in batch.
Now the real issue was obvious: the queue manager couldn’t start because free space was exhausted.
The procedure ended normally, but again, the system did not respond to the keyboard. At least now I knew what caused the problem.
So I started the system in MIN mode, but once more, I couldn’t enter the system because it didn’t respond to the keyboard….
The last resort: start from CD, choose option 8 to do some DCL, and mount the system disk to find out what caused it. It must have been files created since yesterday after 18:00 system time, so
$ DIR/SIN=yes/SIZE/unit=byte DKB100:[...]
would show what caused it.
BINGO. Almost immediately.

It turned out to be the backup of the public webs: these contain a load of photographs, and the backup is now over 10 GB in size. Pushing that onto a 33 GB disk – well, you can expect trouble.
Removed these files and rebooted – back to normal – problem solved.
Now I’ll have to find a way to back up these files. But I doubt I really need to, since all files are copied to DVD anyway.
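The same kind of search (everything changed since a cutoff, with sizes) can be sketched in Python for systems where a DCL prompt is not at hand. This is only an illustration, not what was used here: the function name files_changed_since and its parameters are made up for the example, and it relies on modification times rather than creation dates.

```python
import os
import time


def files_changed_since(root, cutoff_epoch):
    """Return (size_in_bytes, path) tuples for all files under root that
    were modified at or after cutoff_epoch, largest file first.

    Hypothetical helper for illustration only.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # File vanished or is unreadable: skip it.
                continue
            if info.st_mtime >= cutoff_epoch:
                hits.append((info.st_size, path))
    # Largest files first, since those are the likely culprits.
    hits.sort(reverse=True)
    return hits
```

Calling files_changed_since(some_directory, time.time() - 24 * 3600) would list everything touched in the last 24 hours; the top few entries are the first candidates to inspect when a disk fills up overnight.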
Back to what I would have done yesterday
Maintenance
First of all: mail statistics:
PMAS statistics for March
Total messages    :   7950 = 100.0 o/o
DNS Blacklisted   :   1191 =  14.9 o/o (Files: 31)
Relay attempts    :     28 =    .3 o/o (Files: 16)
Accepted by PMAS  :   6731 =  84.6 o/o (Files: 31)
 Handled by explicit rule
        Rejected :   5899 =  87.6 o/o (processed),  74.2 o/o (all)
        Accepted :    287 =   4.2 o/o (processed),   3.6 o/o (all)
 Handled by content
       Discarded :     86 =   1.2 o/o (processed),   1.0 o/o (all)
    Quarantained :    415 =   6.1 o/o (processed),   5.2 o/o (all)
       Delivered :     44 =    .6 o/o (processed),    .5 o/o (all)
Quite a number of blacklisted domains (15% of all); these don’t pass the first phase of filtering. But the next large group is the messages rejected by explicit rule: almost 3/4 of all. These seem to be the messages that clutter operator.log, because they seem to pass the filter for some reason, where I would expect them to be hidden. Something to ask Hunter about. Less than 1 percent is OK…. And on relay attempts: only one file exceeds 4 blocks, and only by a bit (it’s just 6 blocks in size). I guess March 19th was the big day for relay attempts.
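As a cross-check on the report above, the percentages can be re-derived from the counts. The sketch below assumes the report truncates to one decimal rather than rounding (that is my reading of the numbers, for instance 1191 out of 7950 is shown as 14.9, not 15.0); the helper name pct and the dictionaries are made up for this example and are not part of PMAS.

```python
def pct(count, total):
    """Percentage of total, truncated (not rounded) to one decimal.

    Truncation is an assumption based on the figures in the report.
    """
    return (count * 1000 // total) / 10


TOTAL = 7950

# First phase: every message ends up in exactly one of these groups.
FIRST_PHASE = {
    "DNS Blacklisted": 1191,
    "Relay attempts": 28,
    "Accepted by PMAS": 6731,
}

# Second phase: what happened to the messages PMAS accepted for processing.
# Labels are spelled as in the report.
PROCESSED = {
    "Rejected": 5899,
    "Accepted": 287,
    "Discarded": 86,
    "Quarantained": 415,
    "Delivered": 44,
}

accepted = FIRST_PHASE["Accepted by PMAS"]

# Both levels of the report should add up to their parent totals.
assert sum(FIRST_PHASE.values()) == TOTAL
assert sum(PROCESSED.values()) == accepted

for label, n in FIRST_PHASE.items():
    print(f"{label:<17}: {n:6d} = {pct(n, TOTAL):5.1f} o/o")
for label, n in PROCESSED.items():
    print(f"{label:>13} : {n:6d} = {pct(n, accepted):5.1f} o/o (processed), "
          f"{pct(n, TOTAL):5.1f} o/o (all)")
```

The counts add up at both levels, and truncation explains why the first three percentages sum to 99.8 instead of 100.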
Cleanup and archiving show no problems at all.