12-Nov-2008

Crash?

Today, at about 12:00, all websites stopped responding, and any access resulted in a “Server not found” reply in the browser.

Being off-site today and tomorrow, I could not get back soon enough. So I phoned home, and Kasper was able to restart Diana; it seems everything came up smoothly. Before that, at my request, he had switched the console terminal on (to save power, it’s usually switched off), but the screen remained blank; the lights on the keyboard didn’t flash, and moving the mouse made no difference either.

No error lights on the disks either. Only the lights on the power units and the controller were on – the latter flashing, but that’s normal.

Clearly something caused Diana to hang. Now it’s a matter of waiting until tonight to find out what happened…

Well, wherever I looked, there is no apparent reason why the system ceased to respond. It must have happened some time after 11:52; that’s the last time stamp I could find in any log file. I found it in the error log, but there was no clue about an error that might have caused the hang. Neither does operator.log offer one, and none of the running services mentions a problem. Checking accounting showed the system ran a bit longer:

     Date / Time      Type     Subtype     Username      ID     Source   Status
--------------------------------------------------------------------------------
12-NOV-2008 11:53:44 PROCESS INTERACTIVE HTTP$NOBODY  20200A40 MBA154: 00002BD4
12-NOV-2008 11:53:47 PROCESS INTERACTIVE HTTP$NOBODY  20200A41 MBA159: 00002BD4
12-NOV-2008 11:56:39 PROCESS INTERACTIVE HTTP$NOBODY  20200A52 MBA164: 00002BD4
12-NOV-2008 11:57:00 PROCESS BATCH       SYSTEM       20200A53         10030001

You could speculate that this batch process caused a problem, but that’s quite unlikely:

BATCH Process Termination
-------------------------
Username:          SYSTEM            UIC:               [SYSTEM]
Account:           SYSTEM            Finish time:       12-NOV-2008 11:57:00.15
Process ID:        20200A53          Start time:        12-NOV-2008 11:57:00.00
Owner ID:                            Elapsed time:                0 00:00:00.14
Terminal name:                       Processor time:              0 00:00:00.10
Remote node addr:                    Priority:          3
Remote node name:                    Privilege <31-00>: FFFFFFFF
Remote ID:                           Privilege <63-32>: FFFFFFFF
Remote full name:
Posix UID:         -2                Posix GID:         -2 (%XFFFFFFFE)
Queue entry:       194               Final status code: 10030001
Queue name:        NORMAL
Job name:          Mysql_watch
Final status text: %CLI-S-NORMAL, normal successful completion
Page faults:              315        Direct IO:                 41
Page fault reads:          47        Buffered IO:               59
Peak working set:        2704        Volumes mounted:            0
Peak page file:        173104        Images executed:            5

This job has run – unaltered – every 15 minutes for months, so it’s unlikely to have caused an issue here.

The other entries were simply FORCED_DELETE signals – processes being stopped by the webserver. Very common as well, so not likely to have caused the hang…

Audit has just one record between 11:15 and 14:00: the initialization of the audit server:

12-NOV-2008 13:53:36.67 AUDIT      AUDIT_INITIATE   DIANA  SYSTEM       20200105

So the reason for the hang cannot be found… At least, I’m out of options.

Well, it did start up again and, as it seems, without a problem. That’s important as well.
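
For what it’s worth, the silent window between the last accounting record and the audit server coming back can be worked out with a few lines of Python. This is only a sketch: the helper names (`parse_vms_time`, `last_record_time`) are made up for illustration, the records are pasted in from the listings above rather than read from the system, and it assumes every line starts with a 20-character `DD-MMM-YYYY HH:MM:SS` time stamp.

```python
from datetime import datetime

# Month abbreviations as they appear in the listings (upper case).
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def parse_vms_time(text):
    """Parse 'DD-MMM-YYYY HH:MM:SS' with optional '.cc' hundredths."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hms, _, fraction = time_part.partition(".")
    hour, minute, second = (int(value) for value in hms.split(":"))
    # Pad the fraction to microseconds: '67' becomes 670000.
    micro = int(fraction.ljust(6, "0")) if fraction else 0
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    hour, minute, second, micro)


def last_record_time(listing):
    """Return the latest time stamp found at the start of any line."""
    # Assumption: the first 20 characters of each record hold the time stamp.
    return max(parse_vms_time(line[:20])
               for line in listing.splitlines() if line.strip())


# Records copied from the accounting listing in this post.
ACCOUNTING = """\
12-NOV-2008 11:53:44 PROCESS INTERACTIVE HTTP$NOBODY  20200A40 MBA154: 00002BD4
12-NOV-2008 11:53:47 PROCESS INTERACTIVE HTTP$NOBODY  20200A41 MBA159: 00002BD4
12-NOV-2008 11:56:39 PROCESS INTERACTIVE HTTP$NOBODY  20200A52 MBA164: 00002BD4
12-NOV-2008 11:57:00 PROCESS BATCH       SYSTEM       20200A53         10030001
"""

# Time stamp of the single audit record (audit server initialization).
AUDIT_INIT = "12-NOV-2008 13:53:36.67"

last_seen = last_record_time(ACCOUNTING)
restarted = parse_vms_time(AUDIT_INIT)
gap = restarted - last_seen

print("Last accounting record:", last_seen)
print("Audit server initiated:", restarted)
print("Nothing logged for    :", gap)
```

Mind that this gap is only an upper bound on how long Diana was actually hung: the audit record marks the restart, and the hang itself may have set in any time after the last accounting record.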
