12-Jun-2009

Access problems
Quite severe, to be honest.
Too busy on the job to find out sooner, but yesterday it became evident that access to all of the webs was troublesome; in fact, both the operator web and webmail were inaccessible, and even the basic homepage was slow. Retrieving mail using the POP protocol fails after some attempts (keeps waiting, and waiting…). I tried to get the webmail index; it took ages to load, and then only part of it: the page was incomplete in the end.

Diana showed no real problems when looking at the system itself, but examining the operator log made it clear that there is a problem: the logfile is filled with messages like:
%%%%%%%%%%%  OPCOM  12-JUN-2009 18:59:03.49  %%%%%%%%%%%
Message from user SYSTEM on DIANA
Event: Frame Check Error from: Node LOCAL:.DIANA CSMA-CD Station CSMACD-0,
        at: 2009-06-12-19:59:03.490+02:00Iinf
        eventUid   18BA7200-5783-11DE-8929-0000F87653E2
        entityUid  EB004904-56DC-11DE-8293-AA0004000154
        streamUid  EE034380-56DC-11DE-8322-AA0004000154

These messages indicate a network problem – a severe one. A frame check error means a received Ethernet frame failed its checksum, and that is mostly related to hardware, like the network card, cable, switch port, or the switch itself. Or to some other machine on the network, pumping badly shaped data over the wire.
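
To get a feel for how bad it is, messages like the one above can be tallied per hour. The following is only a rough Python sketch, assuming the operator log has been copied off the system as plain text and that every message looks like the excerpt above (an OPCOM header line followed by an `Event:` line). The function name `count_events_per_hour` and the `SAMPLE` text are made up for illustration; they are not part of VMS or of any tool used here.

```python
import re
from collections import Counter

# Header line of an OPCOM message, e.g.
# "%%%%%%%%%%%  OPCOM  12-JUN-2009 18:59:03.49  %%%%%%%%%%%"
HEADER = re.compile(
    r"^%+\s+OPCOM\s+(\d{1,2}-[A-Z]{3}-\d{4})\s+(\d{2}):\d{2}:\d{2}\.\d{2}\s+%+\s*$"
)
# Event line, e.g. "Event: Frame Check Error from: Node ..."
EVENT = re.compile(r"^Event: (.+?) from: ")


def count_events_per_hour(log_text, event_name="Frame Check Error"):
    """Count messages whose Event line names event_name, keyed by (date, hour)."""
    counts = Counter()
    stamp = None  # (date, hour) of the most recent OPCOM header seen
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            stamp = (header.group(1), int(header.group(2)))
            continue
        event = EVENT.match(line)
        if event and stamp is not None:
            if event.group(1) == event_name:
                counts[stamp] += 1
            stamp = None  # one Event line per message
    return counts


# The single message quoted above, used as sample input.
SAMPLE = """\
%%%%%%%%%%%  OPCOM  12-JUN-2009 18:59:03.49  %%%%%%%%%%%
Message from user SYSTEM on DIANA
Event: Frame Check Error from: Node LOCAL:.DIANA CSMA-CD Station CSMACD-0,
        at: 2009-06-12-19:59:03.490+02:00Iinf
"""

if __name__ == "__main__":
    for (date, hour), n in sorted(count_events_per_hour(SAMPLE).items()):
        print(f"{date} {hour:02d}:00  {n}")
```

A sudden jump in the hourly count would point at the moment the trouble started; a steady trickle suggests a link that has been marginal for a while.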

Access that doesn’t require the network hardware has no trouble at all. But anything that goes ‘outside’ and is more than a few bytes in size won’t get out, or gets out incomplete – triggering even more messages. As it turned out, all HTTPS access was virtually blocked, and any access to larger files (photos!) on the public site would stall.

But what caused it?
It could be the NIC in the system – it’s over 10 years old, so a breakdown could well be possible. But a second, more likely culprit is the switch that links all systems together. Irene – the system in the living room – often reported a drop in network connectivity that was restored shortly afterwards. Aphrodite – which I use to listen to Internet radio while working at the datacenter – notes drops as well. A colleague told me today that this behaviour often indicates the death of a router. And since this is a very cheap one, it wouldn’t be a big surprise if NIC and router run out of sync. It could even happen to a single port.

Time to find out.
I do have a free port on the switch, so I reconnected Diana to it. And behold: speed is as in the old days. Apart from the fact that the connection is now set to half duplex, it works. There still are messages on frame check errors, but only when a lot of data is sent at once: images and large, encrypted pages like Soymail’s index. But at least I’ll be able to download my mail 😉

Time to get a new router.

New kid on the block
Demeter, the company laptop, has reached its moment of retirement. That is: I have purchased it from the company, and got a new one instead that I’ll use for company work and other useful things, like studying Linux, for instance.
By request, the system should be called after its administrative name: VXLT090409 – I kept that in the description for the ease of system management at the office. I named the system Gudrun, a name from the Saxon mythology (if I’m well informed). Its operating system is Windows Vista Business – and I already have had the opportunity to question what the heck it’s doing on disk… It’s still in the phase where I’m installing bits and pieces I miss on the system. Quite some of them…

Irene down again
Another problem occurred today: Irene, the system downstairs, started in the set-up screen and failed to boot – without a warning. I’ll still have to determine what’s wrong there, but it certainly doesn’t look good. Well, a replacement is now available: Demeter…

06-Jun-2009

Maintenance (Revisited)
After I wrote the last entry, I installed the VMS updates that I had downloaded before. Of course, after backing up the system disk.
Rebooted once – and, as expected, all processes restarted. Like a charm.

Second, I have changed some system parameters: the number of process slots has been cut down to about half – the maximum number of processes running on the system is about 60, so keeping over 200 slots is, well, slightly overdone :). I still have to check the T4 statistics, but the impression is it did help.

Still to be done: clean-up and re-organization of the system disk. Backup took a LONG time: there are quite some files that are really big. The web backups, for instance, take a lot of space – and time. These files can be very badly fragmented: I got one that was split into over 7000 fragments…

The webs – kept on my workstation – have been completely written to DVD as an archive. So a real backup of the PUBLIC web is not really needed: most files are rather static.