25-Dec-2006

Installing a new server

proved to be more challenging than expected.

Yesterday, I put the AlphaStation 400 in place of the two AlphaStation 200s and booted it from its local system disk. It did form a cluster with Diana when booted from its local disk while it sat near the workbench, but this time it stayed a cluster on its own. That is: the machine does request to form a cluster with Diana, but it gets no response, or the response isn’t received or handled:
$
%%%%%%%%%%% OPCOM 24-DEC-2006 20:02:28.51 %%%%%%%%%%%
20:02:28.50 Node DIANA (csid 00010001) received VMScluster membership request from node DIDO

$
%%%%%%%%%%% OPCOM 24-DEC-2006 20:02:28.51 %%%%%%%%%%%
20:02:28.51 Node DIANA (csid 00010001) proposed addition of node DIDO

$
%%%%%%%%%%% OPCOM 24-DEC-2006 20:02:28.51 %%%%%%%%%%%
20:02:28.51 Node DIANA (csid 00010001) completed VMScluster state transition

It might have been the hub in between causing trouble, but the problem persisted even when connected directly to the switch. I had removed the two single-ended SCSI cards (NEC8100, Symbiont brand) and the DE450 NIC, and put in a KZPBA-CY differential SCSI card (required to access the shared SCSI bus) and a DE500 NIC, because that would allow 100Mb full duplex. Setting the latter to just 10Mb, which is the limit for the hub, did not help, and even half duplex was no solution. The problem continued…

But it did boot, though it could not access the disks on the shared SCSI bus without causing havoc. Quite obvious, since to be able to do so, I had to do some work on the basic configuration first: allow access to the shared SCSI bus by adding its PKB slot in SYS$SYSTEM:SYS$CONFIG.DAT with the same allocation class as Diana (116), and set the DEVICE_NAMING system parameter to 1. After that, I could indeed access the disks over the shared SCSI bus and copy whatever I needed, though this caused Diana to spit out messages like:

%%%%%%%%%%% OPCOM 25-DEC-2006 21:26:03.96 %%%%%%%%%%%
Device $116$DKA100: (DIANA PKB, DIDO) is offline.
Mount verification is in progress.

%%%%%%%%%%% OPCOM 25-DEC-2006 21:26:03.97 %%%%%%%%%%%
Mount verification has completed for device $116$DKA100: (DIANA PKB, DIDO)

and these occur on the new server as well.
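
To see how long each of these mount verifications lasted, the OPCOM messages can be pulled out of a saved copy of the operator log. Below is a minimal Python sketch, assuming the log text has the same layout as the excerpts above (a `%%% OPCOM date time %%%` header followed by message lines). The function names `parse_opcom` and `mount_verification_durations` are made up for illustration and are not part of any VMS tool.

```python
import re
from datetime import datetime

# Header line as shown in the excerpts, e.g.
# %%%%%%%%%%%  OPCOM  25-DEC-2006 21:26:03.96  %%%%%%%%%%%
HEADER = re.compile(
    r"^%+\s+OPCOM\s+(\d{1,2})-([A-Z]{3})-(\d{4})\s+"
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{2})\s+%+$"
)
# Device name with allocation class, e.g. $116$DKA100:
DEVICE = re.compile(r"(\$\d+\$[A-Z]+\d+):")
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), 1)}


def parse_opcom(text):
    """Split operator-log text into (timestamp, message) pairs."""
    events, stamp, body = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            if stamp is not None:
                events.append((stamp, " ".join(body)))
            day, mon, year, hh, mm, ss, cs = match.groups()
            # OPCOM prints hundredths of a second; convert to microseconds
            stamp = datetime(int(year), MONTHS[mon], int(day),
                             int(hh), int(mm), int(ss), int(cs) * 10000)
            body = []
        elif stamp is not None and line and line != "$":
            body.append(line)  # skip blank lines and bare DCL prompts
    if stamp is not None:
        events.append((stamp, " ".join(body)))
    return events


def mount_verification_durations(events):
    """Pair 'in progress' and 'has completed' messages per device."""
    started, durations = {}, []
    for stamp, message in events:
        device = DEVICE.search(message)
        if device is None:
            continue
        name = device.group(1)
        if "Mount verification is in progress" in message:
            started[name] = stamp
        elif "Mount verification has completed" in message and name in started:
            seconds = (stamp - started.pop(name)).total_seconds()
            durations.append((name, seconds))
    return durations
```

Feeding the two messages above through `parse_opcom` and then `mount_verification_durations` pairs them up as one verification on `$116$DKA100`; longer gaps between the two timestamps would point at the moments where the shared bus was really in trouble.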

At some point, it got even worse: a message on the terminal screen (and not(!) in the operator log) stated that the wrong volume was located on DKA100 (its system disk), causing the session, and the machine, to hang. Even stopping the AS400 didn’t help anymore at that point.

It required Diana to be rebooted, or reset, because access to the system was now impossible. It happened once or twice….

But once AS400/DIDO was properly set up, and the addition of DIDO as a cluster member was re-initiated on Diana, the procedure on Diana asked for DIDO to be booted. But this time, there is a VERY SEVERE problem:

%APB-I-FILENOTLOC, Unable to locate SYSBOOT.EXE
%APB-I-LOADFAIL, Failed to load secondary bootstrap, status = 00000910

halted CPU 0

halt code = 5
HALT instruction executed
PC = 20003d94
warning -- HWRPB is invalid.

Though I doubt it is the problem, I reckon an update of the SRM console software (currently 6.9) is no big deal. So I got the latest available (7.0), as well as the latest for the AlphaServer 1000 (at least, I expected to retrieve it, but the site or the file is no longer available) and the AlphaServer 2100 (which will enter the datacentre on Thursday).
