24-Feb-2018

Cluster completed
One big problem in creating the cluster was the quorum disk, which neither Itanium system can access directly. On top of that, I had set EXPECTED_VOTES to 3 on one of these nodes. Even without a quorum disk specified, that system waited, and waited… to join the cluster, whereas the other one (with EXPECTED_VOTES set to 1) would happily join.

Probably not the right thing to do – but now all systems have NO quorum disk and EXPECTED_VOTES of 1 (each contributing one vote). After rebooting all systems, I got my cluster:

View of Cluster from system ID 21505  node: DIANA                                                              24-FEB-2018 20:28:18
┌───────────────────────┬─────────┐
│        SYSTEMS        │ MEMBERS │
├────────┬──────────────┼─────────┤
│  NODE  │   SOFTWARE   │  STATUS │
├────────┼──────────────┼─────────┤
│ DIANA  │ VMS V8.4     │ MEMBER  │
│ IRIS   │ VMS V8.4     │ MEMBER  │
│ INDRA  │ VMS V8.4     │ MEMBER  │
└────────┴──────────────┴─────────┘

Similar on the other nodes.
Now Daphne (the small Alpha system) and Inge (to be renamed after another (female) deity, as all other systems are) need to be updated for the removed quorum disk. Plus, I’ll set up a FreeAXP process on the console laptop to function as quorum node.

Startup procedures for IRIS are the same as on Diana – except (for now) that TCPIP is started in the main procedure instead of the batch one. I will put that back where I defined it, because the FTP issue has also been solved: SYLOGIN.COM was not accessible (owned by [SYSTEM], but the protection gave world (W:) no access), so login failed. Changed that to (W:RE) and it’s OK now… So it will probably work as well when started in the batch procedure, and so will SSH (which didn’t start either).
Should have thought about that; this is the one file that I changed in between….
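
To make the protection change easier to picture, here is a minimal Python sketch (not DCL, and not anything that runs on the systems themselves) that parses a VMS-style protection string and shows what the world category is allowed to do before and after the change. Only the world fields (W: with nothing after it, and W:RE) come from the situation above; the system, owner and group fields in the example masks are assumptions for illustration, and the function names are hypothetical. ACLs and privileges are ignored.

```python
# Simplified model of a UIC-based protection mask such as (S:RWED,O:RWED,G:RE,W:RE).
# ACLs, privileges and the rest of the real access check are deliberately left out.

CATEGORIES = {"S": "system", "O": "owner", "G": "group", "W": "world"}


def parse_protection(mask):
    """Return a dict mapping each category to the set of access letters it grants."""
    result = {name: set() for name in CATEGORIES.values()}
    for field in mask.strip("()").split(","):
        field = field.strip()
        if not field:
            continue
        category, _, access = field.partition(":")
        name = CATEGORIES[category.strip().upper()[0]]
        # A category without letters (e.g. "W" or "W:") grants no access at all.
        result[name] = set(access.strip().upper())
    return result


def world_has(mask, needed):
    """True if the world category grants every access letter in `needed`."""
    return set(needed.upper()) <= parse_protection(mask)["world"]


# Hypothetical masks: only the W fields reflect what is described in this post.
before = "(S:RWED,O:RWED,G:RE,W:)"
after = "(S:RWED,O:RWED,G:RE,W:RE)"

print("before:", sorted(parse_protection(before)["world"]), world_has(before, "RE"))
print("after: ", sorted(parse_protection(after)["world"]), world_has(after, "RE"))
```

An account that only qualifies for the world category gets nothing from the first mask, which fits the login failure; with W:RE it can read and execute the file.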

Startup of INDRA will be copied from IRIS (and changed accordingly).

Fun part of reboot of Diana: it speeds up the blogs ….

Found out after publishing this post the first time: SHUTDOWN of both Itaniums caused loss of quorum on Diana, so I had to restart one of the Itaniums and invoke SHUTDOWN1 to add the option “REMOVE_NODE”. It stops the server, but allows Diana to continue, since quorum is now adjusted. This should be added to the SHUTDOWN command on all systems – except Diana – to allow that system to continue, whatever happens to the other servers. Or adjust the votes for the Itanium boxes to 0.
Or both.
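
The arithmetic behind this is small enough to sketch. OpenVMS derives quorum as (EXPECTED_VOTES + 2) / 2, rounded down, and the cluster works with the larger of the EXPECTED_VOTES setting and the votes actually present. The Python below is only a simplified model of that rule, assuming one vote per node as configured above; the function names are hypothetical and the real connection manager does a good deal more.

```python
# Simplified model of OpenVMS cluster quorum arithmetic (illustration only).

def quorum(expected_votes):
    """Quorum derived from an expected-votes value: (EXPECTED_VOTES + 2) / 2, rounded down."""
    return (expected_votes + 2) // 2


def cluster_quorum(expected_votes_setting, member_votes):
    """Quorum for a running cluster: based on the larger of the setting and the votes present."""
    return quorum(max(expected_votes_setting, sum(member_votes.values())))


def can_continue(votes_present, current_quorum):
    """A cluster (or a lone node) keeps running only while its votes reach quorum."""
    return votes_present >= current_quorum


# A lone node with EXPECTED_VOTES=3 and one vote waits; with EXPECTED_VOTES=1 it proceeds.
print(can_continue(1, quorum(3)), can_continue(1, quorum(1)))

# All three nodes up, one vote each, EXPECTED_VOTES=1 everywhere.
members = {"DIANA": 1, "IRIS": 1, "INDRA": 1}
q = cluster_quorum(1, members)

# Both Itaniums shut down without REMOVE_NODE: quorum stays where it was.
print(can_continue(members["DIANA"], q))

# With REMOVE_NODE, quorum is recalculated from the votes that remain.
print(can_continue(1, cluster_quorum(1, {"DIANA": 1})))

# Alternative: give the Itaniums zero votes, so they never count towards quorum.
zero_vote = {"DIANA": 1, "IRIS": 0, "INDRA": 0}
print(can_continue(zero_vote["DIANA"], cluster_quorum(1, zero_vote)))
```

With three votes in the cluster, quorum is 2, so Diana alone (one vote) stalls unless quorum is lowered on shutdown or the Itaniums carry no votes to begin with.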

22-Feb-2018

Re-installation – a second time
Made a mistake in re-installation: set the system up as a cluster member, added the wrong data (cluster password) so Iris didn’t start – hung on joining the cluster – obviously. So I re-installed VMS again from scratch (INIT), now without clustering to begin with, and configured TCPIP. Now the problems with FTP were gone.
Next, I copied the saved general directory that contains all startup files (amongst other things) and the system-specific files that call these local procedures, making sure I covered everything that is installed (and bypassing everything that isn’t yet), and rebooted. It didn’t work, because the queue manager needed to be defined and started, and the queues defined. Once that was done, the reboot went fine – as expected – except that, once again, the log said FTP was started, but the server process (TCPIP$FTP_1) was still missing. So I moved the startup of TCPIP from the local procedure (started on queue SYS$STARTUP) to SYSTARTUP_VMS.COM (which is run by the STARTUP process), and now FTP (and the other services that failed to start) do run.

But in the process I encountered something weird.
Previously, on startup, the EFI startup procedure would add CPUs 1, 2 and 3 to the pool; CPU #0 is the startup processor (Monarch). Now, it’s just CPU #1 that is added. Is that the second core on the first CPU, or the first on the second CPU? The EFI shell command cpuconfig shows me 2 CPUs running and active, and when I try to enable hyperthreading, the system responds that the CPUs don’t support it. So one CPU must be down… Something to dive into…

Anyway: I can now access the system using FTP, so I can move all the software that is needed onto the system and re-install compilers and development environments.

NASTY!!!! MariaDB keeps causing problems…Lost connection again…

19-Feb-2018

TCPIP trouble on IRIS
Starting TCPIP services like FTP and SMTP keeps failing on IRIS. The log file states that the procedure for the captive account could not be accessed due to lack of privilege. There is no apparent reason for this: the user entry in SYSUAF is OK, as is the protection of the file, including the full path. I compared it to the data on Diana; it’s the same except for a few system files on disk. These are owned by [SYSTEM] on Diana but by [1,1] on IRIS – which may cause a problem? I set all files to match the ownership on Diana, but that won’t work for INDEXF.SYS, since the file is locked as soon as the disk is mounted…
So the only option left is to re-install OpenVMS on an INITIALIZEd disk – meaning I will have to redo all installations. To avoid redoing all the work, I made an image backup of the system disk first, so I can restore the files I changed instead of redoing the edits.

17-Feb-2018

IRIS set up
I want to configure Iris (the big Itanium server) the same way as Diana – the Alpha system. Apart from the obvious differences because of architecture, and a minor change in naming the centralized location (to be moved to a shared disk once I get my external storage (anticipated end of May)) it should be the same.
So I copied the startup environment completely and adapted SYSTARTUP_VMS.COM to do just the basic stuff and run the rest in batch. Of course the files have been changed to fit what’s currently available. However, though the procedure was submitted to the queue, it never started. Found out that SYSUAF and RIGHTSLIST – both authorization files – were the wrong ones: on Diana, these are kept in the shared environment, and the definitions in SYLOGICALS.COM refer to that environment. That caused problems starting the procedure – and a lot more problems: the location is renamed on IRIS, and the drive the account resides on is wrong in that SYSUAF…
After I changed all references to that location (renamed on IRIS, but the files should not be used here anyway!) restart did start the batch procedure. But still not everything runs fine: FTP for instance will not start, there is something wrong with access to the login file. Same for SMTP, NTP and BIND…
Removed the services, and the users and identifiers, will probably have to re-install TCPIP – completely…
Anyway: most files are now available; if need be, I could pass them via Diana (since IRIS boots into the cluster, the shared-SCSI disks are accessible via MSCP – even the boot drive of Diana).

To be continued 🙂

10-Feb-2018

Update – and a mistake
Updated WordPress this morning (4.9.4) and set up the logicals for the blogs – and made a mistake I only found out tonight. Solved, they now run.