09-Feb-2018

Setting up IRIS
IRIS is the big Itanium 2620 – two 2-core processors with Hyperthreading enabled, so OpenVMS sees 8 CPUs; 24 Gb memory (the maximum possible) and three 143 Gb disks.
The node has been added to the cluster:


+-----------------------+---------+
|        SYSTEMS        | MEMBERS |
+--------+--------------+---------+
|  NODE  |   SOFTWARE   |  STATUS |
+--------+--------------+---------+
| IRIS   | VMS V8.4     | MEMBER  |
| DIANA  | VMS V8.4     | MEMBER  |
+-----------------------+---------+

and all languages (BASIC, C, C++, COBOL, FORTRAN and PASCAL) and DECSet have been installed. Welcome.txt has been set up to show its name on login.
Next, I will set up the startup procedure similar to what I set up 10 years ago: what can be done in batch will be done in the background, so the base system will be available as soon as possible.

When that is done, I will set up services – all of them, so the machine can be used as a replacement for Diana. Except for central services like DHCP, DNS and NTP, at least for now. It means I’ll have to install all web software (that’s why I needed the compilers): WASD, PHP, MariaDB… and test it. The only thing not yet available is PMAS for Itanium – I’ve asked for it.

Another thing to do is using Process’s Multinet instead of HP’s TCPIP suite – Multinet is superior (that’s why VSI leased that software as replacement).

04-Feb-2018

Hyperspi++

BEWARE: This is a technical issue on how to analyze a program error with (built-in!!) options to compiler and linker. To understand it, you’ll need some knowledge of how a program on VMS looks internally.
Listing fragments have been slightly edited to fit on the screen.

The issue itself – and what’s related – can be found on the WASD mailing list for 2018 in entry 8 and following. This file contains the data of the analysis as taken from the files (not edited).

As mentioned, the HyperSpi++ collector didn’t collect any data since the last reboot. The log showed why:

%SYSTEM-F-HPARITH, high performance arithmetic trap,
 Imask=00000000, Fmask=00080000, summary=02, PC=0000000000031448, PS=0000001B
-SYSTEM-F-FLTINV, floating invalid operation,
 PC=0000000000031448, PS=0000001B

  Improperly handled condition, image exit forced.
    Signal arguments:   Number = 0000000000000006
                        Name   = 0000000000000504
                                 0008000000000000
                                 0000000000080000
                                 0000000000000002
                                 0000000000031448
                                 000000000000001B

Register dump:
R0 =0000000000040990 R1 =00B2816B42BD488F R2 =0000000000010250
R3 =0000000000000001 R4 =00000000000201F0 R5 =0000000000040860
R6 =0000000000040870 R7 =0000000000040870 R8 =0000000000000005
R9 =0000000000040907 R10=00000000000408F5 R11=000000007FFCDC18
R12=000000007FFCDA98 R13=000000007AEC33A0 R14=0000000000000000
R15=000000007AEC29B0 R16=0000000000000000 R17=0000000000010848
R18=000000007BF260F8 R19=FFFFFFFFFFFFFFF9 R20=FF4D7E94BD42B771
R21=000000007B63DA10 R22=000000007ADB9940 R23=0000000000000002
R24=0000000000000001 R25=000000000006C004 R26=FFFFFFFF80FAFB60
R27=000000007BEF4820 R28=FFFFFFFF80CAFE40 R29=000000007ADB9930
SP =000000007ADB9860 PC =0000000000031448 PS =200000000000001B

but since the program wasn’t built with the options /LIST/MACHINE_CODE on compilation and /MAP/FULL on link, you’re in the dark. IMHO, these options should be mandatory for programs that run without user intervention, as a detached program or as a service.

But the kit contains the command procedures to build the agent, so I added these options, built the agent and ran it – interactively. Same error (of course) but now I could pinpoint the location.

The MAP file shows the sections (PSECT) of a program – and the offsets in the image file. The important part is pretty much at the beginning:

DEFAULT_CLUSTER
  0     5    00010000    3 0 READ WRITE NON-SHAREABLE ADDRESS DATA
  0     2    00020000    8 0 READ WRITE COPY ON REF
  0    30    00030000   10 0 READ ONLY  EXECUTABLE    <<<-------
  0     5    00040000    0 0 READ WRITE DEMAND ZERO
  0     2    00050000   40 0 READ WRITE FIXUP VECTORS
253    20    7FFF0000    0 0 READ WRITE DEMAND ZERO

Most important here is the third line, where the last column states "EXECUTABLE". This is where the executable code, so what the program actually does, is located. It starts at offset 30000 (hexadecimal).

Since the crash line states the program counter (PC) where the error happens:

%SYSTEM-F-HPARITH, high performance arithmetic trap, Imask=00000000, Fmask=00080000, summary=02, PC=0000000000031448, PS=0000001B
-SYSTEM-F-FLTINV, floating invalid operation, PC=0000000000031448, PS=0000001B

it means the program counter refers to address 31448, which lies within that section (obviously: the error happens when the program executes an instruction), at offset 1448 in that section (31448 – 30000, in hexadecimal).
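The same lookup can be written down as a small sketch. The base addresses and attributes are copied from the MAP fragment above; the function name `locate` is made up for this illustration, and the sketch assumes each section simply runs up to the base of the next one, which is coarser than the page counts in the map.

```python
from bisect import bisect_right

# Base addresses and attributes copied from the MAP fragment above.
SECTIONS = [
    (0x00010000, "READ WRITE NON-SHAREABLE ADDRESS DATA"),
    (0x00020000, "READ WRITE COPY ON REF"),
    (0x00030000, "READ ONLY EXECUTABLE"),
    (0x00040000, "READ WRITE DEMAND ZERO"),
    (0x00050000, "READ WRITE FIXUP VECTORS"),
    (0x7FFF0000, "READ WRITE DEMAND ZERO"),
]


def locate(pc):
    """Return (base, attributes, offset) for the section with the highest base <= pc.

    Assumption: a section extends up to the next section's base address.
    """
    bases = [base for base, _ in SECTIONS]
    index = bisect_right(bases, pc) - 1
    if index < 0:
        raise ValueError("PC lies below the first section")
    base, attributes = SECTIONS[index]
    return base, attributes, pc - base


base, attributes, offset = locate(0x31448)
print(f"section {base:08X} ({attributes}), offset {offset:04X}")
# prints: section 00030000 (READ ONLY EXECUTABLE), offset 1448
```

The offset printed here is the value to look for in the first column of the machine code listing.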

But what instruction – and which codeline?
Now the listing is important – and especially the /MACHINE_CODE switch. It lists the actual instructions that the program executes, located at their offsets in the executable section of the program. In the listing, this is the second part; it shows the offset (within the section), the instruction and the code line. For this particular case, look around offset 1448:

0234  L$227:
1420          LDAH    R25, 7(R31)                         ; 040554
1424          LDL     R16, (R4)                           ; 040553
1428          LDQ     R17, -528(R2)                       ; 040554
142C          LDA     R25, -16380(R25)
1430          LDQ     R18, -552(R2)
1434          LDQ     R26, -544(R2)
1438          LDF     F1, (R0)                            ; 040552
143C          LDQ     R27, -536(R2)                       ; 040554
1440          LDA     R17, 936(R17)
1444          ADDF    F1, F18, F19                        ; 040552
1448          STF     F19, (R0)
144C          UNOP
1450          BEQ     R16, L$122                          ; 040553
1454          LDL     R16, DECC$GA_STDOUT ; R16, (R18)    ; 040554
1458          JSR     R26, DECC$GXFPRINTF ; R26, R26
145C  L$122:                                              ; 040555

The first column shows the offset within the section, which starts at 30000, so that address is offset 0. So the offending instruction seems to be at offset 1448: the STF instruction.
The last column, not present on all lines, is the line number of the code that the programmer wrote. The STF instruction carries no line number of its own, so it belongs to the line number shown just above it: line 40552, on the ADDF instruction. The source code is in the first part of the listing file, so look around this line number:
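The "line number just above" rule can be sketched in a few lines. The excerpt is copied from the machine code listing above; the function name `source_line_for` and the parsing rules (a 4-digit hex offset first, an optional 6-digit line number after the last semicolon) are assumptions based on this fragment only, not on a specification of the listing format.

```python
import re

# Excerpt of the machine code listing shown above.
LISTING = """\
1438          LDF     F1, (R0)                            ; 040552
143C          LDQ     R27, -536(R2)                       ; 040554
1440          LDA     R17, 936(R17)
1444          ADDF    F1, F18, F19                        ; 040552
1448          STF     F19, (R0)
144C          UNOP
1450          BEQ     R16, L$122                          ; 040553
"""


def source_line_for(offset, listing=LISTING):
    """Return (mnemonic, source line) for the instruction at the given offset.

    An instruction without a line number inherits the most recent one above it.
    """
    current = None
    for text in listing.splitlines():
        head = re.match(r"([0-9A-F]{4})\s+(\S+)", text)
        if not head:
            continue
        tag = re.search(r";\s*(\d{6})\s*$", text)
        if tag:
            current = int(tag.group(1))
        if int(head.group(1), 16) == offset:
            return head.group(2), current
    return None


print(source_line_for(0x1448))  # ('STF', 40552)
```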

1   40538 /**************************/
1   40539 /* calculate elapsed time */
1   40540 /**************************/
1   40541
1   40542 lib$sub_times (&CurrentBinTime, 
                         &PreviousBinTime,
                         &DiffBinTime);
1   40543
1X  40544 #ifdef __RMIDEF_LOADED
1X  40545 status = lib$cvts_from_internal_time 
                  (&cvtf_mode, &DeltaSeconds,
1X  40546          &DiffBinTime);
1X  40547 #else
1   40548 status = lib$cvtf_from_internal_time
                 (&cvtf_mode, &DeltaSeconds,
1   40549         &DiffBinTime);
1   40550 #endif
1   40551
1   40552 TotalDeltaSeconds += DeltaSeconds;   <===HERE
1   40553 if (Debug)
1   40554  fprintf (stdout,
           "DeltaSeconds: %f TotalDeltaSeconds: %f\n",
1   40555  DeltaSeconds, TotalDeltaSeconds);
1   40556

Both TotalDeltaSeconds and DeltaSeconds are defined as FLOAT - which is consistent with the error text. The question is: WHAT IS WRONG? So you will have to do some digging; compile the code with option /DEBUG, and set a breakpoint on, or just above this line, to find out what is going on.
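The masks in the error text can be used as a cross-check. The sketch below rests on two assumptions that come from my understanding of the Alpha architecture and are not stated anywhere in the log: bit n of Fmask stands for floating register Fn written by the trapping instruction, and the summary bits are, from bit 0 upwards, SWC, INV, DZE, OVF, UNF, INE and IOV. The names `set_bits` and `decode` are made up for this illustration.

```python
# Assumed order of the exception summary bits, starting at bit 0.
SUMMARY_BITS = ["SWC", "INV", "DZE", "OVF", "UNF", "INE", "IOV"]


def set_bits(mask):
    """Return the positions of all bits that are set in mask."""
    return [n for n in range(mask.bit_length()) if mask >> n & 1]


def decode(fmask, summary):
    """Translate Fmask into register names and summary into flag names."""
    registers = [f"F{n}" for n in set_bits(fmask)]
    flags = [SUMMARY_BITS[n] for n in set_bits(summary) if n < len(SUMMARY_BITS)]
    return registers, flags


# Values taken from the HPARITH message: Fmask=00080000, summary=02
print(decode(0x00080000, 0x02))  # (['F19'], ['INV'])
```

If these assumptions hold, the result points at F19, which is the destination of the ADDF instruction at offset 1444, and at an invalid operation, which fits the FLTINV text. Both agree with line 40552 as the place to look.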

In this case, it turned out that the compiler switch /OPTIMIZE (perhaps implied by /NODEBUG) quite likely made the error occur; using /NOOPTIMIZE solved the problem and the program executed flawlessly.

02-Feb-2018

Server updates
I had enough of all the problems, albeit small ones, so I decided to retrieve the latest software from the WASD site: WASD server (11.1.1), OpenSSL update for WASD (1.0.2n), Alamode (11.0a) and SoyMail 1.8.3. I copied them yesterday evening and built them all this morning, then adapted the startup file (including the addition of a routine to get the latest “Latest New” file to be shown on the main page). I stopped the server and had to make some corrections (sometimes Shift isn’t registered, so I get “1” instead of “!”, or a single instead of a double quote), but in the end it all came together and everything started as expected. Well, except Alamode, which is built into WASD_ROOT:[EXE] but should reside in WASD_ROOT:[AXP-BIN]. A copy solved that problem 🙂
But of course HyperSpi++ doesn’t have any data for today yet, so whether that works is something to keep an eye on.

The next step is to move it all over to Itanium.

01-Feb-2018

New developments
Since the last post there have been a few developments.
First of all, the Nxtware licenses are not working; I asked for help but haven’t got an answer yet. Secondly, I had to configure the ILO module of the box, which was still in its original state. I knew the address was somewhere in the 10.0.0 or 10.1.1 ranges, but you need the 3-way connector to get it changed. A friend has one; as it turned out, the address was indeed in the 10.0.0 range (183) and we changed it. But the machine failed to boot after that. EFI does find the controller, but it takes a lot of time to recognize the disks. When starting VMS, the loader is read, but as soon as VMS tries to mount the disk, it fails and then keeps trying to restart the adapter. It means PKA0 is broken, so two of the three disks are inaccessible. The system boots from the other controller, so I may be able to use that for access. Anyway, to be prepared for complete failure, I bought a third Itanium, similar to the other one (same supplier) but with just one 143 Gb disk. Not a problem, because I have two 73 Gb disks left …
This new system needs to be set up some day soon.

One of these servers will become a backup of the DS10, because I suspect its local disk is going bad. It holds the pagefiles and backups, so it is pretty important…

Maintenance
No problems.

PMAS statistics for January
Total messages    :   3639 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :    407 =  11.1 o/o (Files: 31)
Accepted by PMAS  :   3232 =  88.8 o/o (Files: 31)
  Handled by explicit rule
         Rejected :   2388 =  73.8 o/o (processed),  65.6 o/o (all)
         Accepted :    196 =   6.0 o/o (processed),   5.3 o/o (all)
  Handled by content
        Discarded :    414 =  12.8 o/o (processed),  11.3 o/o (all)
     Quarantained :    216 =   6.6 o/o (processed),   5.9 o/o (all)
        Delivered :     18 =    .5 o/o (processed),    .4 o/o (all)
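The percentages in this report can be reproduced from the counts. Judging from the numbers alone, they look truncated to one decimal rather than rounded; that is an observation on this table, not documented PMAS behaviour. The helper name `pct` is made up for this sketch.

```python
def pct(part, whole):
    """Percentage of part in whole, truncated (not rounded) to one decimal."""
    return (part * 1000 // whole) / 10


total, relay, accepted = 3639, 407, 3232
handled = {
    "Rejected": 2388,
    "Accepted": 196,
    "Discarded": 414,
    "Quarantained": 216,
    "Delivered": 18,
}

# The parts add up: relay attempts plus accepted mail is the total,
# and the five outcomes together are all the accepted messages.
assert relay + accepted == total
assert sum(handled.values()) == accepted

print(f"Relay attempts: {pct(relay, total)}  Accepted by PMAS: {pct(accepted, total)}")
for name, count in handled.items():
    print(f"{name:>13}: {pct(count, accepted):5.1f} (processed), {pct(count, total):5.1f} (all)")
```

Rounding would give 73.9 instead of 73.8 for the rejected messages, which is why truncation seems the more likely explanation.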

More relay attempts on 10th and 27th of January:

10-Jan-2018 between 17:19:13 and 17:22:18 from 23.254.144.160 to 1029mandaditos@gmail.com (192 entries)
27-Jan-2018 between 00:01:43 and 00:05:14 from 23.254.215.28 to locotrones1029@gmail.com (187 entries)

Same network, same signature: Bogus account from grootersnet.nl, to the same gmail account. The address is owned by Hostwinds LLC – and these will once again be warned. Probably this is an address I need a daily logscanner for…
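Whether two source addresses sit in the same network is easy to check. The sketch below assumes a /17 prefix, the size found in the analysis of the December attempts; the actual allocation for these two addresses has not been verified here, and the helper name `network_of` is made up.

```python
import ipaddress


def network_of(address, prefix=17):
    """Return the network that contains address, for an assumed prefix length."""
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


# Source addresses of the relay attempts on 10 and 27 January.
sources = ["23.254.144.160", "23.254.215.28"]
for source in sources:
    print(source, "->", network_of(source))

print("same network:", len({network_of(source) for source in sources}) == 1)
```

With a /17 prefix both addresses end up in 23.254.128.0/17, so one block rule for that range would cover both.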

There is a problem with the blog performance and perhaps the database. I will restart all webservices.

18-Jan-2018

New development environment
Having worked with a graphical IDE for the last 4 years (using Delphi on Windows in my daily job), I think it’s time to install something similar on the VMS boxes. I have used the HP(E) supplied NetBeans solution on my PWS, which is way too small to run such a heavy Java application in its 256 Mb memory. But it’s free…
Nevertheless, it hasn’t been updated for years and there is a newer kid on the block, Eclipse based, with a Java component on the VMS side: Nxtware-remote, by eCube. Since the basis is Unix/Linux based, you’ll also need GNV – GNU for VMS. I had it all on Diana, but it was years ago that I installed it, so I re-installed Java and GNV. The latter has been quite a challenge: though it looked as if the installation succeeded, it didn’t. Even removing the whole installation by hand failed.
As it turned out, in getting the whole thing on the system a link had probably been set up that needed to be removed using GNV programs, which I had just deleted… Luckily I had the whole lot on Daphne; being a cluster member, I could copy what I needed: mnt.exe and umnt.exe. Setting up foreign commands to use them:
$ mnt := $PSXROOT:[bin]mnt.exe
$ umnt := $PSXROOT:[bin]umnt.exe

I could figure out what to do:
$ mnt
DISK$AXP084:[000000]PSX$ROOT.DIR;1 on /DISK$AXP084/VMS$COMMON/GNV

shows mounted file systems and mount points, so
$ umnt /DISK$AXP084/VMS$COMMON/GNV
gets rid of the link, and now the re-installation completed.

After that, I could install Nxtware-Remote, and start $ @NXTware-Remote_Configure to set it all up. It proved necessary to change a number of SYSGEN parameters (and a number of process quotas). I did so, and then needed a reboot.
Because I had done some cleanup since the last boot, a few things didn’t go as planned: MariaDB didn’t start, nor did the webserver. This was caused by a logical that I had removed (since it was completely obsolete: a residue from an older instance and disk). After I updated the startup files for the webserver, it did start, but the database still failed. Since this is done in batch, where SUBMIT has the /NOLOG qualifier, you’re in the dark. After adding a logfile to the SUBMIT command, I found the reason: since the server runs under user MYSQL051_SRV, this user needs to log in, but it has a captive command file:

Username: MYSQL051_SRV Owner:
Account: MYSQL051 UIC: [37775,1] ([MYSQL051,MYSQL051_SRV])
CLI: DCL Tables: DCLTABLES
Default: MYSQL051_ROOT:[MYSQL_SERVER]
LGICMD: MYSQL051_ROOT:[VMS.MYSQL]LOGIN_SERVER.COM
Flags: DisCtlY DefCLI LockPwd Restricted DisWelcome DisNewMail DisMail
DisReport DisReconnect

and logical MYSQL051_ROOT has also been removed… After I added the definition to the command file starting the database server, it started.

A hard test of the startup procedures, but now that these are corrected, the next startup should go fine.

Now it’s waiting for the licences.

WordPress update
Starting the blog to create this post, I was notified that a new version of WordPress was available (4.9.2), so I downloaded it (and Akismet 4.0.2, that matches this version) and installed them.

Next advantage of restart of the webserver: Since some process workingset quota have been enlarged, it seems the blog is faster …
Perhaps I can now run a higher PHP version as well.

04-Jan-2018

More on certificate renewal
Comparing genealogy (which got the certificate a few days ago) and both webmail and homedesk made it clear that the mapping for the certificates is fine:


[[(service):80]]
map /.well-known/acme-challenge/* \
/wcme/.well-known/acme-challenge/* map=once
script /wcme* /cgi-bin/wcme*
pass * 403

[[(service):443]]
##[[192.168.0.2:443]]
# ...

but authorization wasn’t:

[[service]]
#[NONE]
#/.well-known/* read

[Auth=VMS]
/* read+write

so nothing could be done on port 80… Obvious, since access to both webmail and homedesk requires authentication. For genealogy, unauthorized access is an option, so here it states

[NONE]
/* read
/public/* read

and so I changed the settings for webmail and homedesk accordingly, and now the certificates were created and stored in place.
But the browsers still complained about the invalidity of the certificates, even after clearing the browser and server caches. The only way to get around it was to restart the server, either from the web-admin page or from the command line (the same thing):
$ httpd/do=restart

Now wait until April, for the next renewal…

I wrote a small report on all issues on the WASD mailing list.

01-Jan-2018

Cleaning up 2017
The new year starts with some housekeeping of last year. No surprises, mail is still mostly rubbish:

PMAS statistics for December
Total messages    :   3082 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :    405 =  13.1 o/o (Files: 31)
Accepted by PMAS  :   2677 =  86.8 o/o (Files: 31)
  Handled by explicit rule
         Rejected :   1836 =  68.5 o/o (processed),  59.5 o/o (all)
         Accepted :    171 =   6.3 o/o (processed),   5.5 o/o (all)
  Handled by content
        Discarded :    445 =  16.6 o/o (processed),  14.4 o/o (all)
     Quarantained :    204 =   7.6 o/o (processed),   6.6 o/o (all)
        Delivered :     21 =    .7 o/o (processed),    .6 o/o (all)

Just two days show a larger amount of relay attempts:

  • 14-DEC-2017 03:26:10.51 to 03:29:14.29 (190) (23.254.204.176)
  • 22-DEC-2017 04:20:08.50 to 04:24:03.01 (184) (104.168.134.210)

Both tried to send a mail from a (bogus user)@grootersnet.nl to 1029mandaditos@gmail.com, and both addresses are owned by Hostwinds.com. Quite possibly the router should have blocked them, since I’ve blocked addresses from Hostwinds.com there, but it is possible that these addresses are not properly set (both are /17 addresses according to analysis, and since the sender address is forged, so could be the sender address…)
Anyway, I notified hostwinds.com of these attempts (there have been similar attempts in November).

BTW: A (real user)@grootersnet.nl to any other outside address will fail as well, since PMAS has a rule to reject any grootersnet.nl sender from outside the local domain…

Next is a manual job: moving all 2017 data into one location. That’s the job for tonight.

Failed certificate renewal
There were a few things to do here.
First, I needed to update WCME. I was still running the first version (1.0.0), and a new one (1.2.0) had already been downloaded but never moved to VMS and installed there. So that was the first thing to do. Next, it failed again, even after replacing the INSTALLed executable. As it turned out, it was a matter of directory access protection. After that, genealogy.grootersnet.nl got a new certificate, but webmail and homedesk failed again, due to too many failed renewals. So I have to wait some time and retry. But since genealogy.grootersnet.nl does have the new certificate, I guess these two will be renewed soon as well.

11-Dec-2017

WordPress update
As stated in my last post, there is a new WordPress version. Just installed it, so now running WP 4.9.1.

Not related (because I’ve seen this before):

Connection lost. Saving has been disabled until you’re reconnected. We’re backing up this post in your browser, just in case.

I know there are some issues, probably with timing, especially in the admin environment (normal user is slow as well, but admin is worse…). An area for investigation, but without the PHP code it will be hard to locate the problem. (There is no good way to debug the WP code…)
Changed some parameters in php.ini, see if that helps…

02-Dec-2017

All the same
Once more, no surprises in the monthly maintenance job. The vast majority of mail has been rejected on explicit rules, or by the content:

PMAS statistics for November
Total messages    :   4463 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :    622 =  13.9 o/o (Files: 30)
Accepted by PMAS  :   3841 =  86.0 o/o (Files: 30)
  Handled by explicit rule
         Rejected :   3024 =  78.7 o/o (processed),  67.7 o/o (all)
         Accepted :    187 =   4.8 o/o (processed),   4.1 o/o (all)
  Handled by content
        Discarded :    423 =  11.0 o/o (processed),   9.4 o/o (all)
     Quarantained :    183 =   4.7 o/o (processed),   4.1 o/o (all)
        Delivered :     24 =    .6 o/o (processed),    .5 o/o (all)

The number of relay attempts has been larger this month, especially on the 5th, 16th and 30th: each of these files contained almost 180 attempts, and I found the attempts originated from two networks, both leading to Hostwinds in Tulsa (US), trying a fake grootersnet.nl user sending to 1029mandaditos@gmail.com:

  • 5-NOV-2017 03:23:11.22 – 03:26:33.68 (175 attempts from 104.168.146.254)
  • 16-NOV-2017 20:04:23.00 – 20:12:07.85 (178 attempts from 23.254.164.233)
  • 30-NOV-2017 00:22:01.61 – 00:58:39.57 (187 attempts from 23.254.165.119)

Hostwinds.com has been notified of this, and the networks have been excluded from access, in the router, for some time. I’ll keep an eye on these addresses.

Next WP update pending 🙂