09-Feb-2018

Setting up IRIS
IRIS is the big Itanium 2620: two dual-core processors with Hyperthreading enabled, so OpenVMS sees 8 CPUs; 24 GB of memory (the maximum possible) and three 143 GB disks.
The node has been added to the cluster:


+-----------------------+---------+
|        SYSTEMS        | MEMBERS |
+--------+--------------+---------+
|  NODE  |   SOFTWARE   |  STATUS |
+--------+--------------+---------+
| IRIS   | VMS V8.4     | MEMBER  |
| DIANA  | VMS V8.4     | MEMBER  |
+-----------------------+---------+

and all languages (BASIC, C, C++, COBOL, FORTRAN and PASCAL) and DECset have been installed. Welcome.txt has been set up to show its name on login.
Next, I will set up the startup procedure, similar to the one I set up 10 years ago: whatever can be done in batch will be done in the background, so the base system is available as soon as possible.

When that is done, I will set up the services, all of them, so the machine can be used as a replacement for Diana; except, for now at least, central services like DHCP, DNS and NTP. This means I'll have to install all the web software (WASD, PHP, MariaDB…; that's why I needed the compilers) and test it. The only thing not yet available is PMAS for Itanium; I've asked for it.

Another thing to do is to use Process Software's MultiNet instead of HP's TCP/IP suite. MultiNet is superior (that's why VSI licensed that software as its replacement).

04-Feb-2018

HyperSpi++

BEWARE: This is a technical post on how to analyze a program error using (built-in!!) options of the compiler and linker. To follow it, you'll need some knowledge of what a program on VMS looks like internally.
Listing fragments have been slightly edited to fit on the screen.

The issue itself, and the related discussion, can be found on the WASD mailing list for 2018, from entry 8 onward. This file contains the data of the analysis as taken from the files (not edited).

As mentioned, the HyperSpi++ collector hasn't collected any data since the last reboot. The log showed why:

%SYSTEM-F-HPARITH, high performance arithmetic trap,
 Imask=00000000, Fmask=00080000, summary=02, PC=0000000000031448, PS=0000001B
-SYSTEM-F-FLTINV, floating invalid operation,
 PC=0000000000031448, PS=0000001B

  Improperly handled condition, image exit forced.
    Signal arguments:   Number = 0000000000000006
                        Name   = 0000000000000504
                                 0008000000000000
                                 0000000000080000
                                 0000000000000002
                                 0000000000031448
                                 000000000000001B

Register dump:
R0 =0000000000040990 R1 =00B2816B42BD488F R2 =0000000000010250
R3 =0000000000000001 R4 =00000000000201F0 R5 =0000000000040860
R6 =0000000000040870 R7 =0000000000040870 R8 =0000000000000005
R9 =0000000000040907 R10=00000000000408F5 R11=000000007FFCDC18
R12=000000007FFCDA98 R13=000000007AEC33A0 R14=0000000000000000
R15=000000007AEC29B0 R16=0000000000000000 R17=0000000000010848
R18=000000007BF260F8 R19=FFFFFFFFFFFFFFF9 R20=FF4D7E94BD42B771
R21=000000007B63DA10 R22=000000007ADB9940 R23=0000000000000002
R24=0000000000000001 R25=000000000006C004 R26=FFFFFFFF80FAFB60
R27=000000007BEF4820 R28=FFFFFFFF80CAFE40 R29=000000007ADB9930
SP =000000007ADB9860 PC =0000000000031448 PS =200000000000001B

But since the program wasn't built with the options /LIST/MACHINE_CODE on compilation and /MAP/FULL on link, you're in the dark. IMHO, these options should be mandatory for programs that run without user intervention, such as detached programs or services.

But the kit contains the command procedures to build the agent, so I added these options, built the agent and ran it, interactively. Same error (of course), but now I could pinpoint the location.

The MAP file shows how the sections (PSECTs) of a program are laid out in the image, and the address at which each image section starts. The important part is near the beginning:

DEFAULT_CLUSTER
  0     5    00010000    3 0 READ WRITE NON-SHAREABLE ADDRESS DATA
  0     2    00020000    8 0 READ WRITE COPY ON REF
  0    30    00030000   10 0 READ ONLY  EXECUTABLE    <<<-------
  0     5    00040000    0 0 READ WRITE DEMAND ZERO
  0     2    00050000   40 0 READ WRITE FIXUP VECTORS
253    20    7FFF0000    0 0 READ WRITE DEMAND ZERO

The third line is the most important one here: its last column states "EXECUTABLE". This is where the executable code, i.e. what the program actually does, is located. It starts at 30000 (hexadecimal, like all addresses here).

Since the crash message states the program counter (PC) at which the error happened:

%SYSTEM-F-HPARITH, high performance arithmetic trap, Imask=00000000, Fmask=00080000, summary=02, PC=0000000000031448, PS=0000001B
-SYSTEM-F-FLTINV, floating invalid operation, PC=0000000000031448, PS=0000001B

this means the PC, 31448, lies within that section (obviously: the error happens while the program executes an instruction), at offset 1448 into it (31448 - 30000, in hexadecimal).

But which instruction, and which line of code?
Now the listing matters, especially the /MACHINE_CODE switch. It lists the actual instructions the program executes, located at their offsets in the executable section of the program. In the listing this is the second part; for each instruction it shows the offset (within the section), the instruction itself and the source line number. For this particular case, look around offset 1448:

0234  L$227:
1420          LDAH    R25, 7(R31)                         ; 040554
1424          LDL     R16, (R4)                           ; 040553
1428          LDQ     R17, -528(R2)                       ; 040554
142C          LDA     R25, -16380(R25)
1430          LDQ     R18, -552(R2)
1434          LDQ     R26, -544(R2)
1438          LDF     F1, (R0)                            ; 040552
143C          LDQ     R27, -536(R2)                       ; 040554
1440          LDA     R17, 936(R17)
1444          ADDF    F1, F18, F19                        ; 040552
1448          STF     F19, (R0)
144C          UNOP
1450          BEQ     R16, L$122                          ; 040553
1454          LDL     R16, DECC$GA_STDOUT ; R16, (R18)    ; 040554
1458          JSR     R26, DECC$GXFPRINTF ; R26, R26
145C  L$122:                                              ; 040555

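The first column shows the offset within the section, which starts at 30000 (offset 0). So the offending instruction seems to be the one at offset 1448: the STF instruction.
The last column, present on only some lines, is the number of the source line the programmer wrote; a row without a number belongs to the last number shown above it. So the STF at 1448, like the ADDF just above it, belongs to line 40552, and the ADDF is the instruction to look at: on Alpha, arithmetic traps are imprecise, so the reported PC typically points past the instruction that actually caused the trap. The source lines are in the first part of the listing, so look around this line number: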

1   40538 /**************************/
1   40539 /* calculate elapsed time */
1   40540 /**************************/
1   40541
1   40542 lib$sub_times (&CurrentBinTime, 
                         &PreviousBinTime,
                         &DiffBinTime);
1   40543
1X  40544 #ifdef __RMIDEF_LOADED
1X  40545 status = lib$cvts_from_internal_time 
                  (&cvtf_mode, &DeltaSeconds,
1X  40546          &DiffBinTime);
1X  40547 #else
1   40548 status = lib$cvtf_from_internal_time
                 (&cvtf_mode, &DeltaSeconds,
1   40549         &DiffBinTime);
1   40550 #endif
1   40551
1   40552 TotalDeltaSeconds += DeltaSeconds;   <===HERE
1   40553 if (Debug)
1   40554  fprintf (stdout,
                  "DeltaSeconds: %f TotalDeltaSeconds: %f\n",
1   40555  DeltaSeconds, TotalDeltaSeconds);
1   40556

Both TotalDeltaSeconds and DeltaSeconds are declared as float, which is consistent with the error text (a floating invalid operation). The question is: WHAT IS WRONG? That takes some digging: compile (and link) the code with /DEBUG, and set a breakpoint on, or just above, this line to find out what is going on.

In this case, it turned out that the compiler switch /OPTIMIZE (the default, perhaps implied by /NODEBUG) quite likely caused the error; compiling with /NOOPTIMIZE solved the problem and the program ran flawlessly.

02-Feb-2018

Server updates
I'd had enough of all the problems, albeit small ones, so I decided to retrieve the latest software from the WASD site: the WASD server (11.1.1), the OpenSSL update for WASD (1.0.2n), Alamode (11.0a) and SoyMail 1.8.3. I copied them yesterday evening, built them all this morning and adapted the startup file (including a routine to fetch the latest "Latest News" file to be shown on the main page). After stopping the server I had to make some corrections (sometimes Shift isn't registered, so I get "1" instead of "!", or a single instead of a double quote), but in the end it all came together and everything started as expected. Well, except Alamode, which is built into WASD_ROOT:[EXE] but should reside in WASD_ROOT:[AXP-BIN]. A copy solved that problem 🙂
But of course HyperSpi++ doesn't have any data for today yet, so whether it works remains to be checked.

The next step is to move it all over to Itanium.

01-Feb-2018

New developments
Since the last post there have been a few developments.
First of all, the Nxtware licenses are not working; I asked for help but haven't had an answer yet. Secondly, I had to configure the iLO module of the box, which was still in its original state. I knew its address was somewhere in the 10.0.0 or 10.1.1 range, but you need the 3-way connector to change it. A friend has one; as it turned out, the address was indeed in the 10.0.0 range (183), and we changed it. But the machine failed to boot after that. EFI does find the controller, but it takes a long time to recognize the disks; when VMS starts, the loader is read, but as soon as VMS tries to mount the disk it fails, and then it keeps trying to restart the adapter. This means PKA0 is broken, so two of the three disks are inaccessible. The system boots from the other controller, so I may be able to use that one and keep access. Anyway, to be prepared for a complete failure, I bought a third Itanium, similar to the other one (same supplier) but with just one 143 GB disk. Not a problem, because I have two 73 GB disks left…
This new system needs to be set up some day soon.

One of these servers will become a backup of the DS10, because I suspect its local disk is going bad. That disk holds the page files and backups, so it is pretty important…

Maintenance
No problems.

PMAS statistics for January
Total messages    :   3639 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :    407 =  11.1 o/o (Files: 31)
Accepted by PMAS  :   3232 =  88.8 o/o (Files: 31)
  Handled by explicit rule
         Rejected :   2388 =  73.8 o/o (processed),  65.6 o/o (all)
         Accepted :    196 =   6.0 o/o (processed),   5.3 o/o (all)
  Handled by content
        Discarded :    414 =  12.8 o/o (processed),  11.3 o/o (all)
     Quarantained :    216 =   6.6 o/o (processed),   5.9 o/o (all)
        Delivered :     18 =    .5 o/o (processed),    .4 o/o (all)

More relay attempts on 10th and 27th of January:

10-Jan-2018 between 17:19:13 and 17:22:18 from 23.254.144.160 to 1029mandaditos@gmail.com (192 entries)
27-Jan-2018 between 00:01:43 and 00:05:14 from 23.254.215.28 to locotrones1029@gmail.com (187 entries)

Same network, same signature: a bogus sender account at grootersnet.nl, and practically the same Gmail account as recipient. The addresses are owned by Hostwinds LLC, and they will once again be warned. This is probably an address I need a daily log scanner for…

There is a problem with the blog's performance, and perhaps with the database. I will restart all web services.