BEWARE: This is a technical note on how to analyze a program error using (built-in!!) compiler and linker options. To follow it, you’ll need some knowledge of what a program on VMS looks like internally.
Listing fragments have been slightly edited to fit on the screen.
The issue itself – and what’s related – can be found on the WASD mailing list for 2018 in entry 8 and following. This file contains the data of the analysis as taken from the files (not edited).
As mentioned, the HyperSpi++ collector hadn’t collected any data since the last reboot. The log showed why:
%SYSTEM-F-HPARITH, high performance arithmetic trap,
Imask=00000000, Fmask=00080000, summary=02, PC=0000000000031448, PS=0000001B
-SYSTEM-F-FLTINV, floating invalid operation,
Improperly handled condition, image exit forced.
Signal arguments: Number = 0000000000000006
Name = 0000000000000504
R0 =0000000000040990 R1 =00B2816B42BD488F R2 =0000000000010250
R3 =0000000000000001 R4 =00000000000201F0 R5 =0000000000040860
R6 =0000000000040870 R7 =0000000000040870 R8 =0000000000000005
R9 =0000000000040907 R10=00000000000408F5 R11=000000007FFCDC18
R12=000000007FFCDA98 R13=000000007AEC33A0 R14=0000000000000000
R15=000000007AEC29B0 R16=0000000000000000 R17=0000000000010848
R18=000000007BF260F8 R19=FFFFFFFFFFFFFFF9 R20=FF4D7E94BD42B771
R21=000000007B63DA10 R22=000000007ADB9940 R23=0000000000000002
R24=0000000000000001 R25=000000000006C004 R26=FFFFFFFF80FAFB60
R27=000000007BEF4820 R28=FFFFFFFF80CAFE40 R29=000000007ADB9930
SP =000000007ADB9860 PC =0000000000031448 PS =200000000000001B
But since the program wasn’t built with the options /LIST/MACHINE_CODE on compilation and /MAP/FULL on link, you’re in the dark. IMHO, these options should be mandatory for programs that run without user intervention, as a detached program or as a service.
But the kit contains the command procedures to build the agent, so I added these options, built the agent and ran it – interactively. Same error (of course) but now I could pinpoint the location.
The MAP file shows the sections (PSECT) of a program and their offsets in the image file. The important part is close to the beginning:
0 5 00010000 3 0 READ WRITE NON-SHAREABLE ADDRESS DATA
0 2 00020000 8 0 READ WRITE COPY ON REF
0 30 00030000 10 0 READ ONLY EXECUTABLE <<<-------
0 5 00040000 0 0 READ WRITE DEMAND ZERO
0 2 00050000 40 0 READ WRITE FIXUP VECTORS
253 20 7FFF0000 0 0 READ WRITE DEMAND ZERO
Most important here is the third line, where the last column states "EXECUTABLE". This is where the executable code, so what the program actually does, is located. It starts at offset 30000 (hexadecimal).
Since the crash line states the program counter (PC) where the error happens:
%SYSTEM-F-HPARITH, high performance arithmetic trap, Imask=00000000, Fmask=00080000, summary=02, PC=0000000000031448, PS=0000001B
-SYSTEM-F-FLTINV, floating invalid operation, PC=0000000000031448, PS=0000001B
it means the program counter refers to address 31448, which lies within that section (obviously: the error happens when the program executes an instruction), at offset 1448 in that section (31448 - 30000 = 1448, all in hexadecimal).
But what instruction – and which codeline?
Now the listing is important, and especially the /MACHINE_CODE switch. It lists the actual instructions that the program executes, located at their offsets in the executable section of the program. In the listing, this is the second part; it shows the offset (within the section), the instruction and the source line number. For this particular case, look around offset 1448:
1420 LDAH R25, 7(R31) ; 040554
1424 LDL R16, (R4) ; 040553
1428 LDQ R17, -528(R2) ; 040554
142C LDA R25, -16380(R25)
1430 LDQ R18, -552(R2)
1434 LDQ R26, -544(R2)
1438 LDF F1, (R0) ; 040552
143C LDQ R27, -536(R2) ; 040554
1440 LDA R17, 936(R17)
1444 ADDF F1, F18, F19 ; 040552
1448 STF F19, (R0)
1450 BEQ R16, L$122 ; 040553
1454 LDL R16, DECC$GA_STDOUT ; R16, (R18) ; 040554
1458 JSR R26, DECC$GXFPRINTF ; R26, R26
145C L$122: ; 040555
The first column shows the offset in the section, which starts at 30000 (that is offset 0). So the offending instruction seems to be at offset 1448: the STF instruction.
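The last column, which is not present on every line, is the line number of the code that the programmer wrote. In this case that is line 40552, the tag on the ADDF instruction just above the STF; untagged instructions appear to belong to the last tagged line before them. Arithmetic traps on Alpha are imprecise, so the reported PC can point past the instruction that actually failed. The source is in the first part of the listing file, so look around this line number: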
1 40538 /**************************/
1 40539 /* calculate elapsed time */
1 40540 /**************************/
1 40542 lib$sub_times (&CurrentBinTime,
1X 40544 #ifdef __RMIDEF_LOADED
1X 40545 status = lib$cvts_from_internal_time
1X 40546 &DiffBinTime);
1X 40547 #else
1 40548 status = lib$cvtf_from_internal_time
1 40549 &DiffBinTime);
1 40550 #endif
1 40552 TotalDeltaSeconds += DeltaSeconds; <===HERE
1 40553 if (Debug)
1 40554 fprintf (stdout,
"DeltaSeconds: %f TotalDeltaSeconds: %f\n",
1 40555 DeltaSeconds, TotalDeltaSeconds);
Both TotalDeltaSeconds and DeltaSeconds are defined as FLOAT, which is consistent with the error text. The question is: WHAT IS WRONG? To find out, you will have to do some digging: compile the code with the /DEBUG option and set a breakpoint on, or just above, this line to see what is going on.
In this case, it turned out that the compiler switch /OPTIMIZE (perhaps implied by /NODEBUG) quite likely made the error occur; using /NOOPTIMIZE solved the problem and the program executed flawlessly.