07-Feb-2016

Maintenance
Too busy to add the result of the monthly maintenance job – there were no issues except (of course) quite a number of relay attempts from China…


PMAS statistics for January
Total messages    :   3263 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :   1841 =  56.4 o/o (Files: 31)
Accepted by PMAS  :   1422 =  43.5 o/o (Files: 31)
  Handled by explicit rule
         Rejected :    527 =  37.0 o/o (processed),  16.1 o/o (all)
         Accepted :    231 =  16.2 o/o (processed),   7.0 o/o (all)
  Handled by content
        Discarded :    224 =  15.7 o/o (processed),   6.8 o/o (all)
     Quarantained :    179 =  12.5 o/o (processed),   5.4 o/o (all)
        Delivered :    261 =  18.3 o/o (processed),   7.9 o/o (all)
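
A side note on the arithmetic: the percentages in this report look truncated to one decimal rather than rounded (1422 of 3263 is printed as 43.5, where rounding would give 43.6). A minimal Python sketch that reproduces the figures under that assumption; the helper name `pct` is made up for illustration and is not part of PMAS or the maintenance job:

```python
def pct(part: int, whole: int) -> str:
    """Percentage of part in whole, truncated (not rounded) to one decimal."""
    tenths = part * 1000 // whole      # integer arithmetic, so no float rounding
    return f"{tenths // 10}.{tenths % 10}"

# January figures taken from the table above
total, relay, accepted = 3263, 1841, 1422

print("Relay attempts   :", pct(relay, total))
print("Accepted by PMAS :", pct(accepted, total))
print("Discarded        :", pct(224, accepted), "(processed),", pct(224, total), "(all)")
```

Because of the truncation, the two top-level shares can add up to slightly less than 100.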

Most relay attempts were on three days:

02-JAN-2016 04:58:48.40 - 05:36:50.75 : 112.90.183.133 (363)
12-JAN-2016 16:56:59.42 - 18:14:59.84 : 112.90.146.241 (720)
14-JAN-2016 19:54:54.38 - 21:10:08.53 : 112.90.237.94 (720)

In addition, there were a few from another address – but far fewer:
21-JAN-2016 07:23:38.68 - 07:24:23.62 : 189.199.26.223 (11)
so less harmful – I won’t bother too much about that one.
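
To put the bursts in perspective, the attempts per minute can be worked out from the timestamps above. This is only a sketch: it assumes the VMS absolute time format with a two-digit fraction as shown, and the function names are illustrative.

```python
from datetime import datetime

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def parse_vms(stamp: str) -> datetime:
    """Parse a VMS-style absolute time such as '02-JAN-2016 04:58:48.40'."""
    date, clock = stamp.split()
    day, mon, year = date.split("-")
    hh, mm, ss = clock.split(":")
    sec, hundredths = ss.split(".")
    return datetime(int(year), MONTHS[mon.upper()], int(day),
                    int(hh), int(mm), int(sec), int(hundredths) * 10_000)

def per_minute(start: str, end: str, attempts: int) -> float:
    """Average number of attempts per minute between start and end."""
    minutes = (parse_vms(end) - parse_vms(start)).total_seconds() / 60
    return attempts / minutes

# Start, end and count copied from the lists above
bursts = [
    ("02-JAN-2016 04:58:48.40", "02-JAN-2016 05:36:50.75", 363),
    ("12-JAN-2016 16:56:59.42", "12-JAN-2016 18:14:59.84", 720),
    ("14-JAN-2016 19:54:54.38", "14-JAN-2016 21:10:08.53", 720),
    ("21-JAN-2016 07:23:38.68", "21-JAN-2016 07:24:23.62", 11),
]
for start, end, count in bursts:
    print(start[:11], f"{per_minute(start, end, count):.1f} attempts/minute")
```

The last burst is short but not slow; it is the small total that makes it less of a concern.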

(Would I need to pass these indications to law enforcement? For businesses it is an obligation today)

Update of WordPress failed
I downloaded the latest version (4.4.2) and installed it, but upgrading the blogs failed. I still have to do some more investigation, but I found one issue: up to 4.3.1 no underscores were used in the names, but in 4.4.2 (and probably 4.4 as well) there is one directory that contains an underscore in its name (.wp-includes.random_compat). This causes problems with the script that reverts the translation of multiple dots in a filename to underscores: VMS translates the extra dots to underscores and the script reverses this, so the directory becomes (.wp-includes.random^.compat).
So here I have a problem in the script. If this is the only one, it is easy to prevent. I could ask to change this to a hyphen instead, but I doubt it will ever happen (I will ask but won’t rely on the change).
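
One way to prevent it would be an exception list of names that really contain an underscore. The real script is not shown here, so the following Python sketch is only an illustration of the idea; the function names and the `KEEP_UNDERSCORE` set are hypothetical, and it assumes the directory specification does not yet contain escaped dots.

```python
# Hypothetical exception list: names whose underscore is genuine
KEEP_UNDERSCORE = {"random_compat"}

def restore_dots(name: str, keep=KEEP_UNDERSCORE) -> str:
    """Turn underscores back into escaped dots ('^.'), except for known names."""
    if name in keep:
        return name
    return name.replace("_", "^.")

def restore_path(dirspec: str, keep=KEEP_UNDERSCORE) -> str:
    """Apply restore_dots to every component of a spec like '.wp-includes.random_compat'."""
    return ".".join(restore_dots(part, keep) for part in dirspec.split("."))

# Without the exception list the directory is mangled, with it the name survives
print(restore_path(".wp-includes.random_compat", keep=set()))
print(restore_path(".wp-includes.random_compat"))
```

The drawback is that the list has to be maintained by hand whenever a new release introduces another name with an underscore.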
Another, far nastier issue is a stack overflow just before I get the PHP warnings. WATCH spits out the details – I removed the extra data, so this is basically what happens:

|19:20:02.16 ERROR    0434 **** NOTICED    CGI:2107, not a strict CGI response|
|19:20:02.17 ERROR    1121 0002 RESPONSE   DCL:5220 (basic-only) 502(502)

 "Script did not provide an acceptable response."|
|19:20:02.17 DCL      4967 0002 DCL        READ SYS$OUTPUT %X00000001 47 bytes|

%TRACE-F-TRACEBACK, symbolic stack dump follows
  image    module    routine             line      rel PC           abs PC      
%SYSTEM-F-STKOVF, stack overflow, PC=FFFFFFFF8083B42C, PS=0000001B
  Improperly handled condition, image exit forced.
    Signal arguments:   Number = 0000000000000003
                        Name   = 0000000000000554
                                 FFFFFFFF8083B42C
                                 000000000000001B
    Register dump:   
     R0  = 0000000000000207  R1  = 0000000000000008  R2  = 000000007B64D550
     R3  = 000000007B68A010  R4  = 000000007B68A018  R5  = 0000000000000090
     R6  = 0000000000000090  R7  = 0000000000000041  R8  = 0000000000000000
     R9  = 000000007CAF9820  R10 = 0000000000000020  R11 = 0000000002055088
     R12 = 000000007ADBA4A6  R13 = FFFFFFFF81951C10  R14 = 0000000000000001
     R15 = 0000000000000007  R16 = 000000007B68A064  R17 = 000000007B68A8A8
     R18 = 0000000064616F6C  R19 = 0000017964616F6C  R20 = 0000000064616F6C
     R21 = 0000000000000000  R22 = 000000000000017A  R23 = 000000007BF9A660
     R24 = 0000000000010001  R25 = 0000000000000001  R26 = 000000007B68A018
     R27 = 0000000000000001  R28 = FFFFFFFF802E373C  R29 = 000000007ADB94E0
     SP  = 000000007ADB94E0  PC  = FFFFFFFF8083B42C  PS  = 200000000000001B

So I reverted to the version that works – have to find out first what’s going on…

30-Jan-2016

How not to update
Ok. This is not about VMS but Windows. Windows 10. “Professional” (you’ll understand the quotes after the story).

To start with the history of this machine.
I built it from (at the time) high-end components and installed Windows 7 Professional (of course I paid for it; OEM version as it was a brand new system…).
Some time later, 8.0 came out and I could update for free, with the option to reverse the installation. But I was satisfied with it so I removed that ability. The update to 8.1 went smoothly, without a glitch. So did the update to Windows 10: no surprises, everything went fine. However, there are a few ‘glitches’. Automatic update is enabled by default; normally not a real problem. With Windows 7 and 8, System Restore was enabled and each update created a restore point, so you could easily reverse it. And if updates were to be installed, you would see it in the menu, because there was a notice that updates would be installed on shutdown – and the progress would be shown on the shutdown screen.
Windows 10 changed that. There is NO warning about updates, nor are you informed that updates are being installed. Shutdown just takes longer. Much longer, eventually. It is only after you start the machine that you are informed that updates have been installed.
If things go wrong, it’s too late.
As I found out in December. There had been a major update to Windows, installed silently in the same way. But from that moment on, things had changed: some things didn’t work at all anymore, including the new browser (Edge – it really is an improvement, though not all functionality of Internet Explorer or other browsers is available), the new search facility (Cortana), some important system management tools like the notifier (Action Center), and numerous other things: opening a second File Explorer window from the shortcut on the menu did nothing, for instance. Very annoying if that is what you normally use most.
I posted a complaint on a technical forum at Microsoft, where I learned that there were more installations with the same problem. Suggestions that I found elsewhere to re-install Edge didn’t work either (and on one occasion I learned that this update had indeed caused a lot of trouble), but over time some functionality was restored – or I found a way around it. But some programs and site accesses rely on Edge, so these don’t work either.

Today I contacted Microsoft support via chat. Very helpful. They tried to set things right again, which required a reboot. That is when trouble really started.
First, I could no longer log in to my normal account. I usually log in with a PIN code, but that had now reverted to password login. But entering the right password (and I am absolutely sure I made no mistakes) failed: either name or password incorrect. I noted that there was no internet connection, which is required for login (I think) since the account is coupled to my LIVE.COM account. Luckily, I also have a local (administrator) account that I could use. I did some investigating: there was no way I could enable internet connectivity – and the chat session tried to reconnect, which of course failed.
Using a second PC (still on Windows 7 and to be kept as such, since some games are not compatible with Windows 10 and will not run) I contacted Microsoft support about this issue. That is where I found out (using msconfig) that all services but a few had been disabled. The suggestion was either to revert to a previous version (and re-install Windows 10, plus everything I installed after upgrading from Windows 8.1) or to do a full re-install. Neither option was acceptable; the other solution was to reboot normally – which worked.

After that, the connection that was stopped on reboot was revived and I could get on with the chat. It still didn’t work as before. And the problem affects all current users on the machine – even the local admin account had the same problem.
One thing I could try was to create a new user and see if that would solve the problem. And behold: THAT WORKED.

So it was suggested that I move all files to that new user and work with that account from now on.

It would mean I would have to copy EVERYTHING on the system to that account. Or change file ownership and protections from one user to another. BY HAND.
Or leave it as it is, hoping that Microsoft will come up with a real solution: one that will repair the system where it broke.

There are still a few things to figure out, because I think that more went severely wrong in this update: I doubt very much that a restore point was made, for instance. And if so, I doubt it is complete, since I did look for such a point when I found out about the problems but couldn’t find one.

Anyway, I disabled automatic updates. Downloading is OK, but I decide if and when they are installed, so I can be sure I have a restore point if I need to go back.

Update
Just looked at the local admin account.
Where it failed to start Edge and Search this afternoon, IT NOW WORKS THERE….(For now? Well, it does.. But why doesn’t it in my normal account? This must be a registry thing….)
The oldest restore point I can find is from the beginning of this year. All earlier ones seem to be gone. The latest is from today.

02-Jan-2016

New year – new chances
At least, last maintenance job showed something remarkable:

PMAS statistics for December
Total messages    :  12733 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :  11118 =  87.3 o/o (Files: 31)
Accepted by PMAS  :   1615 =  12.6 o/o (Files: 31)
  Handled by explicit rule
         Rejected :    784 =  48.5 o/o (processed),   6.1 o/o (all)
         Accepted :    215 =  13.3 o/o (processed),   1.6 o/o (all)
  Handled by content
        Discarded :    255 =  15.7 o/o (processed),   2.0 o/o (all)
     Quarantained :    221 =  13.6 o/o (processed),   1.7 o/o (all)
        Delivered :    140 =   8.6 o/o (processed),   1.0 o/o (all)

Almost the same number of messages as two years ago, but the reason is different: it’s not spam but attempts to abuse the server.
Over half of the relay attempts were on two days, from one network. On Dec 1st, a sender at address 124.200.250.20 tried 3960 times to pass a message from a faked grootersne.nl sender to one recipient on vip.163.com; on December 6th, a sender at address 124.200.250.30 tried it again 3400 times. On December 24th, 26th, 289th and 30th, the sender was on address 112.90.183.133 and tried to pass a message to someone (faked, probably) at 163.com again, each day about 720 times. Just two days earlier, on December 22nd, the same has been attempted from address 183.51.255.217, but this stopped after 227 messages.
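
The "over half" claim is easy to check against the December table. A small Python sketch using only the counts quoted above; the four days from 112.90.183.133 are entered as 4 x 720, which is an approximation since the text says "about 720":

```python
total_relay = 11118  # relay attempts in December, from the table above

bursts = {
    "124.200.250.20 (Dec 1)": 3960,
    "124.200.250.30 (Dec 6)": 3400,
    "112.90.183.133 (four days)": 4 * 720,   # "about 720" per day, so approximate
    "183.51.255.217 (Dec 22)": 227,
}

def share(count: int, total: int = total_relay) -> float:
    """Share of count in total, as a percentage."""
    return 100 * count / total

two_days = bursts["124.200.250.20 (Dec 1)"] + bursts["124.200.250.30 (Dec 6)"]
print(f"Dec 1 and Dec 6 together : {share(two_days):.1f} % of all relay attempts")
print(f"All bursts listed        : {share(sum(bursts.values())):.1f} % (approximate)")
```

So the two days from the 124.200.250.x network account for about two thirds, and the named bursts together for well over ninety percent of the relay attempts.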

The rest was on different days, different addresses but far less.

All logs of 2015 have now been moved to the year archive to be stored in a safe place, to be investigated.

I’ll continue checking the system this year. I even may have time to (finally) create the log-analyzer.

Of course, there is content to be finished (Trips, Tracks and Travels), and some updates (WordPress and related) are pending.

03-Dec-2015

Nothing unusual
The maintenance job didn’t show anything weird. Just that there were two operator logs more than the number of days, but with two restarts this is to be expected.
Mail has no funny things:
PMAS statistics for November
Total messages    :   8431 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :   6454 =  76.5 o/o (Files: 30)
Accepted by PMAS  :   1977 =  23.4 o/o (Files: 30)
  Handled by explicit rule
         Rejected :    820 =  41.4 o/o (processed),   9.7 o/o (all)
         Accepted :    246 =  12.4 o/o (processed),   2.9 o/o (all)
  Handled by content
        Discarded :    360 =  18.2 o/o (processed),   4.2 o/o (all)
     Quarantained :    509 =  25.7 o/o (processed),   6.0 o/o (all)
        Delivered :     42 =   2.1 o/o (processed),    .4 o/o (all)
Except for the number of relay attempts, that is: 2500 between 7-NOV-2015 15:58:58.60 and 7-NOV-2015 20:51:36.53, from address 124.200.250.19, attempting to reach some addressee at 163.com.

You would expect it to be a long-running job but it isn’t: just 8 minutes elapsed, using less than 1 minute of CPU time:

  SYSTEM       job terminated at  1-DEC-2015 01:08:02.31

  Accounting information:
  Buffered I/O count:               9082      Peak working set size:       3888
  Direct I/O count:                57891      Peak virtual size:         183008
  Page faults:                     13301      Mounted volumes:                0
  Charged CPU time:        0 00:00:45.13      Elapsed time:       0 00:08:02.29
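
For a feeling of how light the job is, the CPU share of the elapsed time can be derived from the two delta times above. A quick Python sketch, assuming the "days hh:mm:ss.cc" layout shown in the accounting output; `delta_seconds` is just an illustrative name:

```python
def delta_seconds(delta: str) -> float:
    """Convert a VMS delta time like '0 00:08:02.29' to seconds."""
    days, clock = delta.split()
    hh, mm, ss = clock.split(":")
    return int(days) * 86400 + int(hh) * 3600 + int(mm) * 60 + float(ss)

cpu = delta_seconds("0 00:00:45.13")      # Charged CPU time
elapsed = delta_seconds("0 00:08:02.29")  # Elapsed time

print(f"CPU share of elapsed time: {100 * cpu / elapsed:.1f} %")
```

Most of the elapsed time is therefore spent waiting, which fits the high direct I/O count.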

30-Nov-2015

Like a charm
Up and running now for some weeks, and though there is still some work to be done, like work on the power lines in the computer room (so the systems have to be shut down once more to switch to the new extensions), it already proves to be worth the work: temperatures are more stable and lower, and it will be easier to minimize noise leakage while keeping temperatures at this nice level.
Even on the performance and software level there is little to tell. PHP does have its hiccups, but fewer than in the last months, and access isn’t that much less. So yes – it’s fine as well. There are a few updates to be installed but nothing very troublesome. It’s only that the Jetpack installation still doesn’t work – to be honest, I didn’t look into that in the last months – but I don’t seem to need it anyway :) At least, what DO I actually miss? Hardly anything…

On the content front, I’m still way behind schedule, but I managed to finish the Corfu images (finally). I still have to mark them on the track images and type out the comments I wrote down each day. If all goes according to plan, this will be done shortly. But there are still a few to go: two Dutch long-distance footpaths we walked (Westerbork and Willibrord) and our trip from Vienna to Budapest.

Next Wednesday, I’ll get my VMS knowledge update. Perhaps the latest software updates as well..

10-Nov-2015

More new cables
As extension of the keyboard and mouse cables didn’t work, I ordered normal PS/2 cables, expecting them to work. These arrived today and I installed them. However, neither keyboard nor mouse was functioning – probably because DECwindows couldn’t access them on startup, so they went unnoticed.
There must be another way of doing it (probably stopping and starting the DECwindows server) but the quick and dirty route (AKA reboot) actually does the same. Plus, the last changes in the startup procedure (the ones I forgot last time) could be tested.
So I rebooted.
It went smoothly and – at a quick look – seems to have started all that is needed. Plus, the directly connected keyboard and mouse do work.
As expected.

07-Nov-2015

New cables
Since the systems have been moved to another position, the cables for monitor, keyboard and mouse weren’t long enough to reach the server, so I ordered extension cables. For keyboard and mouse, I figured that USB extensions would do, and these came with built-in amplifiers. But nothing happened, so I decided on a reboot – which failed: there were user files open on two disks, which meant these could not be dismounted, and the system became overloaded for no apparent reason. I tried to stop some processes but the system became slower and slower. CTRL-P and a crash (to find out what was going on) were required. After the restart it became clear that there was a TCPIP process that took all CPU and caused excessive paging.
But the system started without a hitch – almost: after upgrading WordPress (to a new directory) I forgot to change the logicals for the blogs. I did that by hand, so it all works again now.
There is one more short downtime to be expected, when I change power tomorrow.

The USB extensions don’t work, so I have ordered normal PS/2 extension cables. The USB extensions have found good use in moving the printer and scanner, and now I have some more room on the workplace to add more machines.

03-Nov-2015

Nothing special
The maintenance log shows nothing in particular. It has been rather quiet.
PMAS statistics for October
Total messages    :   1233 = 100.0 o/o
DNS Blacklisted   :    108 =   8.7 o/o (Files:  1)
Relay attempts    :    192 =  15.5 o/o (Files: 30)
Accepted by PMAS  :    933 =  75.6 o/o (Files: 31)
  Handled by explicit rule
         Rejected :    386 =  41.3 o/o (processed),  31.3 o/o (all)
         Accepted :    206 =  22.0 o/o (processed),  16.7 o/o (all)
  Handled by content
        Discarded :    162 =  17.3 o/o (processed),  13.1 o/o (all)
     Quarantained :    154 =  16.5 o/o (processed),  12.4 o/o (all)
        Delivered :     25 =   2.6 o/o (processed),   2.0 o/o (all)

Even the number of relay attempts has been pretty low. The largest number of attempts (100) was on 30-Oct-2015, sent from 14.222.40.34 (an outside address) with a faked sender account (any@grootersnet.com, but that can only be sent from this address :))
Only the number of SYSLOGD log files is larger than usual; there have been a number of days on which someone tried to retrieve large amounts of data from the site – like the YAHOO-based access a few days ago – but there isn’t a logfile scanner – yet – so I would have to look into that (one of the pending projects is just about that…).
System load has been stable as well over the last four weeks, except for the reboot due to moving the system:
[Images 1, 2, 3]

So nothing special….

30-Aug-2015

A few days ago, there was one address (68.180.230.158) that had many, many links open to the site. Since it connected to port 443 and Diana, suspicion arose that someone was trying to access one of the secure sites. So it was blocked from all of them: trying to access the site will just drop the connection.

The server log showed that the same address ran into an ACCVIO four times on accessing the entry of this blog:

%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=000000000597C000, PC=FFFFFFFF8083D9A0, PS=0000001B

four times in a row.
This error has occurred a few days earlier as well, but from a different origin, on a different program counter:

%SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=000000000597C000, PC=FFFFFFFF8083D9B8, PS=0000001B

The difference in program counter is obvious since this is a separate process that can be located elsewhere in memory, but since the virtual address is the same, it is the same error. An ACCVIO will cause the process to abort, but when a next process is started immediately afterwards (which is possible in this particular case) it can be loaded in the same physical location and produce the same error. Here it is the PHP executor; WASD will start worker processes on the fly – probably in the same location.

Reason mask 04 means the process tries to modify a location that is read-only, or inaccessible. It may also mean that an attempt was made to expand the user stack – as the HELP/MESSAGE output states:

This message is also displayed when an attempt has been made
to make the user stack larger than the user’s virtual address
space permits.

I noted this error before, and rather often, when the system had 512 MB of memory, so it is likely to be related to a lack of virtual memory – too low a setting of process parameter pgflquo for user HTTP$NOBODY (set to 500.000). I could increase this value, but since the error only occurs under higher loads from one address, this can be deferred.

On older systems, I could look at system parameter VIRTUALPAGECNT but that is obsolete in VMS version 8.4 and set to the max possible (2147483647, the default value):
SYSGEN> SHOW VIRTUALPAGECNT
Parameter Name           Current    Default       Min.       Max. Unit     Dynamic
--------------           -------    -------    -------    ------- ----     -------
VIRTUALPAGECNT        2147483647 2147483647       2048 2147483647 Obsolete
SYSGEN>

26-Oct-2015

A few issues
There was too little time last week to check whether everything works properly, so it was only this morning that I found a few issues, only in the operational site (so far), that need to be looked at. The list of OPERATOR logs won’t show up (it gives me an extract of the access log instead), for instance, though the link is all right. So there may be more that’s not working as it should. The question is, however, why this happened. There was no change in the environment that I’m aware of. So there is some investigation to do.
The result:
Running the procedure that creates the index file didn’t finish as intended:

$ @SMAN:[com]indexoperlogs.com
Creating index file of operator and FTP logs
%RMS-F-FUL, device full (insufficient space for allocation)

So where has the space gone? Creating a list on this device won’t work either, so put it elsewhere (with enough free space):
$ dir/out=$116$dka106:[000000]x.x/size web_disk2:[000000...]
$

to get a full view of everything that’s on the disk (which is actually a logical one).

No matter what, I can remove the old WordPress versions, saving me some 100.000 blocks. The big win, however, is removing all images of Trips, Tracks and Travels from the disk, because these were moved to another disk a few weeks ago, and that seems to work fine. That means 10 GB….. Of course, I used a script (Q&D) for this operation.
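
To compare the two savings in the same unit: a VMS disk block is 512 bytes. The conversion below is a small Python sketch that assumes decimal units (1 GB = 1,000,000,000 bytes); the function names are illustrative.

```python
BLOCK_BYTES = 512  # size of one OpenVMS disk block

def blocks_to_mb(blocks: int) -> float:
    """Size in megabytes (decimal) of a number of disk blocks."""
    return blocks * BLOCK_BYTES / 1_000_000

def gb_to_blocks(gb: int) -> int:
    """Number of disk blocks in a number of gigabytes (decimal)."""
    return gb * 1_000_000_000 // BLOCK_BYTES

print(f"Old WordPress versions: {blocks_to_mb(100_000):.1f} MB")
print(f"Images: {gb_to_blocks(10):,} blocks")
```

In blocks, the images take roughly two hundred times the space of the old WordPress versions.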
Why the disk was so full: I added a few videos to Tessa and some other stuff on a disk that was already pretty filled up. It fitted – until the operatorlog.html file was once more created – a week ago… The logfiles were a bit larger than normal because Daphne was started (and this logs on Diana as well).

And the text on the home page needs to be updated and set as well; this is not done automatically, and I didn’t add this on the last reboot.