01-Sep-2015

Maintenance
The regular maintenance job has shown no surprises, except that the number of messages is low – fewer than 1000:

PMAS statistics for August
Total messages    :    976 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :    128 =  13.1 o/o (Files: 26)
Accepted by PMAS  :    848 =  86.8 o/o (Files: 29)
  Handled by explicit rule
         Rejected :    308 =  36.3 o/o (processed),  31.5 o/o (all)
         Accepted :    229 =  27.0 o/o (processed),  23.4 o/o (all)
  Handled by content
        Discarded :    160 =  18.8 o/o (processed),  16.3 o/o (all)
     Quarantained :    124 =  14.6 o/o (processed),  12.7 o/o (all)
        Delivered :     27 =   3.1 o/o (processed),   2.7 o/o (all)
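The percentages in this report can be checked with a few lines of Python. This is only a sketch: the counts are copied from the report, while the helper name `pct` is made up here. The report's figures appear to be truncated rather than rounded to one decimal (848 of 976 is 86.89%, shown as 86.8), so the helper truncates as well.

```python
# Counts copied from the August report above.
TOTAL = 976      # all messages seen
ACCEPTED = 848   # messages PMAS actually processed


def pct(count, base):
    """Percentage of base, truncated (not rounded) to one decimal."""
    return (count * 1000 // base) / 10


handled = {
    "Rejected": 308,
    "Accepted": 229,
    "Discarded": 160,
    "Quarantined": 124,
    "Delivered": 27,
}

# The five outcomes should account for every processed message.
assert sum(handled.values()) == ACCEPTED

for name, count in handled.items():
    print(f"{name:>12}: {count:4d} = {pct(count, ACCEPTED):5.1f} (processed), "
          f"{pct(count, TOTAL):5.1f} (all)")
```

The two columns differ only in the base: "processed" divides by the 848 accepted messages, "all" by the 976 received.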

System usage shows last month’s changes in memory: from 512 Mb to 2 Gb, back to 512 Mb due to memory issues, up to 1.5 Gb after the big outage and finally to 2 Gb of valid memory (where everything has been replaced). CPU usage hasn’t changed much, nor has IO, whether direct (disk), buffered or network (mainly internet), though there are slightly fewer buffered IOs since throttling has been disabled:

Hyperspy data over August

The system seems to be very stable now, so AUTOGEN can be run shortly…

29-Aug-2015

New memory
New memory has arrived: the full 2 Gb, now stated “Compaq branded”. I installed it all tonight and so far it’s all in working order. The first set will be destroyed, because it is too unreliable with these bad buffer chips.
Throttle and Max-execution-time
The problems with WordPress seem twofold.
First, as I expected, WASD’s throttle causes problems. Immediately after Diana was started, I started the blog. It started showing up but ended halfway through the calendar or the links, or gave me a 503 error (too many processes in FIFO), but never showed the text. And there were no other processes accessing the blog except for this site itself:
2015-08-29_21-50-34
First of all, I tried to give the processes a bit more room: three times as much for all of them. That did help a bit, but now it became clear that the maximum execution time (set to 90 seconds) was too low. So I gave the process two minutes to execute, and disabled throttling – for now, until I find a solution to throttle only incoming requests from sites other than my own.
Something like:
if (!remote-addr:(my address)) throttle=5,0,0,8,00:02:00,00:05:00
So the same rule as before, but limited to ‘foreign’ addresses, to prevent an overload.
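Why a low limit bites when one page view fans out into several requests can be shown with a toy model. This is a simplified sketch and not WASD's actual throttle algorithm: it assumes that the first number (5) is the number of requests processed concurrently and that the fourth (8) is an overall cap on processing plus queued requests, beyond which a 503 is returned. The class and method names are made up for illustration, and the time-outs are ignored.

```python
from collections import deque


class ToyThrottle:
    """Simplified model: `limit` concurrent requests, FIFO up to `busy` in total."""

    def __init__(self, limit, busy):
        self.limit = limit    # assumed: requests processed at the same time
        self.busy = busy      # assumed: cap on processing + queued requests
        self.active = set()
        self.fifo = deque()

    def arrive(self, req):
        """Return what happens to a newly arrived request."""
        if len(self.active) < self.limit:
            self.active.add(req)
            return "processing"
        if len(self.active) + len(self.fifo) < self.busy:
            self.fifo.append(req)
            return "queued"
        return "503"

    def finish(self, req):
        """Complete a request; the oldest queued one (if any) takes its place."""
        self.active.remove(req)
        if self.fifo:
            released = self.fifo.popleft()
            self.active.add(released)
            return released
        return None


throttle = ToyThrottle(limit=5, busy=8)
results = [throttle.arrive(f"req{i}") for i in range(10)]
print(results)
```

In this model a burst of ten requests leaves five processing, three waiting in the FIFO and two rejected. If a processing request waits for one that is still queued, nothing moves until a time-out hits.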
The overall effect seems dramatic, especially in the number of processes: this dropped!
2015-08-29_22-49-50
but it shows that WordPress – at least: most likely – may start more than just a few processes…

27-Aug-2015

Throttle
Most definitely.
It shows when looking at the logs:
%HTTPD-W-NOTICED, 27-AUG-2015 06:24:04, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 68.180.230.158
-NOTICED-I-URI, GET (18 bytes) /sysblog/?m=201201
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:194/0 net:194/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:36, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:52, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:54, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (18 bytes) /sysblog/index.php
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:366/0 net:366/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:28:17, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 66.249.67.27
-NOTICED-I-URI, GET (16 bytes) /sysblog/?p=1095
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0
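The hexadecimal string in each -NOTICED-I-CGI line is simply the start of the script's output, byte for byte. A small sketch in Python to decode it (the function name `decode_cgi_hex` is made up here; the hex value is copied from the log above):

```python
def decode_cgi_hex(hex_text):
    """Turn the hex dump from a NOTICED-I-CGI line back into readable text."""
    raw = bytes.fromhex(hex_text)
    # Non-ASCII bytes are replaced instead of raising an error.
    return raw.decode("ascii", errors="replace")


sample = "2553595354454D2D462D41424F52542C2061626F7274"
print(len(bytes.fromhex(sample)), "bytes:", decode_cgi_hex(sample))
```

This confirms that the 22 bytes are the message text itself, and not a valid CGI header.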

The address 82.161.236.244 is my own. And when looking at the active processes, there are a number of WASD processes that run SYSBLOG, and some of them have the same address. Some of them show up in the list that is throttled.

Settings for the blogs show it can take some time for such a process to continue:

Throttle Report

and it may be that this causes problems if one or some of the worker processes (there may be several, concurrent or in sequence) are stuck in the FIFO queue and time out. That is a likely scenario, given the number of entries that got into the FIFO queue; it exceeds any number I’ve seen with the older WordPress version – even with PHP 5.2-13…

This is something that definitely needs more investigation: the combination of throttle and PHP to begin with, but it is very likely caused by WordPress. What to look for is the number of processes actually involved, what they do and how they interact. IIRC, in earlier investigations I found that WordPress will cause several WASD worker processes to be started, apparently either processing PHP code or executing the results. If these processes depend on each other, where one waits for another to finish, and the one executing is held in the FIFO queue, or just rejected because the queue is exhausted, there is trouble.
For this, I set up my old Personal WorkStation 600au again as Daphne, after adding memory (to a whopping 512 Mb), and booted it into the cluster. Now it’s a matter of setting up the test environment (MySQL is already installed; it needs the same WASD, PHP and WordPress versions and setup, and the same data as on the main system) and seeing how it behaves, so there will also be a need for some software to keep track of all processes in the system.

Since it is within the cluster, I could easily refer to the real stuff as well, but execute it here. Just a matter of the right logicals and WASD mapping.
A difference in load
Some days, I check the load of the previous day. Yesterday’s load was quite different from the one a few days back:

Load over 26-Aug-2015

I did notice a site from Russia that constantly poked one of the WordPress files, and that forced me to restart WASD (silently) on that day at about 11:00 UTC. After that, the load has been fairly even, like this one.
This ‘user’ may have caused problems by continuously sending these requests, causing the FIFO queue to be constantly exhausted (some requests returned a 503 error – “Server unavailable” – as a result).
So yet another thing to look at.
On the memory
I contacted the supplier on the memory issues, and he will replace the bad one (well, two of them to have the same batch in that bank). But shortly afterwards, he asked me to send him a picture of one of the chips on the DIMM:
Buffer chip

Within a few hours, I got the answer:

And there is the problem.
These DIMMS (we purchased these from HP!) are bad. The buffer chips are some that were intermittently defective
causing no apparent reason for failure.
I am sending 2GB of memory. Please do not resell the memory. please destroy it and send me a photo of it crushed or damaged as it should not find its way back into the marketplace

First of all: Shame on HP. David is right: these should be taken out of circulation.
As soon as the new memory has arrived and is installed and has proven to be functioning well, these DIMMs will be destroyed. There is however one problem I have to tackle: VAT. This is a repair but customs won’t buy that. It may be circumvented by using mail, but:

I will send Fedex. I have had people hit up for large fees to with the mail service. I did notate that it was a repair return so you can tell Fedex to check their records.
You will need to reference the original fedex tracking no.

That’s to be done… (luckily I didn’t remove the original mail).

24-Aug-2015

Changes in load
Something has changed: where over a day there used to be a continuous change in CPU usage, it now peaks every 90 minutes or so: it drops significantly, gradually increases to about 80% and drops again:
2015-08-24_09-29-33
No idea yet where this comes from…
Changing the execution time of the PHP processes didn’t help much. But before I could add something, the blog didn’t show up; that is: it started displaying but broke halfway. So I started the admin pages and found that quite a lot of requests succeeded, while others stalled or simply broke: network connections dropped, processes exited… So there is another possible cause: the throttle. Intended to prevent overload, it now bites back: PHPWASD does start a number of worker processes and I already noticed that this version of WordPress puts quite a load on the resources. So a max of 5 may be too low.
So I’ll increase that.

23-Aug-2015

PHP errors?
Yesterday’s operator log shows quite a number of messages of this type:

%%%%%%%%%%% OPCOM 22-AUG-2015 07:43:14.63 %%%%%%%%%%%
Message from user HTTP$SERVER on DIANA
Process WASD:80 reports
%HTTPD-W-NOTICED, CGI:2107, not a strict CGI response
-HTTPD-I-SCRIPT, /sysblog/index.php (sysblog:[000000]index.php) phpwasd:
-HTTPD-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort

The server log tells the same story:

%HTTPD-W-NOTICED, 22-AUG-2015 07:43:14, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0

and does so on earlier days. This happens on a number of files, but mainly on index.php and xmlrpc.php; and not just on my address…
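Which addresses and which files are involved can be tallied from the server log. A minimal sketch in Python, assuming the log layout shown in the excerpts; the function name `tally` and the shortened sample are made up for illustration, so a real log may need a more forgiving pattern.

```python
import re
from collections import Counter

CLIENT_RE = re.compile(r"-NOTICED-I-CLIENT, (\S+)")
URI_RE = re.compile(r"-NOTICED-I-URI, \S+ \(\d+ bytes\) (\S+)")


def tally(log_text):
    """Count NOTICED entries per client address and per requested URI."""
    clients = Counter()
    uris = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        match = CLIENT_RE.match(line)
        if match:
            clients[match.group(1)] += 1
            continue
        match = URI_RE.match(line)
        if match:
            uris[match.group(1)] += 1
    return clients, uris


# Shortened sample in the layout of the server log excerpts.
SAMPLE = """\
%HTTPD-W-NOTICED, 27-AUG-2015 06:24:04, CGI:2107, not a strict CGI response
-NOTICED-I-CLIENT, 68.180.230.158
-NOTICED-I-URI, GET (18 bytes) /sysblog/?m=201201
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:36, CGI:2107, not a strict CGI response
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:52, CGI:2107, not a strict CGI response
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
"""

clients, uris = tally(SAMPLE)
print(clients.most_common())
print(uris.most_common())
```

Run over a full day's log, the two counters would show at a glance whether the aborts cluster on one address or on one script.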

It is not caused by PHP code. I have set in PHP.INI:
log_errors = On
log_errors_max_len = 1024
ignore_repeated_errors = Off
ignore_repeated_source = Off
error_log = user:[phplog]php_errors.log

and the last entry in the logfile is 31-May-2015 – when I was experimenting with a higher PHP version.

If it were reproducible, WATCH would show the reason. But it seems to happen at random; I cannot recognize a pattern here. And without crash data, it will not be easy to find the cause.
There may be one thing though: the maximum execution time. Reset to 60 seconds as before, it may be a little too low, which may cause a PHP worker process to be stopped by the PHP engine before it has reached the end of processing – hence “ABORT”. So I gave it a bit more time (30 seconds); we’ll see what happens next. Another candidate may be max_input_time, kept at the default value of 60 seconds for now.

This is the only issue, as far as I can see. The system is at about 75-80% of internal memory, without a lot of hard paging (to disk, that is) – there is apparently enough free space to be used. That is reflected in T4 data: a low hard page rate (unless a process is started, especially a PHPWASD process, I guess, but that is to be expected).
One thing to improve performance of PHP would be to install PHPSHR. Well, some time next week :)

22-Aug-2015

New licenses installed
My current licenses expire September 7th, so to be sure I requested a new set. Within a day, John Egolf, who handles hobbyist licenses at HP, sent me the new set. This comes as a command procedure, so it is sufficient to copy it to the VMS system and execute it.
So VMS, layered products and compilers are covered until 18-Sep-2016.
A new license for PMAS has also been requested from Process; Hunter has sent me a new one that will expire two weeks earlier. Installed that one as well.

21-Aug-2015

It helped
Not just restarting the server.
I still had the maximum execution time for the PHP scripts set to 10 minutes; I used that to upgrade this blog when still on 512 Mb. Now that I’m at 1.5 Gb, I upgraded the Tracks blog as well, using the same setting; but that database is significantly smaller than this one, so it finished well within this timespan. But I forgot to reset this parameter.
So that has been reset as well.
This may or may not have contributed to the low performance, but since there were quite a lot of worker processes running PHPWASD: (which is the 5.2.13 version of PHPWASD.EXE) with a far higher memory demand than I’ve seen before, it is possible.

Anyway, though not lightning fast, it is better than yesterday.

I’ve seen on the WordPress site that there is a program (plugin?) that creates static pages from posts; it comes with Apache’s MOD-REWRITE settings. It may be useful, so I’ll give it a try. The challenge will be to translate MOD-REWRITE to mapping in WASD :).

20-Aug-2015

Load settles
The load seems to settle a bit; I ran into 100% memory yesterday evening, and there wasn’t much movement during today:
2015-08-20_17-46-22
It doesn’t mean all is well. SYSBLOG takes ages to load, if it succeeds at all! There are quite a few WASD worker processes in the system and many are referred to by the system itself. I’ll have to restart the web server, I think…

19-Aug-2015

Same PHP warnings since startup
Six times, since startup. All in admin – some in dashboard, some in admin-ajax.php:

%HTTPD-W-NOTICED, 18-AUG-2015 15:12:54, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
...
-NOTICED-I-CGI, 504850205761726E696E673A20204D6F64756C652027646F (288 bytes) PHP Warning: Module 'dom' already loaded in Unknown on line 0.
-NOTICED-I-RXTX, err:0/0 raw:1094/0 net:1094/0

(No Stack overflow, no ACCVIO – because memory is sufficient.)

No idea where this comes from; the weird thing seems to be “Unknown” on line 0. So it comes ‘out of the blue’. It’s just a warning, but still: it shouldn’t occur…

For the rest: All is well.

Update ahead
Downloaded WP 4.2.4 yesterday, in preparation to stay up-to-date. Just found out that 4.3 is out; so downloaded that one as well, to be installed ASAP. So there will be a short interruption, likely tonight.

Results from last maintenance job
The results of the last maintenance job were not exactly lost. Info-Zip creates a temporary file and will move it to the destination (by COPYing the file and DELETEing it afterwards). These temporary files still exist: since COPY didn’t work (the destination was not available), DELETE didn’t execute. I located these files and examined them, found they were complete, so I renamed them; they still need to be moved to the archives – another maintenance job to do tonight :)
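The reason the temporary files survived is the order of operations: a move done as copy-then-delete only deletes once the copy has succeeded. A generic illustration in Python, not Info-Zip's actual code; the function name `move_by_copy` and the file names are made up.

```python
import os
import shutil
import tempfile


def move_by_copy(source, destination):
    """Move a file as COPY followed by DELETE; keep the source if COPY fails."""
    try:
        shutil.copy2(source, destination)
    except OSError:
        # Destination not available: skip the delete, the source stays intact.
        return False
    os.remove(source)
    return True


with tempfile.TemporaryDirectory() as workdir:
    temp_file = os.path.join(workdir, "archive.tmp")
    with open(temp_file, "w") as handle:
        handle.write("archive data")

    # The target directory does not exist, so the copy fails.
    moved = move_by_copy(temp_file, os.path.join(workdir, "offline", "archive.zip"))
    print("moved:", moved, "- temporary file still there:", os.path.exists(temp_file))
```

Because the delete never runs after a failed copy, the data can be recovered later simply by renaming the temporary file.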
Funny
It is not the fastest or easiest way to add or edit content, but WordPress for Android does work, even on a site you handle yourself. This update proves it 😀

UPDATED
Just updated WordPress to version 4.3, and Akismet to 3.1.3. So I’m up-to-date – for this blog. Trips, Tracks and Travels is done as well, I only have to find a good theme.
This means a lot of old stuff can now be removed from disk.

Also added RSS (using rss2, available as standard in WP) to both blogs.

18-Aug-2015

Stable – up to now
It looks fine up to now: the system has been up and running continuously for 48 hours, without a glitch. The usual peak on Monday just after midnight (processing last week’s logfiles) didn’t trouble it, nor does the occasional higher demand on resources – it seems that over the last two days, the system has been quite busy between 14:30 and 16:30, according to the peaks in the HyperSpy output:

CPU and memory (both physical and virtual)

Paging and buffered IO

Direct IO (disk) and network

The peaks also show up in the WASD graph over the last 72 hours (starting when the system was booted – the part before holds no data, of course):

WASD Graph

Next is to contact the supplier for a replacement DIMM, for the one that seems to be bad.