04-Sep-2015

PHP messages
Over the last week, running without throttle, with an elevated execution time for PHP scripts and the maximum of internal memory, PHP seems to work fine. It will probably be faster once PHPSHR.EXE is installed, a step to be taken this weekend.
All messages I had in this period:
6 times:

%%%%%%%%%%% OPCOM 1-SEP-2015 09:48:06.60 %%%%%%%%%%%
Message from user HTTP$SERVER on DIANA
Process WASD:80 reports
%HTTPD-W-NOTICED, CGI:1997, not a strict CGI response
-HTTPD-I-SCRIPT, /sysblog/wp-admin/admin-ajax.php (sysblog:[wp-admin]admin-ajax.php) phpwasd:
-HTTPD-I-CGI, 504850205761726E696E673A20204D6F64756C652027646F (2048 bytes) PHP Warning: Module 'dom' already loaded in Unknown on line 0.

(at different times, of course, but always in the same module)
and one:

%%%%%%%%%%% OPCOM 3-SEP-2015 13:11:24.71 %%%%%%%%%%%
Message from user HTTP$SERVER on DIANA
Process WASD:80 reports
%HTTPD-W-NOTICED, CGI:2107, not a strict CGI response
-HTTPD-I-SCRIPT, /sysblog/index.php (sysblog:[000000]index.php) phpwasd:
-HTTPD-I-CGI, 2553595354454D2D462D41434356494F2C20616363657373 (118 bytes) %SYSTEM-F-ACCVIO, access violation, reason mask=04, virtual address=000000000592C000, PC=FFFFFFFF8083D9B8, PS=0000001B
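
The hex string in each -HTTPD-I-CGI line appears to be nothing more than the first bytes of the script's response, which WASD then repeats as text. A minimal Python sketch to decode it, using the two strings from the messages above (the function name is my own, not part of WASD):

```python
def decode_cgi_hex(hex_string):
    """Turn the hex dump from a -HTTPD-I-CGI line back into readable text."""
    return bytes.fromhex(hex_string).decode("ascii", errors="replace")


# Hex strings copied from the two OPCOM messages above.
DOM_WARNING = "504850205761726E696E673A20204D6F64756C652027646F"
ACCVIO = "2553595354454D2D462D41434356494F2C20616363657373"

for hex_string in (DOM_WARNING, ACCVIO):
    text = decode_cgi_hex(hex_string)
    print(f"{len(text):2d} bytes: {text!r}")
```

Both strings decode to just the first 24 bytes of a longer response (2048 and 118 bytes respectively), so the hex only shows how the output starts, not all of it.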

Of course, the reason for so few messages could have been that there was no traffic, but that is not the case: the WASD logfiles do show quite a number of requests to SYSBLOG and TRACKS, and not just the ones by the worker processes (since these originate from my address, they are filtered out and won’t be shown there).

01-Sep-2015

Maintenance
The regular maintenance job has shown no surprises, except that the number of messages is low – less than 1000:

PMAS statistics for August
Total messages    :    976 = 100.0 o/o
DNS Blacklisted   :      0 =    .0 o/o (Files:  0)
Relay attempts    :    128 =  13.1 o/o (Files: 26)
Accepted by PMAS  :    848 =  86.8 o/o (Files: 29)
  Handled by explicit rule
         Rejected :    308 =  36.3 o/o (processed),  31.5 o/o (all)
         Accepted :    229 =  27.0 o/o (processed),  23.4 o/o (all)
  Handled by content
        Discarded :    160 =  18.8 o/o (processed),  16.3 o/o (all)
     Quarantained :    124 =  14.6 o/o (processed),  12.7 o/o (all)
        Delivered :     27 =   3.1 o/o (processed),   2.7 o/o (all)
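
The figures can be cross-checked with a few lines of Python. This is only a sketch based on the numbers in the report: the percentages appear to be truncated to one decimal rather than rounded (848 of 976 is 86.88..., reported as 86.8), so the helper below truncates as well. The function name and the dictionary are my own, not part of PMAS.

```python
def pct_tenths(count, total):
    """Percentage of total, truncated (not rounded) to one decimal."""
    return (count * 1000 // total) / 10


TOTAL = 976      # Total messages
RELAY = 128      # Relay attempts
ACCEPTED = 848   # Accepted by PMAS

# Counts per outcome, copied from the report above.
HANDLED = {
    "Rejected": 308,
    "Accepted": 229,
    "Discarded": 160,
    "Quarantined": 124,
    "Delivered": 27,
}

# The parts should add up to the totals they belong to.
assert RELAY + ACCEPTED == TOTAL
assert sum(HANDLED.values()) == ACCEPTED

for name, count in HANDLED.items():
    print(f"{name:>12} : {count:6d} = "
          f"{pct_tenths(count, ACCEPTED):5.1f} o/o (processed), "
          f"{pct_tenths(count, TOTAL):5.1f} o/o (all)")
```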

System usage shows last month’s changes in memory: from 512 Mb to 2 Gb, back to 512 Mb due to memory issues, up to 1.5 Gb after the big outage and lately to 2 Gb of valid memory (where all has been replaced). CPU usage hasn’t changed much, nor has IO: direct (being disk), buffered and network (mainly internet), though there are slightly fewer buffered IOs since throttling has been disabled:

Hyperspy data over August

The system seems to be very stable now, so AUTOGEN can be run shortly…

29-Aug-2015

New memory
New memory has arrived: the full 2 Gb, now stated “Compaq branded”. I installed it all tonight and so far it’s all in working order. The first set will be destroyed, because with these bad buffer chips it is too unreliable.
Throttle and Max-execution-time
The problems with WordPress seem twofold.
First, as I expected, WASD’s throttle causes problems. Immediately after Diana was started, I started the blog. It started showing up but ended halfway through the calendar or the links, or gave me a 503 error (too many processes in FIFO), but never showed the text. And there were no other processes accessing the blog except for this site itself:
2015-08-29_21-50-34
First of all, I tried to give the processes a bit more room: three times as much on all values. That did help a bit, but now it became clear that the maximum execution time (set to 90 seconds) was too low. So I gave the process two minutes to execute, and disabled throttling – for now, until I find a solution to throttle only incoming requests from other sites than my own.
Something like:
if (!remote-addr:(my address)) throttle=5,0,0,8,00:02:00,00:05:00
So the same rule as before, but limited to ‘foreign’ addresses, to prevent an overload.
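The idea behind such a rule can be sketched in Python. This is a toy model of "throttle only foreign addresses", not WASD's actual throttle algorithm: the class, its method names and the max_queued value are assumptions for illustration. Only the limit of 5 comes from the rule above, and the exempt address is the one identified as my own in the 27-Aug entry.

```python
from collections import deque

# Address exempt from throttling (my own, see the 27-Aug entry).
OWN_ADDRESSES = {"82.161.236.244"}


class Throttle:
    """Toy model: at most max_active requests run, at most max_queued wait."""

    def __init__(self, max_active=5, max_queued=3):
        self.max_active = max_active
        self.max_queued = max_queued   # arbitrary illustration value
        self.active = 0
        self.queue = deque()           # FIFO of waiting client addresses

    def admit(self, remote_addr):
        """Return 'run', 'queued' or '503' for a new request."""
        if remote_addr in OWN_ADDRESSES:
            return "run"               # exempt: never counted, never queued
        if self.active < self.max_active:
            self.active += 1
            return "run"
        if len(self.queue) < self.max_queued:
            self.queue.append(remote_addr)
            return "queued"
        return "503"                   # queue exhausted

    def finish(self):
        """A throttled request completed; hand its slot to the oldest waiter."""
        if self.queue:
            return self.queue.popleft()
        self.active = max(0, self.active - 1)
        return None


throttle = Throttle()
for n in range(1, 10):
    print(n, throttle.admit(f"192.0.2.{n}"))       # example 'foreign' addresses
print("own", throttle.admit("82.161.236.244"))
```

In this model the ninth foreign request gets a 503 while a request from my own address still runs, which is the behaviour the rule is meant to give.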
The overall effect seems dramatic, especially in the number of processes: this dropped!
2015-08-29_22-49-50
but it shows that WordPress – at least: most likely – may start more than just a few processes…

27-Aug-2015

Throttle
Most definitely.
It shows when looking at the logs:
%HTTPD-W-NOTICED, 27-AUG-2015 06:24:04, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 68.180.230.158
-NOTICED-I-URI, GET (18 bytes) /sysblog/?m=201201
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:194/0 net:194/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:36, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:52, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:27:54, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (18 bytes) /sysblog/index.php
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:366/0 net:366/0
%HTTPD-W-NOTICED, 27-AUG-2015 06:28:17, CGI:2107, not a strict CGI response
-NOTICED-I-SERVICE, http://www.grootersnet.nl:80
-NOTICED-I-CLIENT, 66.249.67.27
-NOTICED-I-URI, GET (16 bytes) /sysblog/?p=1095
-NOTICED-I-SCRIPT, /sysblog/index.php sysblog:[000000]index.php (phpwasd:) SYSBLOG:[000000]index.php
-NOTICED-I-CGI, 2553595354454D2D462D41424F52542C2061626F7274 (22 bytes) %SYSTEM-F-ABORT, abort
-NOTICED-I-RXTX, err:0/0 raw:357/0 net:357/0
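
To see who is behind these aborts, the entries can be tallied per client address. A small Python sketch, assuming the log has the layout shown above; the sample holds only the CLIENT and URI lines of the five entries, and the variable names are my own:

```python
import re
from collections import Counter

# CLIENT and URI lines copied from the five entries above.
LOG = """\
-NOTICED-I-CLIENT, 68.180.230.158
-NOTICED-I-URI, GET (18 bytes) /sysblog/?m=201201
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (9 bytes) /sysblog/
-NOTICED-I-CLIENT, 82.161.236.244
-NOTICED-I-URI, GET (18 bytes) /sysblog/index.php
-NOTICED-I-CLIENT, 66.249.67.27
-NOTICED-I-URI, GET (16 bytes) /sysblog/?p=1095
"""

# One match per logged entry: the address after "-NOTICED-I-CLIENT, ".
CLIENT_LINE = re.compile(r"^-NOTICED-I-CLIENT, (\S+)$", re.MULTILINE)

counts = Counter(CLIENT_LINE.findall(LOG))
for address, hits in counts.most_common():
    print(f"{hits:3d}  {address}")
```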

The address 82.161.236.244 is my own. And when looking at the active processes, there are a number of WASD processes that run SYSBLOG, and some of them have the same address. Some of them show up in the list that is throttled.

Settings for the blogs show it can take some time for such a process to continue:

Throttle Report

and it may be that this causes problems if one or some of the worker processes (there may be several, concurrent or in sequence) are stuck in the FIFO queue and time out. That is a likely scenario, given the number of entries that got into the FIFO queue; it exceeds any number I’ve seen with the older WordPress version – even with PHP 5.2-13…

This is something that definitely needs more investigation: the combination of throttle and PHP to begin with, but it is very likely caused by WordPress. What to look for is how many processes are actually involved, what they do and how they interact. IIRC, in earlier investigations I found that WordPress will cause several WASD worker processes to be started, apparently either processing PHP code or executing the results. If these processes depend on each other, where one waits for another to finish, and the one executing is held in the FIFO queue, or just rejected because the queue is exhausted, there is trouble.
For this, I set up my old Personal WorkStation 600au again as Daphne, after adding memory (to a whopping 512 Mb), and booted it into the cluster. Now I’m setting up the test environment (MySQL is already installed; it needs the same WASD, PHP and WordPress versions, setup and data as on the main system) to see how it behaves. There will also be a need for some software to keep track of all processes in the system.

Since it is within the cluster, I could easily refer to the real stuff as well, but execute it here. Just a matter of the right logicals and WASD mapping.
A difference in load
Some days, I check the load of the previous day. Yesterday’s load was quite different from the one a few days back:

Load over 26-Aug-2015

I did notice a site from Russia that constantly poked one of the WordPress files and that forced me to restart WASD (silently) on that day about 11:00 UTC, and after that, the load has been fairly even, like this one.
This ‘user’ may have caused problems by continuously sending these requests, causing the FIFO queue to be constantly exhausted (some requests returned a 503-error – “Server unavailable” – as a result).
So yet another thing to look at.
On the memory
I contacted the supplier on the memory issues, and he will replace the bad one (well, two of them to have the same batch in that bank). But shortly afterwards, he asked me to send him a picture of one of the chips on the DIMM:
Buffer chip

Within a few hours, I got the answer:

And there is the problem.
These DIMMS (we purchased these from HP!) are bad. The buffer chips are some that were intermittently defective
causing no apparent reason for failure.
I am sending 2GB of memory. Please do not resell the memory. please destroy it and send me a photo of it crushed or damaged as it should not find its way back into the marketplace

First of all: Shame on HP. David is right: These should be removed from circulation.
As soon as the new memory has arrived and is installed and has proven to be functioning well, these DIMMs will be destroyed. There is however one problem I have to tackle: VAT. This is a repair but customs won’t buy that. It may be circumvented by using mail, but:

I will send Fedex. I have had people hit up for large fees to with the mail service. I did notate that it was a repair return so you can tell Fedex to check their records.
You will need to reference the original fedex tracking no.

That’s to be done… (luckily I didn’t remove the original mail).

24-Aug-2015

Changes in load
Something has changed: where over a day there used to be a continuous change in CPU usage, it now peaks every 90 minutes or so: it drops significantly, gradually increases to about 80%, and drops again:
2015-08-24_09-29-33
No idea yet where this comes from…
Changing the execution time of the PHP processes didn’t help much. But before I could add something, the blog didn’t show up; that is: it started displaying but broke halfway. So I started the admin pages and found that quite a lot of requests succeeded, while others stalled or simply broke: network connections dropped, processes exited… So there is another possible cause: the throttle. Intended to prevent overload, it now bites back: PHPWASD does start a number of worker processes, and I already noticed that this version of WordPress puts quite a load on the resources. So a max of 5 may be too low.
So I’ll increase that.