30-Jan-2012

More on the wiki
The main issue found yesterday was in a command file that is started by the server: it refers to an object or method that no longer exists. So chances were there would be a new version to go with the new Python version.
And so there was: a template is supplied! So I copied the file to the location the wiki maps to, where the file is expected to reside.
Now it takes a while before the next error shows up:

ERROR 502 - External agent did not respond (or not acceptably)

Traceback (most recent call last):
File "/moin_root/MoinMoin/wsgiapp.py", line 284, in __call__
response = run(context)
File "/moin_root/MoinMoin/wsgiapp.py", line 88, in run
response = dispatch(request, context, action_name)
File "/moin_root/MoinMoin/wsgiapp.py", line 138, in dispatch
response = handle_action(context, pagename, action_name)
File "/moin_root/MoinMoin/wsgiapp.py", line 197, in handle_action
handler(context.page.page_name, context)
File "/moin_root/MoinMoin/action/__init__.py", line 268, in do_show
content_only=content_only,
File "/moin_root/MoinMoin/Page.py", line 1198, in send_page
start_line=pi['lines'])
File "/moin_root/MoinMoin/Page.py", line 1292, in send_page_content
self.execute(request, parser, code)
File "/moin_root/MoinMoin/Page.py", line 1323, in execute
exec code
File "StartingPage", line 2, in
TypeError: 'dict' object is not callable

This is something else – inside the wiki code, it seems.
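The error itself is easy to reproduce in isolation: the wiki exec's the page code, and that code apparently calls a name that the new release has rebound from a function to a plain dict. A minimal sketch (the name `handler` and the page code are my own illustration, not taken from MoinMoin):

```python
# Hypothetical page code, exec'd the way Page.execute() does it.
page_code = "result = handler('StartingPage')"

# Old-style environment: 'handler' is callable, so the page code works.
def old_handler(name):
    return "rendered " + name

exec(page_code, {"handler": old_handler})

# New-style environment: 'handler' has become a dict -- calling it raises
# exactly the TypeError seen in the traceback above.
try:
    exec(page_code, {"handler": {"StartingPage": "rendered"}})
except TypeError as exc:
    print(exc)
```

So somewhere, either in the page source or in the environment the wiki builds for it, a callable has been replaced by a dict.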

It could have been a matter of the RunTimeEnvironment – so I got the latest from the WASD site, built and installed it. But it doesn't make a difference. And according to the message, there is something within the wiki code or one of the pages…

For a change, the WATCH output is of little help: it just shows the right mapping and the error as shown. Time to ask the WASD list…

29-Jan-2012

Updates
Three updates to be done, without much of a problem. At least, that's what I anticipated.
MySQL
I’m running the release by Jean-François Piéronne, currently version 5.1.23, which was released quite some time ago. The latest version as of today is 5.1.46; the same level, but probably enhanced. So I updated the database, following the recommendations of the wiki on his site. There is a small typo in that document, but if you know VMS, it’s not much of a problem.
After the restart, you need to update a number of tables, but the script runs into a number of “Column already exists” errors. I don’t think it matters much, since the database runs fine afterwards.
The first attempt to access the blogs using IE8 failed, but that may have been caused by running PHP processes that had lost their connection to the database (obviously, the database needed to be stopped…).
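Those errors are the harmless kind: the upgrade script blindly re-applies every ALTER, and the ones that were already applied fail without changing anything. An idempotent variant would check first; a toy sketch (the table is modelled as a plain set of column names, all names illustrative):

```python
def add_column(table_columns, name):
    """Add a column only if it is not already there -- the idempotent
    version of what the upgrade script does unconditionally."""
    if name in table_columns:
        return "already exists -- skipped"
    table_columns.add(name)
    return "added"

cols = {"id", "name"}
print(add_column(cols, "created_at"))  # added
print(add_column(cols, "created_at"))  # already exists -- skipped
```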
Anyway, MySQL is now up-to-date.
Python and related
Second, there are the Python updates. These come as logical devices: containers holding all the files, to be mounted as disks – one containing the libraries, and one containing the running software, one of them being the MoinMoin wiki. The update requires the logical disks to be redefined, and some more logical names to be redefined as well.
This has all been taken care of, but for some reason something goes wrong in running the WASD procedure that starts the wiki:

ERROR 502 - External agent did not respond (or not acceptably)

Traceback (most recent call last):
File "sys$input", line 12, in

ImportError: No module named server.server_wsgi

I found the cause in the startup-file:

$ proc = f$environment("PROCEDURE")
$ proc_dir = f$parse(proc,,,"DEVICE") + f$parse(proc,,,"DIRECTORY")
$ @'proc_dir'logicals
$ @python_vms:setup
$ define PYTHON_FILE_SHARING 1
$ mcr cgi_exe:pyrte sys$input

# Debug mode - show detailed error reports
import os
os.environ['MOIN_DEBUG'] = '1'

import sys
import wasd

sys.path.insert(0, '/moin_root/')
sys.path.insert(0,'/moin_wiki_root/mywiki/')

from MoinMoin.server.server_wsgi import moinmoinApp, WsgiConfig   # <---- Line 12

class Config(WsgiConfig):
    logPath = '/moin_wiki_root/logs/moin.log' # adapt to your needs!
    #loglevel_file = logging.INFO # adapt if you don't like the default

config = Config() # MUST create an instance to init logging

if __name__ == '__main__':
    while wasd.cgiplus_begin():
        wasd.wsgi_run(moinmoinApp)
$
$

Something must have changed internally…
I remember there is something in the WASD mailing list on the matter, so I'll have to dig through the archives. But since I removed all bogus accounts (prior to the installation…), there is little chance they'll have the opportunity to retry 🙂

So for the time being, the VMS wiki won't start when accessed. Sorry for that.

13-Jan-2012

DoS attacks on blogs – part 2
One of the things I did after the server was restarted was to define a throttle on the blogs, the wiki and the download area. This will limit the number of concurrent accesses to them, limiting the risks. It may mean that when I get a lot of requests, some will have to wait a bit longer, or will be queued for some time, or get a “Server busy” error. The issue is clearly visible in yesterday’s history. Not so much in CPU – there are some spikes up to 25%:
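The throttle itself is configured in the WASD server, but the idea is simple enough to sketch in a few lines of Python: admit a bounded number of concurrent requests, let the rest wait briefly, and reject the overflow with a 503 (all names and limits here are my own illustration, not WASD's implementation):

```python
import threading

class Throttle:
    """Admit at most `limit` concurrent requests; callers that cannot get
    a slot within `wait` seconds are rejected (the 'Server busy' case)."""
    def __init__(self, limit, wait=0.0):
        self._slots = threading.BoundedSemaphore(limit)
        self._wait = wait

    def handle(self, work):
        # acquire() with a timeout models "queued for some time"
        if not self._slots.acquire(timeout=self._wait):
            return 503                  # server busy
        try:
            return work()
        finally:
            self._slots.release()

t = Throttle(limit=2)
print(t.handle(lambda: 200))  # a slot is free -> request served
```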

At times there have been peaks in CPU; normally it’s just a few percent, so these spikes up to 25% are remarkable, especially given the timeframe in which they occur…
In memory usage – especially pagefile usage – the problem is clear:

Free memory is exhausted, but there is enough space available in the pagefile, and far more can be paged into the files; but since none of the processes clearly ran out of virtual memory, something else blocked processing. The only culprit, in that case, might be MySQL…
The number of processes:

goes through the roof – in steps, and it follows memory usage: starting to rise at six, stabilizing just a few minutes later, until the number of processes increases again at 7, stabilizes again at 9, and keeps increasing in steps until the system is out of slots at 10. In the next half hour, it seems some processes come to an end, but that free slot is immediately taken again… From that moment on, until the system is rebooted at 21:00, no new processes can be created; once in a while one ends, but another takes the slot immediately…
Paging shows a massive peak at 6:00 and 7:00.

These will be the moments that the large amount of processes are created. It matches the graph seen yesterday on the amount of requests on these moments.
The graph of buffered IO shows the same peaks:

So the problem started at 6:00, stabilized until 7:00, and then continued until the system ran out of resources at about 10:30 (local time, which is UTC, without DST, on the system).

Armed with these data, I searched the access log.
At 4:00, 66.249.66.186 requested a few pages in the SYSMGR blog, but there were a few minutes between them, so that is hardly noticeable. Nothing weird, actually; this happens more often.
But at 6:00, 204.11.219.95 kicks in. Within 16 minutes this address fires 50 subsequent GETs at the Tracks blog index; the first 20 succeed; it gets 503 errors on the next 5; the next 5 succeed, and the next 6 result in 502 errors, followed by succeeding ones.
About 15 minutes later, at 6:39, addresses 66.249.66.186 and 66.249.71.152 start accessing the SYSMGR blog index with straight requests. Several other addresses do the same for RSS feeds, others scan for monthly indices to be displayed. Not that often – every few minutes – but continuously. Most, of course, ending in a 500-style error after 10:30…
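Pulling the heavy hitters out of the access log is a one-liner once the lines are split; a sketch with a few toy log lines (addresses real, paths and timestamps made up to mirror the pattern described above):

```python
from collections import Counter

# Toy access-log lines in common log format; in practice these would be
# read from the server's access log file.
log_lines = [
    '204.11.219.95 - - [12/Jan/2012:06:00:01 +0000] "GET /tracks/ HTTP/1.1" 200 5123',
    '204.11.219.95 - - [12/Jan/2012:06:00:14 +0000] "GET /tracks/ HTTP/1.1" 200 5123',
    '66.249.66.186 - - [12/Jan/2012:06:39:02 +0000] "GET /sysmgr/ HTTP/1.1" 200 4810',
    '204.11.219.95 - - [12/Jan/2012:06:00:29 +0000] "GET /tracks/ HTTP/1.1" 503 0',
]

# The client address is the first whitespace-separated field.
hits = Counter(line.split()[0] for line in log_lines)
for addr, n in hits.most_common():
    print(addr, n)
```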

These 66.xxx.xxx.xxx addresses are owned by Google, as are some of those requesting the RSS feeds. This is no big deal; it’s rather normal, and these come with a larger interval. The real culprit seems to be 204.11.219.95:
$ dig -x 204.11.219.95

; <<>> DiG 9.3.1 <<>> -x 204.11.219.95
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 6130
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;95.219.11.204.in-addr.arpa. IN PTR

;; ANSWER SECTION:
95.219.11.204.in-addr.arpa. 43200 IN CNAME 95.219.11.204.in-addr.networkvirtue.com.

;; AUTHORITY SECTION:
networkvirtue.com. 2560 IN SOA a.ns.networkvirtue.com. hostmaster.networkvirtue.com. 1305226632 16384 2048 1048576 2560

;; Query time: 5843 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jan 13 13:10:14 2012
;; MSG SIZE rcvd: 149

$
$ whois 204.11.219.95
----Server: whois.arin.net [AMERICAS] response for 204.11.219.95
#
# Query terms are ambiguous. The query is assumed to be:
# "n 204.11.219.95"
#
# Use "?" to get help.
#

#
# The following results may also be obtained via:
# http://whois.arin.net/rest/nets;q=204.11.219.95?showDetails=true&showARIN=false&ext=netref2
#

Peak Web Hosting Inc. PEAK-WEB-HOSTING (NET-204-11-216-0-1) 204.11.216.0 - 204.11.223.255
Gal Halevy GAL-HALEVY-NETWORK (NET-204-11-219-64-1) 204.11.219.64 - 204.11.219.127

#
# ARIN WHOIS data and services are subject to the Terms of Use
# available at: https://www.arin.net/whois_tou.html
#

So I know where to signal abuse.
By the way: Mail reception stalled as well:

%%%%%%%%%%% OPCOM 12-JAN-2012 10:14:27.38 %%%%%%%%%%%
Message from user INTERnet on DIANA
INTERnet ACP SMTP Abort Request from Host: 192.168.0.2 Port: 54120

%%%%%%%%%%% OPCOM 12-JAN-2012 10:14:27.38 %%%%%%%%%%%
Message from user INTERnet on DIANA
INTERnet ACP AUXS failure Status = %SYSTEM-F-NOSLOT

Same problem….

12-Jan-2012

DoS attack on blogs
This morning, the web was unreliable in both speed of access and success. Webmail, which normally works like a charm, would react slowly, cause a 503 error, or time out. The same went for the operation desk: the home page is a plain HTML file – no whistles nor bells. That would load fine, although slowly, but access to functions within the menu could cause similar problems.
Luckily, the WASD web pages were accessible.
Looking for what could cause the problems, I looked at the activity on the system, and I noticed a really large number of PHP server processes: meaning that someone was trying to blow the blogs to pieces.
So the next action was to stop these processes, but it seemed an impossible task: for any process I killed, another one was created. Or processes were said to be ‘suspended’…
Next stop: restart the webserver – which normally causes all pending PHP servers to disappear. But not this time; that is, the server’s list of running processes showed them gone, but SHOW SYSTEM still had them… So I retried – with no luck.
At some point, an error 500 (unexpected server error) was returned whenever a request was sent that would require the webserver to create a new process; but since the admin pages are handled internally, WATCH could show me the reason: “no pcb available”. The system was simply out of gas… But not completely: wherever the webserver could handle requests itself, like the admin pages, the images behind the Trips, Tracks & Travels blog, or download files, that worked as before. Also, mail and other processes normally running kept running as usual; a bit slower, perhaps.
It was not until later in the afternoon that I had the opportunity to solve the problem; because of the lack of process blocks, login wasn’t possible either – I just had to work from the console.
To my luck, the DECwindows session on my console was still up and running, so from there I could try to clean up the mess. Each slot that would normally be open was now occupied by a server subprocess running PHP, in either LEF or LEFO state. So I stopped each of them. On re-showing what was running, the processes re-appeared, so I tried again – with no result.
The only solution I had then was to reboot the server. After that, the webs worked like they should.

Next thing is to examine what happened….

05-Jan-2012

More bogus users
It’s been quiet for some time, but abusers seem to have picked up steam. Today alone, I had to remove several bogus accounts on the VMS wiki. This blog seems to be popular as well; here too, these are simply deleted. These attempts are easy to spot: bogus domains (the wiki doesn’t check) or usernames. One way to prevent these on the blog is disallowing account creation without intervention; perhaps the latest version has a facility to mail a generated password to the email address specified by the user – both wiki and blog can be ‘secured’ that way…
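The "easy to spot" part can even be automated to a degree. A crude heuristic sketch, nothing more: a syntactically invalid domain, or a username that looks like random consonant soup, flags the signup for review (the patterns and thresholds are entirely my own invention, not anything the wiki or blog software does):

```python
import re

# A minimal "looks like a real domain" check: at least one dot, only
# letters, digits and hyphens in each label.
DOMAIN_RE = re.compile(r"^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")

def looks_bogus(username, email):
    domain = email.partition("@")[2]
    if not DOMAIN_RE.match(domain):
        return True
    # Long runs of consonants rarely occur in real names.
    if re.search(r"[bcdfghjklmnpqrstvwxz]{5,}", username.lower()):
        return True
    return False

print(looks_bogus("xkqzwrtp", "x@mail.example"))   # flagged: consonant soup
print(looks_bogus("alice", "alice@example.com"))   # passes
```

Anything flagged would still need a human look before deletion; heuristics like these catch the obvious junk, not the careful spammer.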
Working on the new web configuration doesn’t proceed as wanted, mainly because the site is not accessible from the Internet, although things have been set up in such a way that it should simply work. At least, it looks OK. From the LAN I didn’t have a problem: I could access both ports, http and https, the latter allowing the admin pages – which allow real-time logging of the mapping. But from the outside, http signals a “Not allowed” message – sent by the server – and https fails altogether, so I couldn’t check why this happened. Posted it to the WASD mailing list – there is a tool I could use, HTTPDMON, even when the admin pages are not available. So that’s for this weekend…