10-Aug-2015

Memory in error
Because heat may contribute to the problems, I reorganized the installation.
I had the DS10 under the HSZ50 and three BA356 cabinets; these direct their airflow downward, which may have kept the hot air from the server from escaping.
So I turned the units upside down so they now blow upward, and I gave the DS10 a bit more room around it.
Second, I set the console to serial. I already have the HSZ50 hooked up to a VT420 terminal, so the server could be hooked up to its second session. That shows a bit more information on boot.

Did a memtest. This time it worked, where before it had refused, for some reason, because 0 wasn’t a valid address. I ran the test over a large portion of memory and again the system froze. Neither CTRL-P on the console nor the HALT button gave any response.

Restarted, and memtest failed again with the same message.
Restarted again, and memtest did run. Tested up to the maximum possible, no errors.
Did sho mem several times, and the system froze again.
Restarted; sho mem gave no problems over a longer period, so I booted. All seemed OK until, after about 30 minutes, the system froze again.

There is definitely something wrong with this memory. My guess: one of the DIMMs breaks the system. The question is which one.
There isn’t any message anywhere that gives a hint of memory problems, and once VMS is running there is no way to get this information. I could, of course, write a program that checks pages on the fly, but that would wreak havoc on performance. On top of that, the kernel will intervene to prevent problems with system data and structures, which are exactly the problems this program is meant to detect…
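
For illustration only, here is a minimal sketch of what such a page-checking program might look like. It is in Python and nothing in it is VMS-specific; the names `pattern_test` and `find_mismatches` are made up for this example. It only touches whatever memory the operating system hands to the process, so it cannot target a particular DIMM or any kernel memory, which is the limitation described above.

```python
# Hypothetical user-space pattern test: fill a buffer with a known byte,
# read it back, and report every offset that does not match.
# Assumption: the process can only see its own memory, so faults in
# memory used by the operating system itself stay invisible to it.

PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits


def find_mismatches(buf, pattern):
    """Return the offsets in buf whose byte differs from pattern."""
    return [i for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=PATTERNS):
    """Write each pattern over one buffer and verify it.

    Returns a list of (pattern, offset) pairs that read back wrong.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        buf[:] = bytes([pattern]) * size_bytes        # write pass
        for offset in find_mismatches(buf, pattern):  # read-back pass
            failures.append((pattern, offset))
    return failures


if __name__ == "__main__":
    bad = pattern_test(1024 * 1024)  # 1 MiB, kept small on purpose
    print("mismatches:", len(bad))
```

Running this over a large buffer on a live system would have exactly the performance cost mentioned above, and a clean result would say little about the pages it never got to touch.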

In the end, I decided to remove the 2 GB of memory and return to 512 MB for the time being. I could, of course, add 1 GB at a time and see if that works, but only when I’m at the site…
