<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Thanks for the bug report. Could you please file a bz and attach<br>
all the message files? Yes, the problem started with the hb<br>
timeout on esiprap01. The problem spread to other nodes, possibly<br>
because of a race in migration. A bz will help us track the issue<br>
more easily.<br>
<br>
On 02/28/2011 01:46 AM, Piotr Teodorowski wrote:
<blockquote
cite="mid:201102281046.53459.piotr.teodorowski@contium.pl"
type="cite">
<pre wrap="">Hi,
After the problem described in <a class="moz-txt-link-freetext" href="http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html">http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html</a>
we've upgraded the kernels and ocfs2-tools on every node.
The present versions are:
kernel 2.6.32-bpo.5-amd64 (from debian lenny-backports)
ocfs2-tools 1.4.4-3 (from debian squeeze)
We didn't notice any problems in the logs until last Friday, when the whole
ocfs2 cluster crashed.
We know that it started with some problems on node 7 (esiprap01). It reported
an o2hb_write_timeout error and rebooted automatically.
Could you please explain what happened on the other nodes?
Some of them reported a bug:
kernel BUG at
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241!
one of them (es1prap03 - node 4) reported:
kernel BUG at
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:3260!
We had a problem starting the cluster again. While one node was starting,
another crashed (logged a stack trace - see attachments - and rebooted).
The only way to start the cluster was to stop almost all nodes and start them
one by one.
We didn't find what caused the problem with the first node (node 7), and we
don't expect that we will find it out. Probably it wasn't a hardware problem:
the storage was responsive, and we don't have any errors in the storage event log.
The question is why the other nodes crashed.
The configuration is the same as it was in December (cluster.conf).
Regards,
Piotr Teodorowski
</pre>
<pre wrap="">
<fieldset class="mimeAttachmentHeader"></fieldset>
_______________________________________________
Ocfs2-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<a class="moz-txt-link-freetext" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>
</blockquote>
<br>
</body>
</html>