[Ocfs2-users] ocfs2 crash with bugs reports (dlmmaster.c)

Piotr Teodorowski piotr.teodorowski at contium.pl
Tue Mar 1 04:28:35 PST 2011


Thanks for the quick response.
The bug is filed at:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1319

Regards,
Piotr Teodorowski

On Tuesday 01 of March 2011 02:55:01 Sunil Mushran wrote:
> Thanks for the bug report. Please can you file a bz and attach
> all the message files. Yes, the problem started with the hb
> timeout in esiprap01. The problem spread to other nodes possibly
> because of a race in migration. A bz will help us track the issue
> more easily.
> 
> On 02/28/2011 01:46 AM, Piotr Teodorowski wrote:
> > Hi,
> >
> > After problem described in
> > http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html
> > we've upgraded kernels and ocfs2-tools on every node.
> >
> > The present versions are:
> > kernel 2.6.32-bpo.5-amd64 (from debian lenny-backports)
> > ocfs2-tools 1.4.4-3 (from debian squeeze)
> >
> > We didn't notice any problems in the logs until last Friday, when the
> > whole ocfs2 cluster crashed.
> >
> > We know that it started with some problems on node 7 (esiprap01). It
> > reported an o2hb_write_timeout error and rebooted automatically.
> > Could you please explain what happened to the other nodes?
> > Some of them reported this bug:
> > kernel BUG at
> > /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241!
> > One of them (es1prap03 - node 4) reported:
> > kernel BUG at
> > /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:3260!
> >
> > We had trouble starting the cluster again. While one node was
> > starting, another crashed (it logged a stack trace, see attachments,
> > and rebooted). The only way to start the cluster was to stop almost all
> > nodes and start them one by one.
> >
> > We didn't find what caused the problem with the first node (node 7), and
> > we don't expect that we will find out. It probably wasn't a hardware
> > problem. The storage was responsive, and we don't have any errors in the
> > storage event log. The question is why the other nodes crashed.
> >
> > The configuration is the same as it was in December (cluster.conf).
> >
> > Regards,
> > Piotr Teodorowski
> >
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> 


