[Ocfs2-users] ocfs2 crash with bugs reports (dlmmaster.c)
Piotr Teodorowski
piotr.teodorowski at contium.pl
Mon Feb 28 01:46:53 PST 2011
Hi,
After the problem described in
http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html
we upgraded the kernel and ocfs2-tools on every node.
The current versions are:
kernel 2.6.32-bpo.5-amd64 (from Debian lenny-backports)
ocfs2-tools 1.4.4-3 (from Debian squeeze)
We didn't notice any problems in the logs until last Friday, when the whole
ocfs2 cluster crashed.
We know that it started with some problems on node 7 (esiprap01). It reported
an o2hb_write_timeout error and rebooted automatically.
Could you please explain what happened to the other nodes?
Some of them reported this bug:
kernel BUG at
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241!
One of them (es1prap03, node 4) reported:
kernel BUG at
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:3260!
We had trouble starting the cluster again. While one node was starting,
another crashed (it logged a stack trace, see attachments, and rebooted).
The only way to start the cluster was to stop almost all nodes and start them
one by one.
We didn't find what caused the problem with the first node (node 7), and we
don't expect that we will find out. It probably wasn't a hardware problem. The
storage was responsive, and there are no errors in the storage event log.
The question is why the other nodes crashed.
The configuration is the same as it was in December (cluster.conf).
Regards,
Piotr Teodorowski
-------------- next part --------------
node:
	ip_port = 7777
	ip_address = 172.28.4.48
	number = 0
	name = es1prgw01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.56
	number = 1
	name = es4prgw01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.65
	number = 3
	name = es1prap02
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.66
	number = 4
	name = es1prap03
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.80
	number = 5
	name = es4prap01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.81
	number = 6
	name = es4prap02
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.64
	number = 2
	name = es1prap01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.78
	number = 7
	name = esiprap01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.67
	number = 8
	name = es1prap04
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.68
	number = 9
	name = es1prap05
	cluster = ocfs2

cluster:
	node_count = 10
	name = ocfs2
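
Since every node has to carry the same cluster.conf, a quick consistency check of the listing above may help rule out configuration drift. The following is only a minimal sketch in Python, not part of ocfs2-tools: the helper names parse_cluster_conf and check_cluster_conf are hypothetical, and the parser assumes nothing beyond the "stanza:" / "key = value" layout visible in this file.

```python
from collections import Counter


def parse_cluster_conf(text):
    """Parse 'stanza:' / 'key = value' lines into (stanza_type, params) pairs."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:".
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            # A parameter belonging to the most recent stanza.
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(text):
    """Return a list of problems found; an empty list means the checks passed."""
    stanzas = parse_cluster_conf(text)
    nodes = [params for kind, params in stanzas if kind == "node"]
    clusters = [params for kind, params in stanzas if kind == "cluster"]
    problems = []

    # Node numbers, names and IP addresses should each be unique.
    for field in ("number", "name", "ip_address"):
        counts = Counter(node.get(field) for node in nodes)
        for value, count in counts.items():
            if count > 1:
                problems.append(f"duplicate {field}: {value}")

    # node_count should match the number of node stanzas in that cluster.
    for cluster in clusters:
        members = [n for n in nodes if n.get("cluster") == cluster.get("name")]
        declared = int(cluster.get("node_count", "-1"))
        if declared != len(members):
            problems.append(
                f"cluster {cluster.get('name')}: node_count = {declared}, "
                f"but {len(members)} node stanzas found"
            )
    return problems
```

Running check_cluster_conf on the contents of the file from each node and comparing the results is one way to confirm that the nodes agree. It does not explain the dlmmaster.c BUGs; it only excludes one class of misconfiguration.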
-------------- next part --------------
A non-text attachment was scrubbed...
Name: netconsole.tgz
Type: application/x-compressed-tar
Size: 55465 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110228/2475c133/attachment-0002.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages.tgz
Type: application/x-compressed-tar
Size: 183445 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110228/2475c133/attachment-0003.bin