[Ocfs2-users] ocfs2 crash with bugs reports (dlmmaster.c)

Piotr Teodorowski piotr.teodorowski at contium.pl
Mon Feb 28 01:46:53 PST 2011


Hi,

After the problem described in
http://oss.oracle.com/pipermail/ocfs2-users/2010-December/004854.html
we upgraded the kernels and ocfs2-tools on every node.

The present versions are:
kernel 2.6.32-bpo.5-amd64 (from debian lenny-backports)
ocfs2-tools 1.4.4-3 (from debian squeeze)

We didn't notice any problems in the logs until last Friday, when the whole
ocfs2 cluster crashed.

We know that it started with some problems on node 7 (esiprap01). It reported
an o2hb_write_timeout error and rebooted automatically.
Could you please explain what happened to the other nodes?
Some of them reported this bug:
kernel BUG at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:241!
One of them (es1prap03, node 4) reported:
kernel BUG at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/dlm/dlmmaster.c:3260!

We had trouble starting the cluster again. While one node was starting,
another crashed (it logged a stack trace and rebooted; see attachments).
The only way to start the cluster was to stop almost all nodes and start them
one by one.

We didn't find what caused the problem on the first node (node 7), and we
don't expect that we will find out. It probably wasn't a hardware problem. The
storage was responsive, and we don't have any errors in the storage event log.
The question is why the other nodes crashed.

The configuration is the same as it was in December (cluster.conf).

Regards,
Piotr Teodorowski
-------------- next part --------------
node:
	ip_port = 7777
	ip_address = 172.28.4.48
	number = 0
	name = es1prgw01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.56
	number = 1
	name = es4prgw01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.65
	number = 3
	name = es1prap02
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.66
	number = 4
	name = es1prap03
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.80
	number = 5
	name = es4prap01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.81
	number = 6
	name = es4prap02
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.64
	number = 2
	name = es1prap01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.78
	number = 7
	name = esiprap01
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.67
	number = 8
	name = es1prap04
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 172.28.4.68
	number = 9
	name = es1prap05
	cluster = ocfs2

cluster:
	node_count = 10
	name = ocfs2
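
A quick consistency check of a cluster.conf like the one above can be sketched in Python. This is only an illustration, not part of ocfs2-tools: the function names `parse_cluster_conf` and `check_cluster_conf` are made up for this example, the parser assumes the simple "stanza header, then indented key = value lines" layout shown above, and the sample is trimmed to two of the posted nodes with `node_count` adjusted to match.

```python
from collections import Counter

# Trimmed sample: two node stanzas from the posted config.
# node_count is set to 2 here only because the sample is shortened.
SAMPLE = (
    "node:\n"
    "\tip_port = 7777\n"
    "\tip_address = 172.28.4.48\n"
    "\tnumber = 0\n"
    "\tname = es1prgw01\n"
    "\tcluster = ocfs2\n"
    "\n"
    "node:\n"
    "\tip_port = 7777\n"
    "\tip_address = 172.28.4.56\n"
    "\tnumber = 1\n"
    "\tname = es4prgw01\n"
    "\tcluster = ocfs2\n"
    "\n"
    "cluster:\n"
    "\tnode_count = 2\n"
    "\tname = ocfs2\n"
)


def parse_cluster_conf(text):
    """Parse stanzas into a list of (stanza_type, {key: value}) pairs."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate stanzas
        if line.endswith(":") and "=" not in line:
            # A header such as "node:" or "cluster:" opens a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(stanzas):
    """Return a list of consistency problems (empty if none were found)."""
    problems = []
    nodes = [attrs for kind, attrs in stanzas if kind == "node"]
    clusters = [attrs for kind, attrs in stanzas if kind == "cluster"]

    # node_count should equal the number of node stanzas in that cluster.
    for cluster in clusters:
        name = cluster.get("name")
        members = [n for n in nodes if n.get("cluster") == name]
        declared = int(cluster.get("node_count", "0"))
        if declared != len(members):
            problems.append(
                f"cluster {name}: node_count = {declared}, "
                f"but {len(members)} node stanzas found"
            )

    # Node numbers, names and addresses should each be unique.
    for field in ("number", "name", "ip_address"):
        counts = Counter(n.get(field) for n in nodes)
        for value, seen in counts.items():
            if seen > 1:
                problems.append(f"duplicate {field}: {value}")
    return problems


if __name__ == "__main__":
    for problem in check_cluster_conf(parse_cluster_conf(SAMPLE)):
        print(problem)
```

To check a real file, the text read from it (commonly /etc/ocfs2/cluster.conf) can be passed to `parse_cluster_conf` instead of `SAMPLE`. The check only looks at the file's internal consistency; it says nothing about whether every node runs the same copy.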

-------------- next part --------------
A non-text attachment was scrubbed...
Name: netconsole.tgz
Type: application/x-compressed-tar
Size: 55465 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110228/2475c133/attachment-0002.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: messages.tgz
Type: application/x-compressed-tar
Size: 183445 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110228/2475c133/attachment-0003.bin 

