[Ocfs2-users] Panic using ocfs2

Jan-Marc Pilawa j.pilawa at tu-bs.de
Fri Feb 18 06:48:18 CST 2005


Hello List,

I ran into a problem with ocfs2 in a 2-node cluster (though I suspect
it may be my own fault, due to a wrong configuration).

I am using: 
========
SLES9 on IBM xSeries 235 machines with shared storage on a SAN,
with the following configuration:

kernel 2.6.5-7.108-bigsmp
ocfs2-tools-0.99.0-1
ocfs2-support-0.99.0-1
ocfs2-2.6.5-7.108-bigsmp-0.99.2-1


Node1:  /etc/ocfs.conf:
ip_address = 192.168.9.45
ip_port_v2 = 63000
node_name = node1
comm_voting = 1
guid = 0513A9FA0665DA21BE130002B3C76992

Node2: /etc/ocfs.conf:
ip_address = 192.168.9.43
ip_port_v2 = 63000
node_name = node2
comm_voting = 1
guid = 0513A9FA0665DA21BE13001018038207
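
Since a configuration mistake is one suspect, a quick side-by-side check of the two files can rule out the obvious cases. The sketch below is only an illustration in Python: the choice of which keys should differ (ip_address, node_name, guid) and which should match (ip_port_v2, comm_voting) is an assumption, not something taken from the ocfs2 documentation, and the helper names are made up for this example.

```python
# Sketch: compare two ocfs.conf-style files for obvious conflicts.
# ASSUMPTION: which keys must differ or match is a guess, not an ocfs2 rule.

NODE1 = """
ip_address = 192.168.9.45
ip_port_v2 = 63000
node_name = node1
comm_voting = 1
guid = 0513A9FA0665DA21BE130002B3C76992
"""

NODE2 = """
ip_address = 192.168.9.43
ip_port_v2 = 63000
node_name = node2
comm_voting = 1
guid = 0513A9FA0665DA21BE13001018038207
"""


def parse_conf(text):
    """Parse 'key = value' lines into a dict, skipping anything else."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def compare(a, b,
            must_differ=("ip_address", "node_name", "guid"),
            must_match=("ip_port_v2", "comm_voting")):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for key in must_differ:
        if a.get(key) == b.get(key):
            problems.append(f"{key} is identical on both nodes: {a.get(key)!r}")
    for key in must_match:
        if a.get(key) != b.get(key):
            problems.append(f"{key} differs: {a.get(key)!r} vs {b.get(key)!r}")
    return problems


problems = compare(parse_conf(NODE1), parse_conf(NODE2))
print(problems if problems else "no obvious conflicts")
```

For the two files shown above this check finds no conflicts, so under these assumptions the configuration at least does not contain a duplicated guid, address or node name.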


What I did:
========
/etc/init.d/ocfs2 start; mount -t ocfs /dev/sdd1 /ocfs 
worked fine on both machines. I saw the filesystem on both
machines and could copy files to and from it. I then tried to
stress the filesystem a little. By accident I wrote to the same
file from both nodes at once, which may have caused the problems.
While node2 hung completely, /var/log/messages on node1 showed
the following:

 kernel: (15943) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: warning: rzsdb5 (node 1) may be ejected from cluster on device (8.48)... 20 misses so far
 kernel: (11) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 504: bad message: vote_state=0 type=1 lockid=10612113408 expected=10612113408
 kernel: (11) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 575: status = -22
 kernel: (16841) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 893: inode 2590848, vote_status=0, vote_state=1, lockid=10612113408, flags = 0x5, asked type = 5 master = 0, state = 0x0, type = 5
 kernel: (16841) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/dlm.c, 384: Timed out acquiring lock for inode 2590848, (lockid = 10612113408) retrying...
 kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/alloc.c, 3983: status = -28
 kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/journal.c, 727: block 2590849 was modified but never dirtied!
 kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/journal.c, 727: block 9 was modified but never dirtied!
 kernel: (16882) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: warning: rzsdb5 (node 1) may be ejected from cluster on device (8.48)... 20 misses so far
 kernel: (16882) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: warning: rzsdb5 (node 1) WILL BE EJECTED from cluster on device (8.48)... 40 misses so far
 kernel: (16882) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/heartbeat.c, 204: Removing rzsdb5 (node 1) from clustered device (8,48) after 60 misses
 kernel: (16881) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 893: inode 27, vote_status=0, vote_state=1, lockid=110592, flags = 0x101, asked type = 5 master = 1, state = 0x0, type = 5
 kernel: (16881) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/dlm.c, 384: Timed out acquiring lock for inode 27, (lockid = 110592) retrying...
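
To get an overview of a log like the one above, the ERROR lines can be split into process id, source file, line number and message. The following Python sketch does that for a few lines copied from the log; the regular expression is derived only from the line layout seen here, and the function name parse_errors is made up for this example.

```python
import re
from collections import Counter

# Layout seen in the log: "kernel: (<pid>) ERROR at <path>, <line>: <message>"
LINE_RE = re.compile(
    r"kernel: \((?P<pid>\d+)\) ERROR at (?P<path>\S+), (?P<line>\d+): (?P<msg>.*)"
)


def parse_errors(log_text):
    """Return one dict per ERROR line; lines that do not match are skipped."""
    records = []
    for raw in log_text.splitlines():
        m = LINE_RE.search(raw)
        if m:
            records.append({
                "pid": int(m["pid"]),
                "source": m["path"].rsplit("/", 1)[-1],  # keep only the file name
                "line": int(m["line"]),
                "message": m["msg"].strip(),
            })
    return records


SAMPLE = """\
 kernel: (11) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/vote.c, 575: status = -22
 kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/alloc.c, 3983: status = -28
 kernel: (16849) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/journal.c, 727: block 9 was modified but never dirtied!
"""

records = parse_errors(SAMPLE)
by_source = Counter(r["source"] for r in records)
for source, count in sorted(by_source.items()):
    print(source, count)
```

Grouping by source file makes it easier to see which part of the driver (heartbeat, vote, dlm, journal) reported first.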

After rebooting/resetting the nodes I can no longer mount /ocfs on both nodes.
I fiddled around with this problem, but can't figure out what's wrong.

Node1 works fine, but on Node2 mount -t ocfs2 /dev/sdd1 /ocfs fails with

mount: Unknown error 999

 kernel: max_nodes for this device: 2
 kernel: clusterbits=12
 kernel: vol_label: 
 kernel: uuid: b3 c4 dc df 55 f6 87 b9 b2 1d 93 39 ae ea 75 bd 
 kernel: root_blkno=3, system_dir_blkno=4
 kernel: autoconfig: blkno=330, blocks=4 newblkno=334 newblocks=4
 kernel: publish: blkno=338, blocks=2
 kernel: vote: blkno=340, blocks=2
 kernel: bitmap_blkno=433, bitmap_blocks=150, num_clusters=4882676
 kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 799: Re-mount volume with the reclaimid option to reclaim the node number
 kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 836: status = -999
 kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 1654: status = -999
 kernel: (5151) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 979: status = -999
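
The "status = ..." values in these logs look like negated errno numbers, which is the usual Linux kernel convention; that reading is an assumption, not something the driver output states. Under that assumption, the Python sketch below looks the values up in the standard errno table. The function name describe_status is made up for this example, and the numbering is the Linux one.

```python
import errno
import os


def describe_status(status):
    """Interpret a negative status as -errno (ASSUMPTION: kernel convention)."""
    code = -status
    name = errno.errorcode.get(code)
    if name is None:
        # Nothing in the standard table: probably a driver-private value.
        return f"{status}: not a standard errno (possibly driver-specific)"
    return f"{status}: {name} ({os.strerror(code)})"


for status in (-22, -28, -999):
    print(describe_status(status))
```

On Linux, 22 is EINVAL and 28 is ENOSPC, while 999 has no standard meaning, which matches mount printing "Unknown error 999".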

If I run

/etc/init.d/ocfs stop|start

I get a kernel panic.

---------

kernel: ocfs2: unsupported module, tainting kernel.
kernel: Oracle Cluster FileSystem 2 Mon Sep 13 17:26:09 PDT 2004 (build f96fe36998cba3997ad37e17bcb49b04)
kernel: ocfs2: hostname is rzsdb5
kernel: max_nodes for this device: 2
kernel: clusterbits=12
kernel: vol_label: 
kernel: uuid: d6 bb 8c c9 d4 3e 29 d8 db 42 2a e9 d3 c5 a2 49 
kernel: root_blkno=3, system_dir_blkno=4
kernel: autoconfig: blkno=330, blocks=4 newblkno=334 newblocks=4
kernel: publish: blkno=338, blocks=2
kernel: vote: blkno=340, blocks=2
kernel: bitmap_blkno=433, bitmap_blocks=150, num_clusters=4882676
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 799: Re-mount volume with the reclaimid option to reclaim the node number
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/volcfg.c, 836: status = -999
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 1654: status = -999
kernel: (5104) ERROR at /usr/src/packages/BUILD/ocfs2-0.99.2/src/super.c, 979: status = -999
-- MARK --
kernel: slab error in kmem_cache_destroy(): cache `ocfs2_inode': Can't free all objects
kernel: Call Trace:
kernel:  [<c01521b4>] kmem_cache_destroy+0xe4/0x130
kernel:  [<f936e3fa>] ocfs_free_mem_lists+0xa/0x26 [ocfs2]
kernel:  [<f93751fa>] ocfs_driver_exit+0x12a/0x15f [ocfs2]
kernel:  [<c013bc94>] kthread_stop+0x64/0x90
kernel:  [<c0143b0d>] sys_delete_module+0x18d/0x220
kernel:  [<c0109199>] sysenter_past_esp+0x52/0x71
kernel: 
kernel: slab error in kmem_cache_destroy(): cache `ocfs2_extent': Can't free all objects
kernel: Call Trace:
kernel:  [<c01521b4>] kmem_cache_destroy+0xe4/0x130
kernel:  [<f936e404>] ocfs_free_mem_lists+0x14/0x26 [ocfs2]
kernel:  [<f93751fa>] ocfs_driver_exit+0x12a/0x15f [ocfs2]
kernel:  [<c013bc94>] kthread_stop+0x64/0x90
kernel:  [<c0143b0d>] sys_delete_module+0x18d/0x220
kernel:  [<c0109199>] sysenter_past_esp+0x52/0x71
kernel: 
kernel: Unloaded OCFS Driver module
kernel: ocfs2: unsupported module, tainting kernel.
kernel: Oracle Cluster FileSystem 2 Mon Sep 13 17:26:09 PDT 2004 (build f96fe36998cba3997ad37e17bcb49b04)
kernel: ocfs2: hostname is rzsdb5
kernel: kmem_cache_create: duplicate cache ocfs2_inode
kernel: ------------[ cut here ]------------
kernel: kernel BUG at mm/slab.c:1348!
kernel: invalid operand: 0000 [#1]
kernel: SMP 
kernel: CPU:    3
kernel: EIP:    0060:[<c0151f97>]    Tainted: G  U
kernel: EFLAGS: 00010202   (2.6.5-7.108-bigsmp) 
kernel: EIP is at kmem_cache_create+0x467/0x5a0
kernel: eax: 0000002f   ebx: f5ff1a48   ecx: c04b2610   edx: 00007098
kernel: esi: f9378286   edi: f9378286   ebp: f7131a50   esp: f4651f44
kernel: ds: 007b   es: 007b   ss: 0068
kernel: Process modprobe (pid: 5434, threadinfo=f4650000 task=f61e06b0)
kernel: Stack: c034a48c f937827a 0000001a 0000000a c0000000 ffffff80 00000080 00000180 
kernel:        f937827a 00000080 00000000 00000000 00000000 c0391510 f900f437 00003000 
kernel:        00000000 00000000 0805a110 c039153c f9384180 4013b008 c0143502 00000013 
kernel: Call Trace:
kernel:  [<f900f437>] ocfs_driver_entry+0x437/0x82f [ocfs2]
kernel:  [<c0143502>] sys_init_module+0x152/0x290
kernel:  [<c0109199>] sysenter_past_esp+0x52/0x71
kernel: 
kernel: Code: 0f 0b 44 05 f8 9c 34 c0 eb cc b8 00 e0 ff ff 21 e0 8b 58 10 

------
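
One way to read the trace above: on unload, kmem_cache_destroy() could not free the ocfs2_inode cache because objects were still allocated, so the cache name stayed registered, and the next module load then hit the "duplicate cache" check in kmem_cache_create(). The Python toy model below only illustrates that sequence. It is not kernel code, and the class and method names are invented for this sketch.

```python
class DuplicateCacheError(Exception):
    """Stands in for the kernel BUG raised on a duplicate cache name."""


class ToySlabRegistry:
    """Toy model of a named-cache registry (NOT the real slab allocator)."""

    def __init__(self):
        self.caches = {}  # cache name -> number of live objects

    def create(self, name):
        if name in self.caches:
            raise DuplicateCacheError(name)
        self.caches[name] = 0

    def alloc(self, name):
        self.caches[name] += 1

    def free(self, name):
        self.caches[name] -= 1

    def destroy(self, name):
        if self.caches[name] > 0:
            # "Can't free all objects": the cache stays registered.
            return False
        del self.caches[name]
        return True


registry = ToySlabRegistry()
registry.create("ocfs2_inode")      # first module load
registry.alloc("ocfs2_inode")       # an inode that is never released
unloaded_cleanly = registry.destroy("ocfs2_inode")  # module unload

try:
    registry.create("ocfs2_inode")  # second module load
    reload_failed = False
except DuplicateCacheError:
    reload_failed = True

print("unloaded cleanly:", unloaded_cleanly)
print("reload failed:", reload_failed)
```

If this reading is right, the panic on reload is a follow-up symptom, and the underlying question is why the failed mount left inode and extent objects allocated.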

If I mount the device with

mount -t ocfs2 -o reclaimid /dev/sdd1 /ocfs

it gets mounted, but an attempt to unmount it leads to a kernel panic
on the other machine. Is it really that easy to crash ocfs2 at the
moment, or am I doing something completely wrong?


Sincerely

Jan Pilawa

-- 
+ Kontakt ----------------------------------------------------+
+ Systembetreuung Rechenzentrum TU Braunschweig 
+ Hans-Sommer-Str. 65, D-38106 Braunschweig 
+ Tel: +49 531 391-5548 E-Mail: j.pilawa at tu-bs.de 


