[Ocfs2-users] dlm problem

Henrik Carlqvist hc11 at poolhem.se
Thu Oct 30 11:42:37 PDT 2008


Hi!

I have a 2-node ocfs2-cluster working as an active-active HA-NFS server.
A few times, disc access has hung on one of the ocfs2 file systems. When I
got back from vacation, I found that the systems had hung again during my
absence, but fortunately another sysadmin was able to bring the systems
back up by rebooting and running fsck.ocfs2.

This time I was able to find some entries in the log files which might
explain why a file system hangs. The two nodes are named lejonapa and
kattapa. The relevant parts of the logs are as follows (sorry about the
wrapped long lines):

Oct 20 05:28:49 lejonapa kernel: (3252,3):dlm_deref_lockres_handler:2337
ERROR: D7E57FB7475045C49538F2B4A6307E54:N00000000000101da: bad lockres
name

Oct 20 05:28:49 kattapa kernel: (3480,1):dlm_print_one_lock_resource:50
lockres: N00000000000101da, owner=0, state=64
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:82
lockres: N00000000000101da, owner=0, state=64
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:84 
 last used: 662639943, on purge list: yes
Oct 20 05:28:49 kattapa kernel: (3480,1):dlm_print_lockres_refmap:61  
refmap nodes: [ ], inflight=0
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:86 
 granted queue: 
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:101
  converting queue: 
Oct 20 05:28:49 kattapa kernel: (3480,1):__dlm_print_one_lock_resource:116
  blocked queue: 

Oct 20 05:28:49 kattapa kernel: (3480,1):dlm_drop_lockres_ref:2292 ERROR:
while dropping ref on D7E57FB7475045C49538F2B4A6307E54:N00000000000101da
(master=0) got -22.
Oct 20 05:28:49 kattapa kernel: ------------[ cut here ]------------
Oct 20 05:28:49 kattapa kernel: kernel BUG at
fs/ocfs2/dlm/dlmmaster.c:2294!
Oct 20 05:28:49 kattapa kernel: invalid opcode: 0000 [#1]
Oct 20 05:28:49 kattapa kernel: SMP 
Oct 20 05:28:49 kattapa kernel: Modules linked in: autofs nfsd exportfs
ipv6 pcmcia pcmcia_core capability commoncap agpgart lp parport_pc parport
pcspkr psmouse e1000 iTCO_wdt iTCO_vendor_support shpchp ata_generic
serio_raw sg evdev
Oct 20 05:28:49 kattapa kernel: CPU:    1
Oct 20 05:28:49 kattapa kernel: EIP:    0060:[<c036b595>]    Not tainted
VLI
Oct 20 05:28:49 kattapa kernel: EFLAGS: 00010282   (2.6.21.5-smp #3)
Oct 20 05:28:49 kattapa kernel: EIP is at dlm_drop_lockres_ref+0x1c5/0x280
Oct 20 05:28:49 kattapa kernel: eax: dd495300   ebx: f7409a00   ecx:
fffffd7b   edx: dd4953e8
Oct 20 05:28:49 kattapa kernel: esi: c06f60f2   edi: 000008f4   ebp:
0000001f   esp: f0473e60
Oct 20 05:28:49 kattapa kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0000 
ss: 0068
Oct 20 05:28:49 kattapa kernel: Process dlm_thread (pid: 3480, ti=f0472000
task=f5232570 task.ti=f0472000)
Oct 20 05:28:49 kattapa kernel: Stack: c07c661c 00000d98 00000001 c06f60f2
000008f4 f4609780 0000001f f29ef8e0 
Oct 20 05:28:49 kattapa kernel:        00000000 ffffffea dd4953c0 f7409a00
f29ef8e0 00000000 1f010000 3030304e 
Oct 20 05:28:49 kattapa kernel:        30303030 30303030 64313031 00000061
10000000 0000b536 00000000 00000000 
Oct 20 05:28:49 kattapa kernel: Call Trace:
Oct 20 05:28:49 kattapa kernel:  [<c035e877>]
dlm_run_purge_list+0x1f7/0x430
Oct 20 05:28:49 kattapa kernel:  [<c0129c18>]
try_to_del_timer_sync+0x48/0x50
Oct 20 05:28:49 kattapa kernel:  [<c0129c2e>] del_timer_sync+0xe/0x20
Oct 20 05:28:49 kattapa kernel:  [<c06de022>] schedule_timeout+0x52/0xd0
Oct 20 05:28:49 kattapa kernel:  [<c035ec76>] dlm_thread+0x56/0xf60
Oct 20 05:28:49 kattapa kernel:  [<c0133c60>]
autoremove_wake_function+0x0/0x50
Oct 20 05:28:49 kattapa kernel:  [<c035ec20>] dlm_thread+0x0/0xf60
Oct 20 05:28:49 kattapa kernel:  [<c0133aab>] kthread+0xbb/0xf0
Oct 20 05:28:49 kattapa kernel:  [<c01339f0>] kthread+0x0/0xf0
Oct 20 05:28:49 kattapa kernel:  [<c0103693>]
kernel_thread_helper+0x7/0x14
Oct 20 05:28:49 kattapa kernel:  =======================
Oct 20 05:28:49 kattapa kernel: Code: 89 74 24 0c 89 54 24 08 89 44 24 14
8b 81 a4 00 00 00 c7 04 24 1c 66 7c c0 89 44 24 04 e8 14 5f db ff 8b 44 24
28 e8 5b 2b ff ff <0f> 0b eb fe 8d b4 26 00 00 00 00 3d 00
fe ff ff 0f 84 dc fe ff
Oct 20 05:28:49 kattapa kernel: EIP: [<c036b595>]
dlm_drop_lockres_ref+0x1c5/0x280 SS:ESP 0068:f0473e60
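
Two details in the trace above can be read mechanically. What follows is
only a minimal sketch, assuming a 32-bit little-endian x86 kernel (the
register names in the oops suggest this) and assuming that the stack word
ffffffea is the same return code that is logged as -22. The helper names
to_signed32 and decode_words are made up for this illustration. The sketch
converts that word to a signed errno and decodes the stack words following
1f010000 as ASCII text.

```python
import errno
import struct

# Words copied from the "Stack:" lines of the oops above.
STACK_WORDS = ["3030304e", "30303030", "30303030", "64313031", "00000061"]


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def decode_words(words):
    """Decode 32-bit hex words as little-endian bytes and return ASCII text."""
    raw = b"".join(struct.pack("<I", int(w, 16)) for w in words)
    # Assumption: trailing NUL bytes are padding, not part of the text.
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    rc = to_signed32(0xFFFFFFEA)
    print(rc, errno.errorcode[-rc])    # -22 EINVAL
    print(decode_words(STACK_WORDS))   # N00000000000101da
```

The decoded text matches the lockres name in the error messages, and -22 is
EINVAL on Linux. This only confirms that the stack dump and the error
messages refer to the same lock resource; it does not by itself say why the
other node rejected the name.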

Do the messages above indicate what is wrong, or are they just another
symptom of the problem?

Best regards, Henrik
-- 
NOTE: Dear Outlook users: Please remove me from your address books.
      Read this article and you know why:
      http://newsforge.com/article.pl?sid=03/08/21/143258
