[Ocfs2-users] kernel panic - not syncing

Tue Jan 23 06:16:37 PST 2007

I can reproduce it every time under heavy I/O.

I have read this FAQ:
I encounter "Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing" whenever I run a heavy io load?

So on node "becks" I have appended the string "elevator=deadline" to the kernel boot line.
Node "ocfs2" still has the default Red Hat I/O scheduler.
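To confirm which I/O scheduler a node is really using after the boot option, the sysfs scheduler file can be read back. The sketch below assumes the usual 2.6 layout, where /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one in square brackets; the helper names and the device argument are illustrative, not part of any OCFS2 tool.

```python
import re


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if absent."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_active_scheduler(device):
    """Read the active scheduler for a block device name under /sys/block.

    Hypothetical helper: 'device' must be the sysfs name of the disk,
    which is an assumption the caller has to get right.
    """
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())


if __name__ == "__main__":
    # Sample string in the format the kernel prints, not output from these nodes.
    print(active_scheduler("noop anticipatory [deadline] cfq"))
```

Running read_active_scheduler() on both nodes would show whether "elevator=deadline" actually took effect on becks.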

This is my last panic on node becks:

Index 19: took 0 ms to do bio add page read
Index 20: took 0 ms to do submit_bio for read
Index 21: took 36 ms to do  waiting for read completion
Index 22: took 0 ms to do bio alloc write
Index 23: took 0 ms to do bio add page write
Index 0: took 0 ms to do submit_bio for write
Index 1: took 0 ms to do checking slots
Index 2: took 1 ms to do waiting for write completion
Index 3: took 1962 ms to do msleep
Index 4: took 0 ms allocating bios for read
Index 5: took 0 ms to do bio alloc read
Index 6: took 0 ms to do bio add page read
Index 7: took 0 ms to do submit_bio for read
Index 8: took 9362 ms to do waiting for read completion
Index 9: took 0 ms to do bio alloc write
Index 10: took 0 ms to do add page write
Index 11: took 0 ms to do submit_bio for write
Index 12: took 0 ms to do checking slots
Index 13: took 48665 ms to do waiting for write completion
(3,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
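The timing dump above is easier to read once the slow steps are picked out and added up. The following is a rough sketch that parses the "Index N: took X ms" lines and sums the most recent pass. It assumes the dump is a ring buffer, so the entries after the index wraps back to 0 are treated as the latest heartbeat pass; that interpretation is a guess from the output, not something confirmed from the OCFS2 source.

```python
import re

# Timing lines copied from the panic output on node becks.
DUMP = """\
Index 19: took 0 ms to do bio add page read
Index 20: took 0 ms to do submit_bio for read
Index 21: took 36 ms to do  waiting for read completion
Index 22: took 0 ms to do bio alloc write
Index 23: took 0 ms to do bio add page write
Index 0: took 0 ms to do submit_bio for write
Index 1: took 0 ms to do checking slots
Index 2: took 1 ms to do waiting for write completion
Index 3: took 1962 ms to do msleep
Index 4: took 0 ms allocating bios for read
Index 5: took 0 ms to do bio alloc read
Index 6: took 0 ms to do bio add page read
Index 7: took 0 ms to do submit_bio for read
Index 8: took 9362 ms to do waiting for read completion
Index 9: took 0 ms to do bio alloc write
Index 10: took 0 ms to do add page write
Index 11: took 0 ms to do submit_bio for write
Index 12: took 0 ms to do checking slots
Index 13: took 48665 ms to do waiting for write completion
"""

LINE_RE = re.compile(r"Index (\d+): took (\d+) ms (?:to do\s+)?(.+)")


def parse(dump):
    """Return (index, milliseconds, step) tuples in the order they appear."""
    entries = []
    for line in dump.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(
                (int(match.group(1)), int(match.group(2)), match.group(3).strip())
            )
    return entries


def last_pass(entries):
    """Keep the entries after the last point where the index decreases."""
    start = 0
    for i in range(1, len(entries)):
        if entries[i][0] < entries[i - 1][0]:
            start = i
    return entries[start:]


entries = last_pass(parse(DUMP))
total_ms = sum(ms for _, ms, _ in entries)
slow = [(idx, ms, step) for idx, ms, step in entries if ms >= 1000]

for idx, ms, step in slow:
    print(f"Index {idx}: {ms} ms in {step}")
print(f"total for last pass: {total_ms} ms")
```

On this dump the three slow steps (msleep, waiting for read completion, waiting for write completion) add up to 59990 ms, so the last pass took roughly 60 seconds, almost all of it waiting on disk I/O. That figure is worth comparing against the timeout implied by hb_dead_threshold.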

Other info:

[root@ocfs2 ~]# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
31
[root@becks ~]# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
31
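For reference, the threshold value can be converted to a timeout in seconds. The sketch below assumes the relationship given in the OCFS2 FAQ as I understand it, timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, with one heartbeat pass every 2 seconds; the function names are illustrative only.

```python
# Assumed heartbeat interval: one pass every 2 seconds (per the OCFS2 FAQ formula).
HEARTBEAT_INTERVAL_S = 2


def threshold_to_timeout_s(threshold):
    """Seconds of missed heartbeats tolerated before a node fences itself."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def timeout_to_threshold(timeout_s):
    """Threshold value to configure for a desired timeout in seconds."""
    return timeout_s // HEARTBEAT_INTERVAL_S + 1


if __name__ == "__main__":
    print(threshold_to_timeout_s(31))   # value reported by both nodes
    print(timeout_to_threshold(120))    # threshold for a two-minute timeout
```

Under that assumption, the value 31 on both nodes corresponds to a 60 second timeout, and a 120 second timeout would need a threshold of 61.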

[root@ocfs2 ~]# mount -t ocfs2
/dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)

[root@becks ~]# mount -t ocfs2
/dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
/dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)

[root@ocfs2 ~]# /etc/init.d/ocfs2 status
Active OCFS2 mountpoints:  /ocfs2 /ocfs2_nfs

[root@becks ~]# /etc/init.d/ocfs2 status
Active OCFS2 mountpoints:  /ocfs2 /ocfs2_nfs

[root@ocfs2 ~]# mounted.ocfs2 -f
Device                FS     Nodes
/dev/etherd/e2.0      ocfs2  ocfs2, becks
/dev/etherd/e3.0      ocfs2  ocfs2, becks

[root@becks ~]# mounted.ocfs2 -f
Device                FS     Nodes
/dev/etherd/e3.0      ocfs2  ocfs2, becks
/dev/etherd/e2.0      ocfs2  ocfs2, becks

[root@ocfs2 ~]# mounted.ocfs2 -d
Device                FS     UUID                                  Label
/dev/etherd/e2.0      ocfs2  b24cc18d-af89-4980-a75e-a87530b1b878  seceti
/dev/etherd/e3.0      ocfs2  101a92fd-b83b-4294-8bfc-fbaa069c3239  nfs4

[root@becks ~]# mounted.ocfs2 -d
Device                FS     UUID                                  Label
/dev/etherd/e3.0      ocfs2  101a92fd-b83b-4294-8bfc-fbaa069c3239  nfs4
/dev/etherd/e2.0      ocfs2  b24cc18d-af89-4980-a75e-a87530b1b878  seceti

I can also panic the nodes by detaching the network cable.

If you have any more debugging questions, feel free to ask me.
Thanks

-----Original Message-----
From: Srinivas Eeda [mailto:srinivas.eeda at oracle.com] 
Sent: Monday, 22 January 2007 18:30
To: Consulente3
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] kernel panic - not syncing

The problem appears to be that I/O is taking longer than the effective O2CB_HEARTBEAT_THRESHOLD allows. Your configured value of "31" doesn't seem to be in effect?

Index 6: took 1995 ms to do msleep
Index 17: took 1996 ms to do msleep
Index 22: took 10001 ms to do waiting for read completion.

Can you please cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold and verify?