[Ocfs2-users] kernel panic - not syncing
Sunil Mushran
Sunil.Mushran at oracle.com
Tue Jan 23 11:54:13 PST 2007
o2hb is timing out because I/O to the device is taking too long.
There is not much one can do other than increase the timeout, say to 2 minutes:
O2CB_HEARTBEAT_THRESHOLD = 61
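
As a rough sketch of where 61 comes from: assuming o2hb heartbeats every 2 seconds (consistent with the ~1962 ms msleep in the trace below) and the relation timeout = (threshold - 1) * 2 seconds described in the OCFS2 FAQ, the conversion looks like this. The function names are made up for illustration.

```python
# Assumption: o2hb heartbeats every 2 seconds and a node is fenced after
# (threshold - 1) intervals without a completed heartbeat.
HEARTBEAT_INTERVAL_SECS = 2


def threshold_to_timeout_secs(threshold: int) -> int:
    """Return the fencing timeout in seconds implied by a threshold value."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def timeout_secs_to_threshold(timeout_secs: int) -> int:
    """Return the threshold value needed for a desired timeout in seconds."""
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


if __name__ == "__main__":
    for threshold in (31, 61):
        secs = threshold_to_timeout_secs(threshold)
        print(f"O2CB_HEARTBEAT_THRESHOLD={threshold} -> {secs} s")
```

Under these assumptions the current value of 31 corresponds to 60 seconds and 61 corresponds to 120 seconds.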
Consulente3 wrote:
> I can reproduce it every time under heavy I/O.
>
> I have read this FAQ:
> I encounter "Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing" whenever I run a heavy io load?
>
> So, on node "becks" I have appended the string "elevator=deadline" to the boot parameters.
> Node "ocfs2" still has the default RH I/O scheduler.
>
> This is my last panic on node becks:
>
> Index 19: took 0 ms to do bio add page read
> Index 20: took 0 ms to do submit_bio for read
> Index 21: took 36 ms to do waiting for read completion
> Index 22: took 0 ms to do bio alloc write
> Index 23: took 0 ms to do bio add page write
> Index 0: took 0 ms to do submit_bio for write
> Index 1: took 0 ms to do checking slots
> Index 2: took 1 ms to do waiting for write completion
> Index 3: took 1962 ms to do msleep
> Index 4: took 0 ms allocating bios for read
> Index 5: took 0 ms to do bio alloc read
> Index 6: took 0 ms to do bio add page read
> Index 7: took 0 ms to do submit_bio for read
> Index 8: took 9362 ms to do waiting for read completion
> Index 9: took 0 ms to do bio alloc write
> Index 10: took 0 ms to do add page write
> Index 11: took 0 ms to do submit_bio for write
> Index 12: took 0 ms to do checking slots
> Index 13: took 48665 ms to do waiting for write completion
> (3,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions
> .
> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing
>
> Other info:
>
> [root@ocfs2 ~]# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
> 31
> [root@becks ~]# cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold
> 31
>
> [root@ocfs2 ~]# mount -t ocfs2
> /dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
> /dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)
>
> [root@becks ~]# mount -t ocfs2
> /dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local)
> /dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local)
>
> [root@ocfs2 ~]# /etc/init.d/ocfs2 status
> Active OCFS2 mountpoints: /ocfs2 /ocfs2_nfs
>
> [root@becks ~]# /etc/init.d/ocfs2 status
> Active OCFS2 mountpoints: /ocfs2 /ocfs2_nfs
>
> [root@ocfs2 ~]# mounted.ocfs2 -f
> Device FS Nodes
> /dev/etherd/e2.0 ocfs2 ocfs2, becks
> /dev/etherd/e3.0 ocfs2 ocfs2, becks
>
> [root@becks ~]# mounted.ocfs2 -f
> Device FS Nodes
> /dev/etherd/e3.0 ocfs2 ocfs2, becks
> /dev/etherd/e2.0 ocfs2 ocfs2, becks
>
> [root@ocfs2 ~]# mounted.ocfs2 -d
> Device FS UUID Label
> /dev/etherd/e2.0 ocfs2 b24cc18d-af89-4980-a75e-a87530b1b878 seceti
> /dev/etherd/e3.0 ocfs2 101a92fd-b83b-4294-8bfc-fbaa069c3239 nfs4
>
> [root@becks ~]# mounted.ocfs2 -d
> Device FS UUID Label
> /dev/etherd/e3.0 ocfs2 101a92fd-b83b-4294-8bfc-fbaa069c3239 nfs4
> /dev/etherd/e2.0 ocfs2 b24cc18d-af89-4980-a75e-a87530b1b878 seceti
>
> I can also panic the nodes by detaching the network cable...
>
> If you have any more debugging questions, feel free to ask me
> thanks
>
> -----Original Message-----
> From: Srinivas Eeda [mailto:srinivas.eeda at oracle.com]
> Sent: Monday, January 22, 2007 18:30
> To: Consulente3
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] kernel panic - not syncing
>
> The problem appears to be that I/O is taking longer than the effective O2CB_HEARTBEAT_THRESHOLD allows. Your configured value of "31" doesn't seem to be in effect?
>
> Index 6: took 1995 ms to do msleep
> Index 17: took 1996 ms to do msleep
> Index 22: took 10001 ms to do waiting for read completion.
>
> Can you please cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold and verify.
>
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>
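
For anyone comparing traces like the one quoted above, a small script can pull out the slowest steps. This is only a sketch: it assumes each entry has the form "Index N: took X ms ..." exactly as pasted in this thread, and the helper names are made up for illustration.

```python
import re

# Matches entries such as "Index 13: took 48665 ms to do waiting for write
# completion"; the "to do" part is optional because one entry omits it.
TRACE_LINE = re.compile(r"Index (\d+): took (\d+) ms (?:to do )?(.+)")


def parse_trace(text):
    """Return a list of (index, milliseconds, step description) tuples."""
    steps = []
    for line in text.splitlines():
        match = TRACE_LINE.search(line)
        if match:
            index, millis, step = match.groups()
            steps.append((int(index), int(millis), step.strip()))
    return steps


def slowest(steps, count=3):
    """Return the `count` steps with the largest elapsed time."""
    return sorted(steps, key=lambda s: s[1], reverse=True)[:count]


# A few entries copied from the trace in this thread.
SAMPLE = """\
> Index 3: took 1962 ms to do msleep
> Index 4: took 0 ms allocating bios for read
> Index 8: took 9362 ms to do waiting for read completion
> Index 13: took 48665 ms to do waiting for write completion
"""

if __name__ == "__main__":
    for index, millis, step in slowest(parse_trace(SAMPLE)):
        print(f"Index {index}: {millis} ms in {step}")
```

On the sample entries the largest contributors are the write completion wait (48665 ms) and the read completion wait (9362 ms), which is where the heartbeat time is going.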