[Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition

Sunil Mushran sunil.mushran at oracle.com
Fri Oct 24 12:45:37 PDT 2008


So that's the problem. The heartbeat is not stopping because of the
segfault. I reviewed the code changes in this tool between 1.2.7 and
1.4.1, and they are quite limited, so I have no idea why it is
segfaulting.

Now you could stop the heartbeat manually. You have to be careful,
though, because stopping it for a mounted volume will be very
problematic, to say the least. But one good reason to do it manually
would be to catch the coredump, which could help tell us what the
problem is.

If you are up for it, do:
$ ulimit -c unlimited
$ ocfs2_hb_ctl -K -d /dev/sdX o2cb

Run it on an unmounted volume that has a reference left over. Use -I
to view the number of references.
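
If you would rather script it, here is a rough sketch of the same steps
with a guard against touching a mounted volume. It assumes that
`ocfs2_hb_ctl -I -d` prints a line of the form `<UUID>: <N> refs` (as in
the output quoted below) and that /proc/mounts lists the device under the
same path you pass in, which may not hold for device-mapper names. The
function names are made up for illustration.

```shell
#!/bin/bash
# Sketch only: hb_refs, is_mounted and stop_stale_hb are hypothetical helpers.

# Extract the reference count from "UUID: N refs" text on stdin.
hb_refs() {
    sed -n 's/^[0-9A-Fa-f]\{1,\}: \([0-9]\{1,\}\) refs.*$/\1/p' | head -n 1
}

# Succeed if the device appears as a mount source in the mounts file.
# Assumption: the device is listed under exactly the path given.
is_mounted() {
    local dev=$1 mounts=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Stop a leftover heartbeat on an unmounted device, allowing a core dump.
stop_stale_hb() {
    local dev=$1 refs
    if is_mounted "$dev"; then
        echo "$dev is mounted; refusing to stop heartbeat" >&2
        return 1
    fi
    refs=$(ocfs2_hb_ctl -I -d "$dev" | hb_refs)
    if [ -z "$refs" ] || [ "$refs" -eq 0 ]; then
        echo "$dev: no leftover references" >&2
        return 0
    fi
    ulimit -c unlimited              # let the kernel write a core file
    ocfs2_hb_ctl -K -d "$dev" o2cb   # same command as above
}

# Usage: stop_stale_hb /dev/sdX
```

Where the core file ends up depends on the system's core pattern
settings, so check those if nothing shows up in the current directory.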

Sunil

Daniel Keisling wrote:
> Oct 23 08:53:21 ausracdb03 kernel: (2410,3):o2hb_do_disk_heartbeat:770
> ERROR: Device "dm-28": another node is heartbeating in our slot!
>
> [root@ausracdb03 ~]# ocfs2_hb_ctl -I -d /dev/dm-28
> 289FD533334645C5A88FD715FC0EEF85: 1 refs
>   
> Yes, it segfaults every night (I take two snapshots per night):
>
> [root@ausracdb03 log]# grep segfault /var/log/messages
> Oct 21 03:15:47 ausracdb03 kernel: ocfs2_hb_ctl[4197]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fffefd623e8 error 4
> Oct 21 03:17:43 ausracdb03 kernel: ocfs2_hb_ctl[8002]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff1a8f9318 error 4
> Oct 21 16:43:30 ausracdb03 kernel: ocfs2_hb_ctl[16933]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff816aa558 error 4
> Oct 21 16:43:31 ausracdb03 kernel: ocfs2_hb_ctl[16950]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fffcb162b88 error 4
> Oct 22 03:15:44 ausracdb03 kernel: ocfs2_hb_ctl[7721]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff88a7efb8 error 4
> Oct 22 03:17:46 ausracdb03 kernel: ocfs2_hb_ctl[11294]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff85549f68 error 4
> Oct 23 03:15:51 ausracdb03 kernel: ocfs2_hb_ctl[32555]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff8fefe498 error 4
> Oct 23 03:17:40 ausracdb03 kernel: ocfs2_hb_ctl[3756]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff99bb25d8 error 4
> Oct 24 03:15:47 ausracdb03 kernel: ocfs2_hb_ctl[15664]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007ffff4254aa8 error 4
> Oct 24 03:17:43 ausracdb03 kernel: ocfs2_hb_ctl[18029]: segfault at
> 0000000000000000 rip 0000000000428fa0 rsp 00007fff75055a78 error 4
>
>
> This began when I upgraded to v1.4.1-1 from v1.2.8.
>
> Thanks,
>
> Daniel
>   
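
The segfault lines quoted above all report the same rip
(0000000000428fa0) and a fault address of 0. On x86, error 4 is a
user-mode read of a page that is not present, so this is consistent with
a NULL pointer read at one spot in the binary. A small sketch to confirm
that from the log, assuming each entry sits on a single line in
/var/log/messages (they are wrapped in the mail); the function name is
hypothetical:

```shell
#!/bin/bash
# Sketch only: summarize_segfaults is a hypothetical helper name.
# Reads kernel log lines on stdin and prints "<count> <rip>" for each
# distinct instruction pointer seen in ocfs2_hb_ctl segfault entries.
summarize_segfaults() {
    grep 'ocfs2_hb_ctl\[[0-9]*\]: segfault' |
        sed -n 's/.* rip \([0-9a-f]\{1,\}\) .*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage: summarize_segfaults < /var/log/messages
```

A single line of output means every crash hit the same instruction,
which is the address to look at first once there is a core file.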



