[Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition
Sunil Mushran
sunil.mushran at oracle.com
Mon Dec 1 12:40:32 PST 2008
So the problem you are encountering is with killing via UUID. You could kill
by device name instead.
By now you have the list of heartbeat regions. To get the device name for
a region, do:
$ cat /sys/kernel/config/cluster/CLUSTERNAME/heartbeat/C43CB881C2C84B09BAC14546BF6DCAD9/dev
sdf1
$ ocfs2_hb_ctl -K -d /dev/sdf1
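If you have many regions, a small loop over configfs can map every heartbeat
UUID to its backing device in one pass. This is just a sketch built on the
configfs layout shown above; the function takes the heartbeat directory as an
argument so you can substitute your own cluster name:

```shell
# list_hb_regions DIR: print "UUID -> /dev/DEVICE" for each heartbeat
# region under DIR (normally /sys/kernel/config/cluster/<name>/heartbeat).
list_hb_regions() {
    for region in "$1"/*/; do
        # Skip anything without a dev file (e.g. an unexpanded glob).
        [ -f "${region}dev" ] || continue
        printf '%s -> /dev/%s\n' "$(basename "$region")" "$(cat "${region}dev")"
    done
}

# Example (CLUSTERNAME is a placeholder for your cluster):
# list_hb_regions /sys/kernel/config/cluster/CLUSTERNAME/heartbeat
```

From that listing you can pick the stale region and kill it by device with
ocfs2_hb_ctl -K -d, as above.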
Now make sure that the device is not mounted. It should not be. If it is,
then you have probably used force-uuid-reset to change the UUID of an active
device. In that case, I see no solution other than a node reset.
But before you do this, I would like some more info.
1. strace -o /tmp/hbctl.out ocfs2_hb_ctl -K -u F5F0522D39FC4EB2824C3E68C0B1D589
2. uname -a
3. rpm -qa | grep ocfs2
4. rpm -qf `which ocfs2_hb_ctl`
5. mounted.ocfs2 -d >/tmp/mounted.out
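To save a round trip, the five commands above can be gathered in one shot.
This is only a convenience wrapper around the exact commands listed; the
output file names under /tmp are my own choice:

```shell
# collect_ocfs2_diag UUID: run the five diagnostics requested above,
# saving output under /tmp. Redirections create the files even if a
# tool is missing, so you can see which step failed.
collect_ocfs2_diag() {
    uuid="$1"
    uname -a > /tmp/uname.out 2>&1
    rpm -qa | grep ocfs2 > /tmp/ocfs2-rpms.out 2>&1
    rpm -qf "$(command -v ocfs2_hb_ctl)" >> /tmp/ocfs2-rpms.out 2>&1
    mounted.ocfs2 -d > /tmp/mounted.out 2>&1
    # Run last: this is the call that segfaults, traced for analysis.
    strace -o /tmp/hbctl.out ocfs2_hb_ctl -K -u "$uuid"
}

# Usage: collect_ocfs2_diag F5F0522D39FC4EB2824C3E68C0B1D589
```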
Thanks
Sunil
Daniel Keisling wrote:
> I wrote a script to easily get the heartbeats that should have been
> killed. However, I get a segmentation fault every time I try to kill
> the "dead" heartbeats:
>
> [root at ausracdbd01 tmp]# mounted.ocfs2 -d | grep -i f5f0 | wc -l
> 0
>
> [root at ausracdbd01 tmp]# ocfs2_hb_ctl -K -u
> F5F0522D39FC4EB2824C3E68C0B1D589
> Segmentation fault (core dumped)
>
>
>
> The process is still active:
>
> [root at ausracdbd01 tmp]# ps -ef | grep -i f5f0
> root 620 169 0 Nov29 ? 00:00:30 [o2hb-F5F0522D39]
> root 22608 18491 0 14:07 pts/4 00:00:00 grep -i f5f0
>
> Attached is the core.
>
> While I can create and mount snapshot filesystems on my development
> node, a dead heartbeat on one of my production nodes is not letting me
> mount the snapshot for a newly presented filesystem (thus causing our
> backups to fail). What else can I do? I really don't want to open an
> SR with Oracle...
>
> Thanks,
>
> Daniel