[Ocfs2-users] "another node is heartbeating in our slot"

Florin Andrei florin at andrei.myip.org
Wed Sep 23 11:31:38 PDT 2009


OCFS2 cluster, two nodes, nothing fancy:

#####################################
[root@serv1 ~]# cat /etc/ocfs2/cluster.conf
node:
         ip_port = 7777
         ip_address = 10.10.20.64
         number = 0
         name = serv1.foobar
         cluster = ocfs2

node:
         ip_port = 7777
         ip_address = 10.10.20.65
         number = 1
         name = serv2.foobar
         cluster = ocfs2

cluster:
         node_count = 2
         name = ocfs2
#####################################

A filesystem shared by these two machines was also mounted on a 3rd
machine, which is part of another cluster and happens to have the same
node number as serv2.
Some files were deleted on the 3rd machine, then the fs was unmounted
from it (it remained mounted on serv1 and serv2 the whole time).
As a result, a bunch of messages like this appeared in the logs:

serv2 kernel: (21146,1):o2hb_do_disk_heartbeat:982 ERROR: Device "dm-3": 
another node is heartbeating in our slot!
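
For anyone wanting to check for this kind of overlap before mounting a
volume from a machine in a different cluster, here is a rough Python
sketch. It parses two cluster.conf files and reports node numbers that
both use for different node names. The first config is the one shown
above; the second one (othercluster, other1.example, 192.0.2.10) is
entirely made up for illustration, since the 3rd machine's real config
is not shown here. It also assumes the heartbeat slot follows the node
number, which is what the error message suggests.

```python
def parse_cluster_conf(text):
    """Parse "node:" / "cluster:" stanzas into a list of (kind, fields) pairs."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and "=" not in line:
            # Stanza header such as "node:" or "cluster:"
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[1][key.strip()] = value.strip()
    return stanzas


def node_numbers(text):
    """Map node number -> node name for every "node:" stanza."""
    return {
        int(fields["number"]): fields["name"]
        for kind, fields in parse_cluster_conf(text)
        if kind == "node"
    }


def slot_collisions(conf_a, conf_b):
    """Node numbers present in both configs under different node names."""
    a, b = node_numbers(conf_a), node_numbers(conf_b)
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys()) if a[n] != b[n]}


# The two-node cluster from this message.
CLUSTER_A = """
node:
         ip_port = 7777
         ip_address = 10.10.20.64
         number = 0
         name = serv1.foobar
         cluster = ocfs2

node:
         ip_port = 7777
         ip_address = 10.10.20.65
         number = 1
         name = serv2.foobar
         cluster = ocfs2

cluster:
         node_count = 2
         name = ocfs2
"""

# HYPOTHETICAL config for the 3rd machine; every value here is invented.
CLUSTER_B = """
node:
         ip_port = 7777
         ip_address = 192.0.2.10
         number = 1
         name = other1.example
         cluster = othercluster

cluster:
         node_count = 1
         name = othercluster
"""

print(slot_collisions(CLUSTER_A, CLUSTER_B))
# prints {1: ('serv2.foobar', 'other1.example')}
```

An empty result only means the two files do not reuse a node number; it
says nothing about whether mounting across clusters is otherwise safe.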

And now there's a discrepancy between the disk usage indicated by df 
(it's pretty high) and du (it's much lower). Also, ls -l generates weird 
output for some files (which were supposedly deleted on the 3rd machine):

?--------- ? ?        ?              ?            ? access_log.20090601
?--------- ? ?        ?              ?            ? access_log.20090602
?--------- ? ?        ?              ?            ? access_log.20090603
?--------- ? ?        ?              ?            ? access_log.20090604
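
To put numbers on the df/du gap and list the entries that cannot be
stat'ed, something like the following Python sketch could be used. It
is only an approximation of what df and du report (used blocks from
statvfs versus the sum of st_blocks over everything reachable), and the
function names are my own.

```python
import os


def df_used_bytes(mount_point):
    """Used space as the filesystem reports it, roughly what df shows."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def du_used_bytes(root):
    """Sum allocated blocks of everything reachable under root, roughly what du shows.

    Returns (total_bytes, unstatable_paths). Entries that fail lstat(), like
    the "?---------" lines in ls -l, are collected instead of counted.
    """
    root_stat = os.lstat(root)
    total = root_stat.st_blocks * 512  # st_blocks is in 512-byte units
    seen = {(root_stat.st_dev, root_stat.st_ino)}
    unstatable = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                unstatable.append(path)
                continue
            if st.st_dev != root_stat.st_dev:
                # Do not descend into other filesystems mounted below root.
                if name in dirnames:
                    dirnames.remove(name)
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # count hard-linked files only once
            seen.add(key)
            total += st.st_blocks * 512
    return total, unstatable


def report(mount_point):
    """Print both figures, their difference, and any unstatable entries."""
    df_bytes = df_used_bytes(mount_point)
    du_bytes, broken = du_used_bytes(mount_point)
    print("df-style used bytes:", df_bytes)
    print("du-style used bytes:", du_bytes)
    print("difference:         ", df_bytes - du_bytes)
    for path in broken:
        print("cannot stat:", path)
```

Calling report() on the mount point of the shared volume (for example
report("/mnt/shared"), where the path is just a placeholder) prints the
two totals side by side. Some difference is normal because of filesystem
metadata, so only a large gap is interesting.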

I unmounted the fs on serv2 and then mounted it again, but that didn't
help. I haven't tried unmounting it on serv1 yet.

Any suggestions?

-- 
Florin Andrei

http://florin.myip.org/



