[Ocfs2-users] "another node is heartbeating in our slot"

Sunil Mushran sunil.mushran at oracle.com
Wed Sep 23 11:59:10 PDT 2009


Unmount the filesystem on all nodes, then run "fsck.ocfs2 -f" on the device.
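
As a rough sketch of that sequence (not an official procedure), the snippet below only builds the list of commands and prints them; it does not execute anything. The use of ssh, the mount point /mnt/shared, the device path /dev/mapper/shared_lun, and the final remount via an /etc/fstab entry are all assumptions for illustration. Only the node names and the "fsck.ocfs2 -f" invocation come from this thread.

```python
import shlex


def repair_plan(nodes, mountpoint, device):
    """Return, in order, the commands for an offline forced check.

    Nothing is executed here; the caller decides how to run the plan.
    """
    plan = []
    # 1. Unmount on every node first; the check should run with the
    #    volume mounted nowhere.
    for node in nodes:
        plan.append(f"ssh {shlex.quote(node)} umount {shlex.quote(mountpoint)}")
    # 2. Run the forced check once, from a single node.
    plan.append(
        f"ssh {shlex.quote(nodes[0])} fsck.ocfs2 -f {shlex.quote(device)}"
    )
    # 3. Remount everywhere (assumes an /etc/fstab entry for the mount point).
    for node in nodes:
        plan.append(f"ssh {shlex.quote(node)} mount {shlex.quote(mountpoint)}")
    return plan


# Hypothetical mount point and device path; substitute the real ones.
for command in repair_plan(
    ["serv1", "serv2"], "/mnt/shared", "/dev/mapper/shared_lun"
):
    print(command)
```

Any machine from another cluster that still has the LUN attached (serv3 in this thread) would need to have it unmounted as well before the check.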

Florin Andrei wrote:
> The underlying SAN LUN was temporarily attached to serv3 and mounted 
> there. What are the recommended steps to repair the filesystem?
>
> Sunil Mushran wrote:
>   
>> You cannot share a device between two different clusters.
>>
>> Florin Andrei wrote:
>>     
>>> OCFS2 cluster, two nodes, nothing fancy:
>>>
>>> #####################################
>>> [root@serv1 ~]# cat /etc/ocfs2/cluster.conf
>>> node:
>>>          ip_port = 7777
>>>          ip_address = 10.10.20.64
>>>          number = 0
>>>          name = serv1.foobar
>>>          cluster = ocfs2
>>>
>>> node:
>>>          ip_port = 7777
>>>          ip_address = 10.10.20.65
>>>          number = 1
>>>          name = serv2.foobar
>>>          cluster = ocfs2
>>>
>>> cluster:
>>>          node_count = 2
>>>          name = ocfs2
>>> #####################################
>>>
>>> A filesystem shared by these two machines got mounted on a 3rd machine, 
>>> which is part of another cluster, and the 3rd machine happens to share 
>>> the same node number with serv2.
>>> Some files were deleted on the 3rd machine, then the fs was unmounted 
>>> from it (but remained mounted on 1 and 2).
>>> As a result, a bunch of messages like this appeared in the logs:
>>>
>>> serv2 kernel: (21146,1):o2hb_do_disk_heartbeat:982 ERROR: Device "dm-3": 
>>> another node is heartbeating in our slot!
>>>
>>> And now there's a discrepancy between the disk usage indicated by df 
>>> (it's pretty high) and du (it's much lower). Also, ls -l generates weird 
>>> output for some files (which were supposedly deleted on the 3rd machine):
>>>
>>> ?--------- ? ?        ?              ?            ? access_log.20090601
>>> ?--------- ? ?        ?              ?            ? access_log.20090602
>>> ?--------- ? ?        ?              ?            ? access_log.20090603
>>> ?--------- ? ?        ?              ?            ? access_log.20090604
>>>
>>> I unmounted the fs on serv2 and mounted it again, but that didn't help. 
>>> I haven't tried unmounting it on serv1 yet.
>>>
>>> Any suggestions?
>>>
>   
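
The condition behind the error in this thread (a machine from a different cluster mounting the same device while using a node number already taken by serv2) can be spotted by comparing the two cluster.conf files. The following is a minimal sketch, assuming the stanza layout shown in the quoted cluster.conf. The second configuration (serv3's) never appears in the thread, so its contents here are hypothetical, and the function names are made up for illustration.

```python
def parse_nodes(conf_text):
    """Parse the 'node:' stanzas of a cluster.conf into a list of dicts."""
    nodes = []
    current = None  # dict for the node stanza being read, else None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Stanza header: only 'node:' stanzas are collected.
            current = {} if line == "node:" else None
            if current is not None:
                nodes.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return nodes


def slot_collisions(conf_a, conf_b):
    """List node numbers used by differently named nodes in both configs."""
    a = {int(n["number"]): n["name"] for n in parse_nodes(conf_a)}
    b = {int(n["number"]): n["name"] for n in parse_nodes(conf_b)}
    shared = a.keys() & b.keys()
    return sorted((num, a[num], b[num]) for num in shared if a[num] != b[num])


# Configuration quoted in this thread.
SERV12_CONF = """
node:
         ip_port = 7777
         ip_address = 10.10.20.64
         number = 0
         name = serv1.foobar
         cluster = ocfs2

node:
         ip_port = 7777
         ip_address = 10.10.20.65
         number = 1
         name = serv2.foobar
         cluster = ocfs2

cluster:
         node_count = 2
         name = ocfs2
"""

# HYPOTHETICAL configuration for the third machine; not from the thread.
OTHER_CONF = """
node:
         ip_port = 7777
         ip_address = 10.10.20.66
         number = 1
         name = serv3.foobar
         cluster = othercluster

cluster:
         node_count = 1
         name = othercluster
"""

print(slot_collisions(SERV12_CONF, OTHER_CONF))
# -> [(1, 'serv2.foobar', 'serv3.foobar')]
```

A collision reported here matters only if both clusters can reach the same device. As noted earlier in the thread, a device cannot be shared between two different clusters, so an empty result does not make such a setup safe.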



