[Ocfs2-users] "another node is heartbeating in our slot"

Florin Andrei florin at andrei.myip.org
Wed Sep 23 11:46:27 PDT 2009


The underlying SAN LUN was temporarily attached to serv3 and mounted 
there. The question is: what are the recommended steps to repair the 
filesystem?

Sunil Mushran wrote:
> You cannot share a device between two different clusters.
> 
> Florin Andrei wrote:
>> OCFS2 cluster, two nodes, nothing fancy:
>>
>> #####################################
>> [root@serv1 ~]# cat /etc/ocfs2/cluster.conf
>> node:
>>          ip_port = 7777
>>          ip_address = 10.10.20.64
>>          number = 0
>>          name = serv1.foobar
>>          cluster = ocfs2
>>
>> node:
>>          ip_port = 7777
>>          ip_address = 10.10.20.65
>>          number = 1
>>          name = serv2.foobar
>>          cluster = ocfs2
>>
>> cluster:
>>          node_count = 2
>>          name = ocfs2
>> #####################################
>>
>> A filesystem shared by these two machines got mounted on a 3rd machine, 
>> which is part of another cluster, and the 3rd machine happens to share 
>> the same node number with serv2.
>> Some files were deleted on the 3rd machine, then the fs was unmounted 
>> from it (but remained mounted on 1 and 2).
>> As a result, a bunch of messages like this appeared in the logs:
>>
>> serv2 kernel: (21146,1):o2hb_do_disk_heartbeat:982 ERROR: Device "dm-3": 
>> another node is heartbeating in our slot!
>>
>> And now there's a discrepancy between the disk usage indicated by df 
>> (it's pretty high) and du (it's much lower). Also, ls -l generates weird 
>> output for some files (which were supposedly deleted on the 3rd machine):
>>
>> ?--------- ? ?        ?              ?            ? access_log.20090601
>> ?--------- ? ?        ?              ?            ? access_log.20090602
>> ?--------- ? ?        ?              ?            ? access_log.20090603
>> ?--------- ? ?        ?              ?            ? access_log.20090604
>>
>> I unmounted the fs on serv2 then mounted it back, but that didn't help. 
>> Didn't try to unmount serv1 yet.
>>
>> Any suggestions?
>>
>>   
> 
> 
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
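
For anyone who wants to check for this kind of overlap before attaching 
a LUN to a machine from another cluster, here is a minimal sketch in 
Python. It assumes that the disk heartbeat slot follows the "number" 
field of each node stanza, which is what the error message above 
suggests. The first config is the one quoted above; the second one, 
including the names serv3.example and serv4.example, the addresses and 
the cluster name, is purely hypothetical and stands in for the other 
cluster's file. The function names are mine, not part of any OCFS2 tool.

```python
def parse_cluster_conf(text):
    """Parse cluster.conf text into a list of (stanza_type, fields) pairs."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/separator lines
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:"
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def node_numbers(stanzas):
    """Map node number -> node name for every node stanza."""
    return {int(f["number"]): f["name"] for kind, f in stanzas if kind == "node"}


def slot_collisions(conf_a, conf_b):
    """Return node numbers used in both configs, with the two node names."""
    a = node_numbers(parse_cluster_conf(conf_a))
    b = node_numbers(parse_cluster_conf(conf_b))
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys())}


# Config from this thread (serv1/serv2).
CLUSTER_A = """
node:
         ip_port = 7777
         ip_address = 10.10.20.64
         number = 0
         name = serv1.foobar
         cluster = ocfs2

node:
         ip_port = 7777
         ip_address = 10.10.20.65
         number = 1
         name = serv2.foobar
         cluster = ocfs2

cluster:
         node_count = 2
         name = ocfs2
"""

# HYPOTHETICAL config for the other cluster; not taken from the thread.
CLUSTER_B = """
node:
         ip_port = 7777
         ip_address = 192.0.2.10
         number = 1
         name = serv3.example
         cluster = other

node:
         ip_port = 7777
         ip_address = 192.0.2.11
         number = 2
         name = serv4.example
         cluster = other

cluster:
         node_count = 2
         name = other
"""

print(slot_collisions(CLUSTER_A, CLUSTER_B))
# prints {1: ('serv2.foobar', 'serv3.example')}
```

An empty result only means the node numbers do not overlap. It does not 
make it safe to mount the same device from two different clusters.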


-- 
Florin Andrei

http://florin.myip.org/



