<html><head><style type='text/css'>p { margin: 0; }</style><style type='text/css'>body { font-family: 'Arial'; font-size: 10pt; color: #000000}</style></head><body>Hi,<br><br>I've seen a number of people with this problem (me too!) but nobody seems to have a solution. Any help would be greatly appreciated.<br><br>Two nodes work fine with DRBD/OCFS2, but when I add a third using GNBD, I run into problems...<br><br>I'm running an RH 2.6.21 kernel with Xen 3.2 - OCFS version 1.3.3 - Tools 1.2.4.<br><br>I have two nodes with the following config;<br><br><span style="font-family: Courier,Courier New,mono;">node:</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> ip_port = 7777</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> ip_address = 10.0.0.1</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> number = 0</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> name = nodea</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> cluster = ocfs2</span><br style="font-family: Courier,Courier New,mono;"><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">node:</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> ip_port = 7777</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> ip_address = 10.0.0.2</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> number = 1</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> name = nodeb</span><br style="font-family: Courier,Courier New,mono;"><span 
style="font-family: Courier,Courier New,mono;"> cluster = ocfs2</span><br style="font-family: Courier,Courier New,mono;"><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">node:</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> ip_port = 7777</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> ip_address = 10.0.0.20</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> number = 3</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> name = mgm</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> cluster = ocfs2</span><br style="font-family: Courier,Courier New,mono;"><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">cluster:</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> node_count = 3</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> name = ocfs2</span><br style="font-family: Courier,Courier New,mono;"><br>nodea is running a 400G filesystem on /drbd1<br>nodeb is running a 400G filesystem on /drbd2 (mirroring drbd1 using drbd 8)<br><br>I can bring up nodes a and b and everything works without problems; both systems can mount their respective drbd devices.<br><br>I then run gnbd_serv on both machines and export the drbd devices.<br><br>On booting "mgm", I load drbd-client, then /etc/init.d/o2cb, so far so good;<br><br><span style="font-family: Courier,Courier New,mono;">root@mgm:~# /etc/init.d/o2cb status</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Module "configfs": 
Loaded</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Filesystem "configfs": Mounted</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Module "ocfs2_nodemanager": Loaded</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Module "ocfs2_dlm": Loaded</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Module "ocfs2_dlmfs": Loaded</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Filesystem "ocfs2_dlmfs": Mounted</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Checking O2CB cluster ocfs2: Online</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Heartbeat dead threshold = 7</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> Network idle timeout: 10000</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> Network keepalive delay: 5000</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;"> Network reconnect delay: 2000</span><br style="font-family: Courier,Courier New,mono;"><span style="font-family: Courier,Courier New,mono;">Checking O2CB heartbeat: Not active<br><br>root@mgm:~# mounted.ocfs2 -f<br>Device FS Nodes<br>/dev/gnbd0 ocfs2 nodea, nodeb<br>/dev/gnbd1 ocfs2 nodea, nodeb<br><br>root@mgm:~# mounted.ocfs2 -d<br>Device FS UUID Label<br>/dev/gnbd0 ocfs2 35fff639-0ec2-4a8d-8849-2b9ef078a40a brick<br>/dev/gnbd1 ocfs2 35fff639-0ec2-4a8d-8849-2b9ef078a40a brick<br><br>Slots;<br> Slot# Node#<br> 0 0<br> 1 1<br><br> Slot# Node#<br> 0 0<br> 1 1<br><br><span style="font-family: 
Verdana,Arial,Helvetica,sans-serif;">Now I try to mount a device on host "mgm";<br><br>mount -t ocfs2 /dev/gnbd0 /cluster<br><br>In the kernel log on nodea I see;<br><span style="font-family: Verdana,Arial,Helvetica,sans-serif;">Feb 9 17:37:01 nodea kernel: (3576,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd1": another node is heartbeating in our slot!</span><br style="font-family: Verdana,Arial,Helvetica,sans-serif;"><span style="font-family: Verdana,Arial,Helvetica,sans-serif;">Feb 9 17:37:03 nodea kernel: (3576,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd1": another node is heartbeating in our slot!</span><br style="font-family: Verdana,Arial,Helvetica,sans-serif;"><br>On nodeb I see;<br>Feb 9 17:37:00 nodeb kernel: (3515,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd2": another node is heartbeating in our slot!<br>Feb 9 17:37:02 nodeb kernel: (3515,0):o2hb_do_disk_heartbeat:767 ERROR: Device "drbd2": another node is heartbeating in our slot!<br><br>And within 10 seconds or so both machines fence themselves off and reboot.<br><br></span></span><span style="font-family: Verdana,Arial,Helvetica,sans-serif;">It "seems" as though mgm is not recognising that slots 0 and 1 are already taken, but everything "looks" OK to me.</span><br style="font-family: Verdana,Arial,Helvetica,sans-serif;"><span style="font-family: Verdana,Arial,Helvetica,sans-serif;">Can anyone spot any glaring mistakes or suggest a way I can debug this or provide more information to the list?</span><br><br>Many thanks,<br>Gareth.<br></body></html>