<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Sunil,<br>
Sorry for my late reply, I only had time today to start from scratch
and test.<br>
I rebuilt my environment (2 nodes connected to a SAN via
iSCSI + multipath). I still have the issue that the heartbeat is still
active after I unmount my ocfs2 volume. <br>
/etc/init.d/o2cb stop<br>
Stopping O2CB cluster CLUST: Failed<br>
Unable to stop cluster as heartbeat region still active<br>
<br>
ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0<br>
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs<br>
<br>
After I manually kill the ref (ocfs2_hb_ctl -K -d
/dev/mapper/volgr1-lvol0 ocfs2), I can stop o2cb successfully.
I can live with that, but why doesn't
it stop automatically? As I understand it, the heartbeat should be started
and stopped when the volume is mounted/unmounted.<br>
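<br>
For reference, this is roughly what I do by hand at the moment, as a minimal shell sketch. The helper names refs_from_info and stop_cluster are my own (they are not part of ocfs2-tools), and the parsing assumes the info output always looks like the "UUID: N refs" line above.<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: extract the ref count from a line such as
# "0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs" (output of ocfs2_hb_ctl -I).
refs_from_info() {
    rest=${1#*: }            # drop everything up to and including ": "
    echo "${rest%% refs*}"   # drop the trailing " refs"
}

# Hypothetical wrapper around the manual workaround described in this mail.
# Assumption: the volume is already unmounted on this node.
stop_cluster() {
    dev=$1
    refs=$(refs_from_info "$(ocfs2_hb_ctl -I -d "$dev")")
    if [ "$refs" -gt 0 ]; then
        # Kill the leftover heartbeat reference for the "ocfs2" service
        ocfs2_hb_ctl -K -d "$dev" ocfs2
    fi
    /etc/init.d/o2cb stop
}

# Parsing check with the line shown above:
refs_from_info "0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs"
```
</pre>
stop_cluster is only defined here, not called; it would be run as "stop_cluster /dev/mapper/volgr1-lvol0" after the umount.<br>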
<br>
br,<br>
Laurentiu.<br>
<br>
On 10/19/2011 02:28, Sunil Mushran wrote:
<blockquote cite="mid:4E9E0B92.8060104@oracle.com" type="cite">
Manual delete will only work if there are no references. In your
case<br>
there are references.<br>
<br>
You may want to start both nodes from scratch. Do not start/stop<br>
heartbeat manually. Also, do not force-format.<br>
<br>
On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
<blockquote cite="mid:4E9E03B7.4080603@easic.ro" type="cite">
OK, I rebooted one of the nodes (both had similar issues). But
something is still fishy.<br>
- I mounted the device: mount -t ocfs2 /dev/volgr1/lvol0
/mnt/tmp/<br>
- I unmounted it: umount /mnt/tmp/<br>
- tried to stop o2cb: /etc/init.d/o2cb stop<br>
Stopping O2CB cluster CLUSTER: Failed<br>
Unable to stop cluster as heartbeat region still active<br>
- ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D<br>
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs<br>
- ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D<br>
ocfs2_hb_ctl: File not found by ocfs2_lookup while stopping
heartbeat<br>
- ls -Rl /sys/kernel/config/cluster/CLUSTER/heartbeat/<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat/:<br>
total 0<br>
drwxr-xr-x 2 root root 0 Oct 19 01:50
0C4AB55FE9314FA5A9F81652FDB9B22D<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:40 dead_threshold<br>
<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:<br>
total 0<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50 block_bytes<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev<br>
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50 start_block<br>
<br>
- I cannot manually delete
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/<br>
<br>
PS: I'm going to sleep now, I have to be up in a few hours. We
can continue tomorrow if that's OK with you. <br>
Thank you for your help.<br>
<br>
Laurentiu.<br>
<br>
On 10/19/2011 01:33, Sunil Mushran wrote:
<blockquote cite="mid:4E9DFEB0.9010206@oracle.com" type="cite">
One way this can happen is if one starts the hb manually and
then force-formats that volume. The format generates a new UUID.
Once that happens, the hb tool cannot map the region to the device and
thus fails to stop it. Right now the easiest option on this box is
resetting it.<br>
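<br>
A rough way to spot such a mismatch is to compare the UUID that mounted.ocfs2 -d reports for the device with the region names listed under the heartbeat directory in configfs. The sketch below only does the comparison; stale_regions is a made-up helper name, not an ocfs2-tools command, and the caller is assumed to pass in the values it collected.<br>
<pre>
```shell
#!/bin/sh
# Hypothetical helper: print heartbeat regions whose name differs from
# the UUID currently on disk.
#   $1      = UUID reported by "mounted.ocfs2 -d" for the device
#   $2 ...  = entries listed under .../heartbeat/ in configfs
stale_regions() {
    disk_uuid=$1
    shift
    for region in "$@"; do
        # dead_threshold is an attribute file, not a heartbeat region
        if [ "$region" = "dead_threshold" ]; then
            continue
        fi
        if [ "$region" != "$disk_uuid" ]; then
            echo "$region"
        fi
    done
}

# Example with the values seen in this thread:
stale_regions 0C4AB55FE9314FA5A9F81652FDB9B22D \
    918673F06F8F4ED188DDCE14F39945F6 dead_threshold
```
</pre>
Any region printed this way was started against a UUID that is no longer on the disk, which is the case the hb tool cannot resolve.<br>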
<br>
On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
<blockquote cite="mid:4E9DFC93.1050109@easic.ro" type="cite">
Yes, I did reformat it (even more than once, I think, last
week). This is a pre-production system and I'm trying
various options before moving into real life.<br>
<br>
<br>
On 10/19/2011 01:19, Sunil Mushran wrote:
<blockquote cite="mid:4E9DFB83.40603@oracle.com" type="cite">
Did you reformat the volume recently? Or, when did you
last format it?<br>
<br>
On 10/18/2011 03:13 PM, Laurentiu Gosu wrote:
<blockquote cite="mid:4E9DFA03.8030405@easic.ro"
type="cite">
Well, this is weird:<br>
ls /sys/kernel/config/cluster/CLUSTER/heartbeat/<br>
<b>918673F06F8F4ED188DDCE14F39945F6</b> dead_threshold<br>
<br>
Looks like we have different UUIDs. Where is this one coming
from?<br>
<br>
ocfs2_hb_ctl -I -u 918673F06F8F4ED188DDCE14F39945F6<br>
918673F06F8F4ED188DDCE14F39945F6: 1 refs<br>
<br>
<br>
On 10/19/2011 01:04, Sunil Mushran wrote:
<blockquote cite="mid:4E9DF7D0.7090404@oracle.com"
type="cite">Let's do it by hand. <br>
rm -rf /sys/kernel/config/cluster/.../heartbeat/<b>0C4AB55FE9314FA5A9F81652FDB9B22D
</b><br>
<br>
On 10/18/2011 02:52 PM, Laurentiu Gosu wrote: <br>
<blockquote type="cite"> ocfs2_hb_ctl -K -u
0C4AB55FE9314FA5A9F81652FDB9B22D <br>
ocfs2_hb_ctl: File not found by ocfs2_lookup while
stopping heartbeat <br>
<br>
No improvement :( <br>
<br>
<br>
On 10/19/2011 00:50, Sunil Mushran wrote: <br>
<blockquote type="cite">See if this cleans it up. <br>
ocfs2_hb_ctl -K -u
0C4AB55FE9314FA5A9F81652FDB9B22D <br>
<br>
On 10/18/2011 02:44 PM, Laurentiu Gosu wrote: <br>
<blockquote type="cite">ocfs2_hb_ctl -I -u
0C4AB55FE9314FA5A9F81652FDB9B22D <br>
0C4AB55FE9314FA5A9F81652FDB9B22D: 0 refs <br>
<br>
<br>
On 10/19/2011 00:43, Sunil Mushran wrote: <br>
<blockquote type="cite">ocfs2_hb_ctl -I -u
0C4AB55FE9314FA5A9F81652FDB9B22D <br>
<br>
On 10/18/2011 02:40 PM, Laurentiu Gosu wrote:
<br>
<blockquote type="cite">mounted.ocfs2 -d <br>
Device FS Stack
UUID Label <br>
/dev/mapper/volgr1-lvol0 ocfs2 o2cb
0C4AB55FE9314FA5A9F81652FDB9B22D ocfs2 <br>
<br>
mounted.ocfs2 -f <br>
Device FS Nodes <br>
/dev/mapper/volgr1-lvol0 ocfs2 ro02xsrv001
<br>
<br>
ro02xsrv001 = the other node in the cluster.
<br>
<br>
By the way, there is no /dev/dm-2 <br>
ls /dev/dm-* <br>
/dev/dm-0 /dev/dm-1 <br>
<br>
<br>
On 10/19/2011 00:37, Sunil Mushran wrote: <br>
<blockquote type="cite">So it is not
mounted. But we still have a hb thread
because <br>
hb could not be stopped during umount. The
reason for that <br>
could be the same that causes ocfs2_hb_ctl
to fail. <br>
<br>
Do: <br>
mounted.ocfs2 -d <br>
<br>
On 10/18/2011 02:32 PM, Laurentiu Gosu
wrote: <br>
<blockquote type="cite">ls -lR
/sys/kernel/debug/ocfs2 <br>
/sys/kernel/debug/ocfs2: <br>
total 0 <br>
<br>
ls -lR /sys/kernel/debug/o2dlm <br>
/sys/kernel/debug/o2dlm: <br>
total 0 <br>
<br>
ocfs2_hb_ctl -I -d /dev/dm-2 <br>
ocfs2_hb_ctl: Device name specified was
not found while reading uuid <br>
<br>
There is no /dev/dm-2 mounted. <br>
<br>
<br>
On 10/19/2011 00:27, Sunil Mushran
wrote: <br>
<blockquote type="cite">mount -t debugfs
debugfs /sys/kernel/debug <br>
<br>
Then list that dir. <br>
<br>
Also, do: <br>
ocfs2_hb_ctl -I -d /dev/dm-2 <br>
<br>
Be careful before killing. We want to
be sure that dev is not mounted. <br>
<br>
On 10/18/2011 02:23 PM, Laurentiu Gosu
wrote: <br>
<blockquote type="cite">Again the
outputs: <br>
cat
/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev<br>
dm-2 <br>
---> this should be volgr1-lvol0, I
guess? <br>
<br>
ls -lR /sys/kernel/debug/ocfs2 <br>
ls: /sys/kernel/debug/ocfs2: No such
file or directory <br>
<br>
ls -lR /sys/kernel/debug/o2dlm <br>
ls: /sys/kernel/debug/o2dlm: No such
file or directory <br>
<br>
I think I have to enable debug first
somehow? <br>
<br>
Laurentiu. <br>
<br>
On 10/19/2011 00:17, Sunil Mushran
wrote: <br>
<blockquote type="cite">What does
this return? <br>
cat
/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev<br>
<br>
Also, do: <br>
ls -lR /sys/kernel/debug/ocfs2 <br>
ls -lR /sys/kernel/debug/o2dlm <br>
<br>
On 10/18/2011 02:14 PM, Laurentiu
Gosu wrote: <br>
<blockquote type="cite">Here is
the output: <br>
<br>
ls -lR
/sys/kernel/config/cluster <br>
/sys/kernel/config/cluster: <br>
total 0 <br>
drwxr-xr-x 4 root root 0 Oct 19
00:12 CLUSTER <br>
<br>
/sys/kernel/config/cluster/CLUSTER:
<br>
total 0 <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 fence_method <br>
drwxr-xr-x 3 root root 0 Oct
19 00:12 heartbeat <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 idle_timeout_ms <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 keepalive_delay_ms <br>
drwxr-xr-x 4 root root 0 Oct
11 20:23 node <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 reconnect_delay_ms <br>
<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat:
<br>
total 0 <br>
drwxr-xr-x 2 root root 0 Oct
19 00:12
918673F06F8F4ED188DDCE14F39945F6
<br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 dead_threshold <br>
<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat/<b>918673F06F8F4ED188DDCE14F39945F6</b>:
<br>
total 0 <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 block_bytes <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 blocks <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 dev <br>
-r--r--r-- 1 root root 4096 Oct
19 00:12 pid <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 start_block <br>
<br>
/sys/kernel/config/cluster/CLUSTER/node:
<br>
total 0 <br>
drwxr-xr-x 2 root root 0 Oct 19
00:12 ro02xsrv001 <br>
drwxr-xr-x 2 root root 0 Oct 19
00:12 ro02xsrv002 <br>
<br>
/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
<br>
total 0 <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 ipv4_address <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 ipv4_port <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 local <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 num <br>
<br>
/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
<br>
total 0 <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 ipv4_address <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 ipv4_port <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 local <br>
-rw-r--r-- 1 root root 4096 Oct
19 00:12 num <br>
<br>
<br>
<br>
<br>
On 10/19/2011 00:12, Sunil
Mushran wrote: <br>
<blockquote type="cite">ls -lR
/sys/kernel/config/cluster <br>
<br>
What does this return? <br>
<br>
On 10/18/2011 02:05 PM,
Laurentiu Gosu wrote: <br>
<blockquote type="cite">Hi, <br>
I have a 2-node ocfs2
cluster running UEK
2.6.32-100.0.19.el5, <br>
ocfs2console-1.6.3-2.el5,
ocfs2-tools-1.6.3-2.el5. <br>
My problem is that every
time I try to run
/etc/init.d/o2cb stop <br>
it fails with this error: <br>
Stopping O2CB cluster
CLUSTER: Failed <br>
Unable to stop cluster
as heartbeat region still
active <br>
There is no active mount
point. I tried to manually
stop the heartbeat <br>
with "ocfs2_hb_ctl -K -d
/dev/mapper/volgr1-lvol0
ocfs2" (after finding <br>
the refs number with
"ocfs2_hb_ctl -I -d
/dev/mapper/volgr1-lvol0 ").
<br>
But even when the refs count is
zero, the "heartbeat
region still active" error still occurs. <br>
How can i fix this? <br>
<br>
Thank you in advance. <br>
Laurentiu. <br>
<br>
<br>
_______________________________________________
<br>
Ocfs2-users mailing list <br>
<a moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<br>
<a moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>