<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Well, the device exists in /proc/partitions:<br>
### cat /proc/partitions |grep dm-2<br>
253 2 11607154688 dm-2<br>
### ll /dev/mapper/volgr1-lvol0<br>
brw-rw---- 1 root disk 253, 2 Dec 11 14:14 /dev/mapper/volgr1-lvol0<br>
<br>
I do not have any unusual config, just a striped LVM
volume (/dev/mapper/volgr1-lvol0) created out of 2 multipath
devices (/dev/mpath/mpathz &amp; /dev/mpath/mpathy), which are made
available by iSCSI (/dev/sdX...).<br>
<br>
Anyway, I think I can live with that (I create the symlink at boot
time from rc.local).<br>
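<br>
A rough sketch of that rc.local workaround, in case it is useful to others.
The helper name ensure_dm_symlink and its arguments are hypothetical (not
part of ocfs2-tools); minor number 2 comes from the /proc/partitions output
above, and the sketch assumes nothing else manages /dev/dm-2:<br>
<pre wrap="">
```shell
#!/bin/sh
# Hypothetical helper: create DEVDIR/dm-MINOR as a symlink to TARGET,
# unless something with that name already exists.
ensure_dm_symlink() {
    target=$1
    minor=$2
    devdir=${3:-/dev}
    link="$devdir/dm-$minor"

    # Leave an existing node or symlink (even a dangling one) untouched.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    ln -s "$target" "$link"
}

# Possible call from rc.local for the volume in this thread:
# ensure_dm_symlink /dev/mapper/volgr1-lvol0 2
```
</pre>
The optional third argument lets the helper be tried against a scratch
directory before pointing it at /dev.<br>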
When is 1.8 supposed to go out?<br>
And a side question: is there any nagios plugin available to monitor
cluster status? I could not find any.<br>
br,<br>
Laurentiu.<br>
<br>
<br>
On 12/12/2011 21:02, Sunil Mushran wrote:
<blockquote cite="mid:4EE64FBC.1070108@oracle.com" type="cite">
Thanks. Yes, stop hb looks up the device in /proc/partitions.
I guess the<br>
utility is expecting the partitions there because that's how udev
works normally.<br>
<br>
Having said that, I think we have made a change in 1.8 whereby
stop hb does<br>
not scan the devices but just looks up configfs.<br>
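<br>
To check by hand whether a region is affected, something along these lines
may help. It is only a sketch: hb_dev_node_status is a hypothetical helper,
not an ocfs2-tools command. It assumes the region directory in configfs
exposes a dev file holding the kernel device name (as the listings quoted
in this thread show) and that the tool needs a node of that name under /dev:<br>
<pre wrap="">
```shell
#!/bin/sh
# Hypothetical check: does the device named in a heartbeat region's "dev"
# file have a node under DEVDIR (default /dev)?
hb_dev_node_status() {
    region_dir=$1
    devdir=${2:-/dev}

    # The "dev" file holds the kernel name of the device, e.g. dm-2.
    name=$(cat "$region_dir/dev" 2>/dev/null) || return 2

    if [ -e "$devdir/$name" ]; then
        echo "$name: present"
    else
        echo "$name: missing"
        return 1
    fi
}

# Possible call for the region discussed in this thread:
# hb_dev_node_status /sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D
```
</pre>
A "missing" result would match the case where creating the /dev/dm-2
symlink by hand lets the heartbeat stop.<br>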
<br>
On 12/11/2011 08:14 AM, Laurentiu Gosu wrote:
<blockquote cite="mid:4EE4D6D2.3050404@easic.ro" type="cite">
Hi Sunil,<br>
Maybe you remember the thread below. In short, the problem was that the
heartbeat region was still active after unmounting the ocfs2
volume (I use the latest UEK + ocfs2-tools).<br>
Based on this link <a moz-do-not-send="true"
href="http://markmail.org/message/7h7r32avuitqdhzr#query:+page:1+mid:lq7arecz2dui6b3v+state:results">http://markmail.org/message/7h7r32avuitqdhzr#query:+page:1+mid:lq7arecz2dui6b3v+state:results</a>
I manually created a /dev/dm-2 symlink pointing to my SAN device
[/dev/mapper/volgr1-lvol0] and the heartbeat was stopped
normally. Maybe it helps you find the real issue. As I
understand it, that symlink should be created automatically, but it
seems the problem is still there in ocfs2-tools-1.6.3-2.el5.<br>
<br>
br,<br>
laurentiu.<br>
<br>
On 10/24/2011 23:54, Sunil Mushran wrote:
<blockquote cite="mid:4EA5D087.40500@oracle.com" type="cite">
Well, I wouldn't advise you to go into prod with this problem.<br>
To figure out the issue, we'll need to provide a debug version
of<br>
ocfs2_hb_ctl.<br>
<br>
If you have support, ping oracle support and ask for
assistance.<br>
<br>
If not, download the source and run ocfs2_hb_ctl in gdb. The
problem<br>
is in the code path that begins in the function lookup_dev().<br>
<br>
On 10/23/2011 01:30 PM, Laurentiu Gosu wrote:
<blockquote cite="mid:4EA4795F.6050408@easic.ro" type="cite">
#rpm -qa |grep ocfs2<br>
ocfs2console-1.6.3-2.el5<br>
ocfs2-tools-1.6.3-2.el5<br>
<br>
Just let me know if I can give more details to find the
problem. I will move ocfs2 into production in the next
weeks.<br>
<br>
<br>
On 10/23/2011 22:49, Sunil Mushran wrote:
<blockquote cite="mid:4EA46FC0.3090505@oracle.com"
type="cite">
Are you sure you have ocfs2-tools-1.6.3? I remember we had
an<br>
issue with this with an earlier release... 1.6.1/.2.<br>
<br>
On 10/23/2011 10:43 AM, Laurentiu Gosu wrote:
<blockquote cite="mid:4EA45258.2030309@easic.ro"
type="cite">
hmm..<br>
#ocfs2_hb_ctl -I -u 0C4AB55FE9314FA5A9F81652FDB9B22D<br>
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs<br>
<b>BUT:</b><br>
#ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2<br>
ocfs2_hb_ctl: File not found by ocfs2_lookup while
stopping heartbeat<br>
I can still kill the ref using device name (-d).<br>
<br>
On 10/23/2011 17:57, Sunil Mushran wrote:
<blockquote cite="mid:4EA42B41.9070607@oracle.com"
type="cite">
I think it stops by uuid. So try doing this the next
time.<br>
You are encountering some issue that we have not seen
before.<br>
ocfs2_hb_ctl -K -u 0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2<br>
<br>
On 10/23/2011 05:32 AM, Laurentiu Gosu wrote:
<blockquote cite="mid:4EA40944.8080106@easic.ro"
type="cite">
Hi Sunil,<br>
Sorry for my late reply, I just had time today to
start from scratch and test.<br>
I rebuilt my environment (2 nodes connected to a SAN
via iSCSI + multipath). I still have the issue that
the heartbeat is active after I umount my ocfs2
volume. <br>
/etc/init.d/o2cb stop<br>
Stopping O2CB cluster CLUST: Failed<br>
Unable to stop cluster as heartbeat region still
active<br>
<br>
ocfs2_hb_ctl -I -d /dev/mapper/volgr1-lvol0<br>
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs<br>
<br>
After I manually kill the ref (ocfs2_hb_ctl -K -d
/dev/mapper/volgr1-lvol0 ocfs2), I can stop
o2cb successfully. I can live with that, but why
doesn't it stop automatically? As I understand it, the
heartbeat should be started and stopped when the
volume gets mounted/unmounted.<br>
<br>
br,<br>
Laurentiu.<br>
<br>
On 10/19/2011 02:28, Sunil Mushran wrote:
<blockquote cite="mid:4E9E0B92.8060104@oracle.com"
type="cite">
Manual delete will only work if there are no
references. In your case<br>
there are references.<br>
<br>
You may want to start both nodes from scratch. Do
not start/stop<br>
heartbeat manually. Also, do not force-format.<br>
<br>
On 10/18/2011 03:54 PM, Laurentiu Gosu wrote:
<blockquote cite="mid:4E9E03B7.4080603@easic.ro"
type="cite">
OK, I rebooted one of the nodes (both had similar
issues). But something is still fishy.<br>
- I mounted the device: mount -t ocfs2
/dev/volgr1/lvol0 /mnt/tmp/<br>
- I unmounted it: umount /mnt/tmp/<br>
- tried to stop o2cb: /etc/init.d/o2cb stop<br>
Stopping O2CB cluster CLUSTER: Failed<br>
Unable to stop cluster as heartbeat region still
active<br>
- ocfs2_hb_ctl -I -u
0C4AB55FE9314FA5A9F81652FDB9B22D<br>
0C4AB55FE9314FA5A9F81652FDB9B22D: 1 refs<br>
- ocfs2_hb_ctl -K -u
0C4AB55FE9314FA5A9F81652FDB9B22D<br>
ocfs2_hb_ctl: File not found by ocfs2_lookup
while stopping heartbeat<br>
- ls -Rl
/sys/kernel/config/cluster/CLUSTER/heartbeat/<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat/:<br>
total 0<br>
drwxr-xr-x 2 root root 0 Oct 19 01:50
0C4AB55FE9314FA5A9F81652FDB9B22D<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:40
dead_threshold<br>
<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D:<br>
total 0<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50
block_bytes<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50 blocks<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50 dev<br>
-r--r--r-- 1 root root 4096 Oct 19 01:50 pid<br>
-rw-r--r-- 1 root root 4096 Oct 19 01:50
start_block<br>
<br>
- I cannot manually delete
/sys/kernel/config/cluster/CLUSTER/heartbeat/0C4AB55FE9314FA5A9F81652FDB9B22D/<br>
<br>
PS: I'm going to sleep now, I have to be up in a
few hours. We can continue tomorrow if it's OK
with you. <br>
Thank you for your help.<br>
<br>
Laurentiu.<br>
<br>
On 10/19/2011 01:33, Sunil Mushran wrote:
<blockquote
cite="mid:4E9DFEB0.9010206@oracle.com"
type="cite">
One way this can happen is if one starts the
hb manually and then force<br>
formats on that volume. The format will
generate a new uuid. Once that<br>
happens, the hb tool cannot map the region to
the device and thus fails<br>
to stop it. Right now the easiest option on
this box is resetting it.<br>
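<br>
A quick way to spot such a leftover region is to compare the region
directories in configfs with the UUIDs that mounted.ocfs2 -d reports. The
sketch below assumes that layout; stale_hb_regions is a hypothetical helper,
not part of ocfs2-tools, and it only lists candidates without changing
anything:<br>
<pre wrap="">
```shell
#!/bin/sh
# Hypothetical helper: print the heartbeat region directories under HB_DIR
# whose names match none of the UUIDs given as the remaining arguments.
stale_hb_regions() {
    hb_dir=$1
    shift
    for region in "$hb_dir"/*/; do
        # Skip the unexpanded pattern when HB_DIR has no subdirectories.
        [ -d "$region" ] || continue
        uuid=$(basename "$region")
        case " $* " in
            *" $uuid "*) ;;       # known volume UUID, not stale
            *) echo "$uuid" ;;    # no volume with this UUID was given
        esac
    done
}

# Possible call, passing the UUID reported by "mounted.ocfs2 -d":
# stale_hb_regions /sys/kernel/config/cluster/CLUSTER/heartbeat 0C4AB55FE9314FA5A9F81652FDB9B22D
```
</pre>
Plain files such as dead_threshold are ignored because only directories
are examined.<br>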
<br>
On 10/18/2011 03:24 PM, Laurentiu Gosu wrote:
<blockquote
cite="mid:4E9DFC93.1050109@easic.ro"
type="cite">
Yes, I did reformat it (even more than once, I
think, last week). This is a pre-production
system and I'm trying various options before
moving into real life.<br>
<br>
<br>
On 10/19/2011 01:19, Sunil Mushran wrote:
<blockquote
cite="mid:4E9DFB83.40603@oracle.com"
type="cite">
Did you reformat the volume recently? or,
when did you format last?<br>
<br>
On 10/18/2011 03:13 PM, Laurentiu Gosu
wrote:
<blockquote
cite="mid:4E9DFA03.8030405@easic.ro"
type="cite">
well..this is weird<br>
ls
/sys/kernel/config/cluster/CLUSTER/heartbeat/<br>
<b>918673F06F8F4ED188DDCE14F39945F6</b>
dead_threshold<br>
<br>
looks like we have different UUIDs.
Where is this coming from??<br>
<br>
ocfs2_hb_ctl -I -u
918673F06F8F4ED188DDCE14F39945F6<br>
918673F06F8F4ED188DDCE14F39945F6: 1 refs<br>
<br>
<br>
On 10/19/2011 01:04, Sunil Mushran
wrote:
<blockquote
cite="mid:4E9DF7D0.7090404@oracle.com"
type="cite">Let's do it by hand. <br>
rm -rf
/sys/kernel/config/cluster/.../heartbeat/<b>0C4AB55FE9314FA5A9F81652FDB9B22D
</b><br>
<br>
On 10/18/2011 02:52 PM, Laurentiu Gosu
wrote: <br>
<blockquote type="cite"> ocfs2_hb_ctl
-K -u
0C4AB55FE9314FA5A9F81652FDB9B22D <br>
ocfs2_hb_ctl: File not found by
ocfs2_lookup while stopping
heartbeat <br>
<br>
No improvement :( <br>
<br>
<br>
On 10/19/2011 00:50, Sunil Mushran
wrote: <br>
<blockquote type="cite">See if this
cleans it up. <br>
ocfs2_hb_ctl -K -u
0C4AB55FE9314FA5A9F81652FDB9B22D <br>
<br>
On 10/18/2011 02:44 PM, Laurentiu
Gosu wrote: <br>
<blockquote type="cite">ocfs2_hb_ctl
-I -u
0C4AB55FE9314FA5A9F81652FDB9B22D
<br>
0C4AB55FE9314FA5A9F81652FDB9B22D:
0 refs <br>
<br>
<br>
On 10/19/2011 00:43, Sunil
Mushran wrote: <br>
<blockquote type="cite">ocfs2_hb_ctl
-I -u
0C4AB55FE9314FA5A9F81652FDB9B22D
<br>
<br>
On 10/18/2011 02:40 PM,
Laurentiu Gosu wrote: <br>
<blockquote type="cite">mounted.ocfs2
-d <br>
Device FS
Stack
UUID
Label <br>
/dev/mapper/volgr1-lvol0
ocfs2 o2cb
0C4AB55FE9314FA5A9F81652FDB9B22D
ocfs2 <br>
<br>
mounted.ocfs2 -f <br>
Device FS
Nodes <br>
/dev/mapper/volgr1-lvol0
ocfs2 ro02xsrv001 <br>
<br>
ro02xsrv001 = the other node
in the cluster. <br>
<br>
By the way, there is no
/dev/dm-2 <br>
ls /dev/dm-* <br>
/dev/dm-0 /dev/dm-1 <br>
<br>
<br>
On 10/19/2011 00:37, Sunil
Mushran wrote: <br>
<blockquote type="cite">So
it is not mounted. But we
still have a hb thread
because <br>
hb could not be stopped
during umount. The reason
for that <br>
could be the same that
causes ocfs2_hb_ctl to
fail. <br>
<br>
Do: <br>
mounted.ocfs2 -d <br>
<br>
On 10/18/2011 02:32 PM,
Laurentiu Gosu wrote: <br>
<blockquote type="cite">ls
-lR
/sys/kernel/debug/ocfs2
<br>
/sys/kernel/debug/ocfs2:
<br>
total 0 <br>
<br>
ls -lR
/sys/kernel/debug/o2dlm
<br>
/sys/kernel/debug/o2dlm:
<br>
total 0 <br>
<br>
ocfs2_hb_ctl -I -d
/dev/dm-2 <br>
ocfs2_hb_ctl: Device
name specified was not
found while reading uuid
<br>
<br>
There is no /dev/dm-2
mounted. <br>
<br>
<br>
On 10/19/2011 00:27,
Sunil Mushran wrote: <br>
<blockquote type="cite">mount
-t debugfs debugfs
/sys/kernel/debug <br>
<br>
Then list that dir. <br>
<br>
Also, do: <br>
ocfs2_hb_ctl -I -d
/dev/dm-2 <br>
<br>
Be careful before
killing. We want to be
sure that dev is not
mounted. <br>
<br>
On 10/18/2011 02:23
PM, Laurentiu Gosu
wrote: <br>
<blockquote
type="cite">Again
the outputs: <br>
cat
/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev<br>
dm-2 <br>
--->here should
be volgr1-lvol0 i
guess? <br>
<br>
ls -lR
/sys/kernel/debug/ocfs2
<br>
ls:
/sys/kernel/debug/ocfs2:
No such file or
directory <br>
<br>
ls -lR
/sys/kernel/debug/o2dlm
<br>
ls:
/sys/kernel/debug/o2dlm:
No such file or
directory <br>
<br>
I think i have to
enable debug first
somehow..? <br>
<br>
Laurentiu. <br>
<br>
On 10/19/2011 00:17,
Sunil Mushran wrote:
<br>
<blockquote
type="cite">What
does this return?
<br>
cat
/sys/kernel/config/cluster/CLUSTER/heartbeat/918673F06F8F4ED188DDCE14F39945F6/dev<br>
<br>
Also, do: <br>
ls -lR
/sys/kernel/debug/ocfs2
<br>
ls -lR
/sys/kernel/debug/o2dlm
<br>
<br>
On 10/18/2011
02:14 PM,
Laurentiu Gosu
wrote: <br>
<blockquote
type="cite">Here
is the output: <br>
<br>
ls -lR
/sys/kernel/config/cluster
<br>
/sys/kernel/config/cluster:
<br>
total 0 <br>
drwxr-xr-x 4
root root 0 Oct
19 00:12 CLUSTER
<br>
<br>
/sys/kernel/config/cluster/CLUSTER:
<br>
total 0 <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
fence_method <br>
drwxr-xr-x 3
root root 0
Oct 19 00:12
heartbeat <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
idle_timeout_ms
<br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
keepalive_delay_ms
<br>
drwxr-xr-x 4
root root 0
Oct 11 20:23
node <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
reconnect_delay_ms
<br>
<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat:
<br>
total 0 <br>
drwxr-xr-x 2
root root 0
Oct 19 00:12
918673F06F8F4ED188DDCE14F39945F6
<br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
dead_threshold <br>
<br>
/sys/kernel/config/cluster/CLUSTER/heartbeat/<b>918673F06F8F4ED188DDCE14F39945F6</b>:
<br>
total 0 <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
block_bytes <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
blocks <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12 dev
<br>
-r--r--r-- 1
root root 4096
Oct 19 00:12 pid
<br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
start_block <br>
<br>
/sys/kernel/config/cluster/CLUSTER/node:
<br>
total 0 <br>
drwxr-xr-x 2
root root 0 Oct
19 00:12
ro02xsrv001 <br>
drwxr-xr-x 2
root root 0 Oct
19 00:12
ro02xsrv002 <br>
<br>
/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv001:
<br>
total 0 <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
ipv4_address <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
ipv4_port <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
local <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12 num
<br>
<br>
/sys/kernel/config/cluster/CLUSTER/node/ro02xsrv002:
<br>
total 0 <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
ipv4_address <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
ipv4_port <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12
local <br>
-rw-r--r-- 1
root root 4096
Oct 19 00:12 num
<br>
<br>
<br>
<br>
<br>
On 10/19/2011
00:12, Sunil
Mushran wrote: <br>
<blockquote
type="cite">ls
-lR
/sys/kernel/config/cluster
<br>
<br>
What does this
return? <br>
<br>
On 10/18/2011
02:05 PM,
Laurentiu Gosu
wrote: <br>
<blockquote
type="cite">Hi,
<br>
I have a 2-node ocfs2 cluster
running UEK
2.6.32-100.0.19.el5,
<br>
ocfs2console-1.6.3-2.el5,
ocfs2-tools-1.6.3-2.el5.
<br>
My problem is that every time I
try to run /etc/init.d/o2cb
stop <br>
it fails with
this error: <br>
Stopping
O2CB cluster
CLUSTER:
Failed <br>
Unable
to stop
cluster as
heartbeat
region still
active <br>
There is no
active mount
point. I tried
to manually
stop the
heartbeat <br>
with
"ocfs2_hb_ctl
-K -d
/dev/mapper/volgr1-lvol0
ocfs2" (after
finding <br>
the refs
number with
"ocfs2_hb_ctl
-I -d
/dev/mapper/volgr1-lvol0
"). <br>
But even when the
refs number is
zero, the "heartbeat
region still <br>
active" error
occurs. <br>
How can I fix
this? <br>
<br>
Thank you in
advance. <br>
Laurentiu. <br>
<br>
<br>
_______________________________________________
<br>
Ocfs2-users
mailing list <br>
<a
moz-do-not-send="true"
class="moz-txt-link-abbreviated"
href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<br>
<a
moz-do-not-send="true"
class="moz-txt-link-freetext"
href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>