[Ocfs2-tools-devel] ocfs2-tools-o2cb-1.8.2 - critical issue o2cb

Eugene Istomin E.Istomin at edss.ee
Fri Mar 15 10:26:39 PDT 2013


Ok, thanks anyway.


A couple of hours ago I cloned git master and built RPMs with as few patches 
as necessary.

Here is the project: 
https://build.opensuse.org/package/show?package=ocfs2-tools&project=home%3Aedssvirt%3Abranches%3Anetwork%3Aha-clustering

Here is the build log: 
https://build.opensuse.org/package/live_build_log?arch=x86_64&package=ocfs2-tools&project=home%3Aedssvirt%3Abranches%3Anetwork%3Aha-clustering&repository=openSUSE_12.2

Here are the RPMs: 
http://download.opensuse.org/repositories/home:/edssvirt:/branches:/network:/ha-clustering/openSUSE_12.2/x86_64/




The results stay the same:

# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = global
        node_count = 3
        name = storage2

node:
        number = 0
        cluster = storage2
        ip_port = 7777
        ip_address = 10.251.2.11
        name = tsc-hv01

node:
        number = 1
        cluster = storage2
        ip_port = 7777
        ip_address = 10.251.2.12
        name = tsc-hv02
node:
        number = 2
        cluster = storage2
        ip_port = 7777
        ip_address = 10.251.2.13
        name = tsc-hv03

heartbeat:
        cluster = storage2
        region = 68677F18B1654877BB92D78D400E7E51




# o2cb -vvv add-node storage2 tsc-hv04 --ip 10.251.2.14
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv04' in cluster 'storage2' having ip '10.251.2.14', port '-1' 
and number '-1'
Validated IP address '10.251.2.14'
Validated node number '1'  <--- so strange
Added node 'tsc-hv04' in cluster 'storage2' having ip '10.251.2.14', port 
'7777' and number '1'

# o2cb -vvv add-node storage2 tsc-hv05 --ip 10.251.2.15 --number 6
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv05' in cluster 'storage2' having ip '10.251.2.15', port '-1' 
and number '6'
Validated IP address '10.251.2.15'
Validated node number '6' <-- ok here
Added node 'tsc-hv05' in cluster 'storage2' having ip '10.251.2.15', port 
'7777' and number '6'
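
The automatic number chosen in the first add-node run above is what a "lowest free number" rule would return if only node 0 were visible to the tool, even though number 1 already belongs to tsc-hv02 in the file. A minimal Python sketch of such a rule follows; it is an assumption about the general approach, not o2cb's actual code, and the function name is hypothetical.

```python
def lowest_free_node_number(used_numbers):
    """Return the smallest non-negative integer not present in used_numbers."""
    used = set(used_numbers)
    candidate = 0
    while candidate in used:
        candidate += 1
    return candidate


# All three node stanzas from cluster.conf visible: next free number is 3.
print(lowest_free_node_number([0, 1, 2]))
# Only the first node stanza visible: next free number is 1.
print(lowest_free_node_number([0]))
```

If the tool had seen all three existing nodes, a rule like this would have picked 3 for tsc-hv04 rather than 1.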



# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = global
        node_count = 5
        name = storage2

node:
        number = 0
        cluster = storage2
        ip_port = 7777
        ip_address = 10.251.2.11
        name = tsc-hv01

heartbeat:
        cluster = storage2
        region = 68677F18B1654877BB92D78D400E7E51
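
The mismatch in the file above (node_count = 5 but a single node stanza) can be detected mechanically. The following is a minimal Python sketch, assuming the stanza layout shown in this thread (a "name:" header followed by "key = value" lines); the function names are hypothetical and this is not part of ocfs2-tools.

```python
def parse_cluster_conf(text):
    """Parse stanzas: a 'name:' header line followed by 'key = value' lines."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # New stanza header such as 'cluster:' or 'node:'.
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def check_node_count(text):
    """Return (declared node_count, number of node stanzas actually present)."""
    stanzas = parse_cluster_conf(text)
    declared = next(int(f["node_count"]) for kind, f in stanzas if kind == "cluster")
    actual = sum(1 for kind, _ in stanzas if kind == "node")
    return declared, actual


sample = """\
cluster:
        heartbeat_mode = global
        node_count = 5
        name = storage2

node:
        number = 0
        cluster = storage2
        ip_port = 7777
        ip_address = 10.251.2.11
        name = tsc-hv01

heartbeat:
        cluster = storage2
        region = 68677F18B1654877BB92D78D400E7E51
"""

declared, actual = check_node_count(sample)
print(f"node_count = {declared}, node stanzas = {actual}")
```

On the file above this prints "node_count = 5, node stanzas = 1", which is the inconsistency being reported.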




Can anyone help us?

-- 
Best regards,
Eugene Istomin
Senior System Administrator
EDS Systems
E.Istomin at edss.ee
Work: +372-640-96-01


On Friday 15 March 2013 08:30:55 Sunil Mushran wrote:

This is a tool issue. Not kernel.

Did you try building 1.8.2 from oss.oracle.com/git? It was working fine when I 
last worked on it.
Maybe someone else on this list can assist you further. Specifically, someone 
needs to put a breakpoint in o2cb_config_store(), as that is where we issue the 
write. It could be that it is not being called. The flow is simple enough... 
and looks correct in the git tree.




On Fri, Mar 15, 2013 at 1:24 AM, Eugene Istomin <E.Istomin at edss.ee> wrote:

Sunil,
 
I use packages from 
https://build.opensuse.org/package/show?project=network%3Aha-clustering&package=ocfs2-tools 
 
Here is the compile log: 
https://build.opensuse.org/package/rawlog?arch=x86_64&package=ocfs2-tools&project=network%3Aha-clustering&repository=openSUSE_12.2 
 
I spent 4 hours yesterday trying different compilation variants to rule out 
Linux kernel and package version problems; all results are the same.
 

On Friday 15 March 2013 10:14:23 Eugene Istomin wrote:

Hello Sunil,
 
here are step-by-step instructions to reproduce this issue:
 
1) Delete current conf
# rm /etc/ocfs2/cluster.conf
 
2) Create cluster & autocreate conf
# /tmp/o2cb-1.8.2 -vvv add-cluster storage
 
3) # cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = local
node_count = 0
name = storage
 
 
 
4) Adding 2 nodes using o2cb 1.8.2
 
# /tmp/o2cb-1.8.2 -vvv add-node storage tsc-hv01 --ip 10.251.2.11
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv01' in cluster 'storage' having ip '10.251.2.11', port '-1' 
and number '-1' 
Validated IP address '10.251.2.11'
Validated node number '0'
Added node 'tsc-hv01' in cluster 'storage' having ip '10.251.2.11', port 
'7777' and number '0' 
 
# /tmp/o2cb-1.8.2 -vvv add-node storage tsc-hv02 --ip 10.251.2.12
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv02' in cluster 'storage' having ip '10.251.2.12', port '-1' 
and number '-1' 
Validated IP address '10.251.2.12'
Validated node number '1'
Added node 'tsc-hv02' in cluster 'storage' having ip '10.251.2.12', port 
'7777' and number '1' 
 
5) Checking conf for nodes 
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = local
node_count = 2
name = storage
 
node:
number = 0
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
6) Let's try 1.8.0
# /tmp/o2cb-1.8.0 -vvv add-node storage tsc-hv03 --ip 10.251.2.13
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv03' in cluster 'storage' having ip '10.251.2.13', port '-1' 
and number '-1' 
Validated IP address '10.251.2.13'
Validated node number '1'
Added node 'tsc-hv03' in cluster 'storage' having ip '10.251.2.13', port 
'7777' and number '1' 
 
# /tmp/o2cb-1.8.0 -vvv add-node storage tsc-hv04 --ip 10.251.2.14
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv04' in cluster 'storage' having ip '10.251.2.14', port '-1' 
and number '-1' 
Validated IP address '10.251.2.14'
Validated node number '2'
Added node 'tsc-hv04' in cluster 'storage' having ip '10.251.2.14', port 
'7777' and number '2' 
 
7) Checking conf for nodes 
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = local
node_count = 4
name = storage
 
node:
number = 0
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.13
name = tsc-hv03
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.14
name = tsc-hv04
 
 
 

On Thursday 14 March 2013 20:22:42 Sunil Mushran wrote:

So you are saying 1.8.2 is broken. Enable verbose tracing; that may tell us 
more.
Run "o2cb -vvv add-node ..." to enable verbose tracing.



On Thu, Mar 14, 2013 at 2:49 PM, Eugene Istomin <E.Istomin at edss.ee> wrote:

OK, let me explain the problem.
 
 
1) # cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 1
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
 
2) o2cb 1.8.0
 
# ./o2cb-1.8.0 -V
o2cb-1.8.0 1.8.0
 
# ./o2cb-1.8.0 add-node storage tsc-hv02 --ip 10.251.2.12
# ./o2cb-1.8.0 add-node storage tsc-hv03 --ip 10.251.2.13
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 3
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 0
cluster = storage
ip_port = 7777
ip_address = 10.251.2.12
name = tsc-hv02
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.13
name = tsc-hv03
 
 
Seems ok
 
 
 
3) o2cb 1.8.2
# ./o2cb-1.8.2 add-node storage tsc-hv04 --ip 10.251.2.14
# ./o2cb-1.8.2 add-node storage tsc-hv05 --ip 10.251.2.15
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 5
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
 
 
All of the other node records have disappeared. strace shows that the valid 
config is opened, but the new conf consists of only the first node in the list 
(although node_count is incremented). 
 
I tried the latest git; the results are the same.
 
 

On Thursday 14 March 2013 14:41:18 Sunil Mushran wrote:

add-node only adds to the local config file. You have to do add-node on all 
nodes... followed by register-cluster on all nodes.

Until that is done, the cluster will refuse to mount new volumes on any node.
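
The procedure described above (add-node on every node, then register-cluster on every node) can be written down as a per-host command plan. The following is a minimal Python sketch that only builds the command lists and does not execute anything; the function name and the host list are hypothetical, and how the commands reach each host (ssh, configuration management, by hand) is left open.

```python
def commands_per_host(cluster, hosts, new_node, new_ip):
    """Build, for each host, the o2cb commands to add a node and re-register."""
    plan = {}
    for host in hosts:
        plan[host] = [
            # add-node only edits the local cluster.conf on that host.
            ["o2cb", "add-node", cluster, new_node, "--ip", new_ip],
            # register-cluster then pushes the updated config on that host.
            ["o2cb", "register-cluster", cluster],
        ]
    return plan


plan = commands_per_host("storage", ["tsc-hv01", "tsc-hv02"], "tsc-hv03", "10.251.2.13")
for host, commands in plan.items():
    for argv in commands:
        print(host, " ".join(argv))
```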




On Thu, Mar 14, 2013 at 2:37 PM, Eugene Istomin <E.Istomin at edss.ee> wrote:

Additional log:
 
tsc-hv01:/tmp # o2cb add-node storage tsc-hv03 --ip 10.251.2.13
tsc-hv01:/tmp # /tmp/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
node: 2 tsc-hv02 10.251.2.12:7777 storage 
node: 0 tsc-hv03 10.251.2.13:7777 storage 
 
tsc-hv01:/tmp # o2cb add-node storage tsc-hv04 --ip 10.251.2.14
tsc-hv01:/tmp # /tmp/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
node: 2 tsc-hv02 10.251.2.12:7777 storage 
node: 0 tsc-hv03 10.251.2.13:7777 storage 
node: 3 tsc-hv04 10.251.2.14:7777 storage 
 
 
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 4
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.12
name = tsc-hv02
 
node:
number = 0
cluster = storage
ip_port = 7777
ip_address = 10.251.2.13
name = tsc-hv03
 
node:
number = 3
cluster = storage
ip_port = 7777
ip_address = 10.251.2.14
name = tsc-hv04
 
 
but
 
 
tsc-hv01:/tmp # /sbin/o2cb add-node storage tsc-hv05 --ip 10.251.2.15
tsc-hv01:/tmp # cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 5
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
 
 
 
 

On Thursday 14 March 2013 23:33:58 Eugene Istomin wrote:

Thanks for the answer,
 
 
# /sbin/o2cb -V
o2cb.old 1.8.2
 
# /sbin/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
 
 
but
 
 
# /tmp/o2cb -V
o2cb 1.8.0
 
# /tmp/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
node: 2 tsc-hv02 10.251.2.12:7777 storage 
 
 
 
 

On Thursday 14 March 2013 14:20:53 Sunil Mushran wrote:

strace is hard to read.


list-nodes --online prints the nodes that have been registered. If a node 
lists fewer nodes than are in the config file, then the cluster needs to be 
(re)registered on that node.




On Thu, Mar 14, 2013 at 12:21 PM, Eugene Istomin <E.Istomin at edss.ee> wrote:

Hello Sunil,
 
we have a critical issue in the o2cb part of ocfs2 1.8.2: listing nodes or 
adding a node does not correctly reflect the ocfs2 config file (cluster.conf). 
 
We see this issue on 3 different Linux systems (kernels 3.2 to 3.8), so I think 
this might be a general o2cb problem.
 
 
#####
 
Here is some debug info
 
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 2
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.12
name = tsc-hv02
 
 
 
In ocfs2 1.8.0 (returns 2 nodes):
# strace -s 2048 ./o2cb list-nodes --oneline storage
 
 
stat("/etc/ocfs2/cluster.conf", {st_mode=S_IFREG|0644, st_size=261, ...}) = 0
open("/etc/ocfs2/cluster.conf", O_RDONLY) = 3
read(3, "cluster:\n\theartbeat_mode = global\n\tnode_count = 2\n\tname = 
storage\n\nnode:\n\tnumber = 1\n\tcluster = storage\n\tip_port = 
7777\n\tip_address = 10.251.2.11\n\tname = tsc-hv01\n\nnode:\n\tnumber = 
2\n\tcluster = storage\n\tip_port = 7777\n\tip_address = 10.251.2.12\n\tname = 
tsc-hv02\n\n", 4000) = 261 
read(3, "", 4000) = 0
close(3) = 0
write(1, "node: 1 tsc-hv01 10.251.2.11:7777 storage\n", 42node: 1 tsc-hv01 
10.251.2.11:7777 storage 
) = 42
write(1, "node: 2 tsc-hv02 10.251.2.12:7777 storage\n", 42node: 2 tsc-hv02 
10.251.2.12:7777 storage 
) = 42
exit_group(0) = ?
 
 
 
In ocfs2 1.8.2 (returns 1 node, but the config has 2 nodes): 
# strace -s 2048 /sbin/o2cb list-nodes --oneline storage
 
stat("/etc/ocfs2/cluster.conf", {st_mode=S_IFREG|0644, st_size=261, ...}) = 0
open("/etc/ocfs2/cluster.conf", O_RDONLY) = 3
read(3, "cluster:\n\theartbeat_mode = global\n\tnode_count = 2\n\tname = 
storage\n\nnode:\n\tnumber = 1\n\tcluster = storage\n\tip_port = 
7777\n\tip_address = 10.251.2.11\n\tname = tsc-hv01\n\nnode:\n\tnumber = 
2\n\tcluster = storage\n\tip_port = 7777\n\tip_address = 10.251.2.12\n\tname = 
tsc-hv02\n\n", 4000) = 261 
read(3, "", 4000) = 0
close(3) = 0
write(1, "node: 1 tsc-hv01 10.251.2.11:7777 storage\n", 42node: 1 tsc-hv01 
10.251.2.11:7777 storage 
) = 42
exit_group(0) = ?
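
The difference between the two traces above comes down to which "node:" lines reach stdout. The following is a minimal Python sketch, assuming the one-line format shown in this thread ("node: <number> <name> <ip>:<port> <cluster>"), that reports which expected nodes are missing from a list-nodes output; the function names are hypothetical.

```python
def parse_list_nodes(output):
    """Parse 'node: <number> <name> <ip>:<port> <cluster>' lines into dicts."""
    nodes = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[0] != "node:":
            continue  # Skip anything that is not a one-line node record.
        ip, _, port = parts[3].rpartition(":")
        nodes.append({
            "number": int(parts[1]),
            "name": parts[2],
            "ip": ip,
            "port": int(port),
            "cluster": parts[4],
        })
    return nodes


def missing_nodes(expected_names, output):
    """Return the expected node names that do not appear in the output."""
    listed = {node["name"] for node in parse_list_nodes(output)}
    return sorted(set(expected_names) - listed)


out_180 = (
    "node: 1 tsc-hv01 10.251.2.11:7777 storage\n"
    "node: 2 tsc-hv02 10.251.2.12:7777 storage\n"
)
out_182 = "node: 1 tsc-hv01 10.251.2.11:7777 storage\n"
expected = ["tsc-hv01", "tsc-hv02"]

print(missing_nodes(expected, out_180))  # []
print(missing_nodes(expected, out_182))  # ['tsc-hv02']
```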
 
 
 
 
I can mail you any info you need; please help resolve this issue.

_______________________________________________
Ocfs2-tools-devel mailing list
Ocfs2-tools-devel at oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-tools-devel














