[Ocfs2-tools-devel] ocfs2-tools-o2cb-1.8.2 - critical issue o2cb

Eugene Istomin E.Istomin at edss.ee
Fri Mar 15 01:14:23 PDT 2013


Hello Sunil,

here are step-by-step instructions to reproduce this issue:

1) Delete current conf
# rm /etc/ocfs2/cluster.conf

2) Create cluster & autocreate conf
 # /tmp/o2cb-1.8.2 -vvv add-cluster storage

3) # cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = local
        node_count = 0
        name = storage



4) Adding 2 nodes using o2cb 1.8.2

# /tmp/o2cb-1.8.2 -vvv add-node storage tsc-hv01 --ip 10.251.2.11
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv01' in cluster 'storage' having ip '10.251.2.11', port '-1' and number '-1'
Validated IP address '10.251.2.11'
Validated node number '0'
Added node 'tsc-hv01' in cluster 'storage' having ip '10.251.2.11', port '7777' and number '0'

 # /tmp/o2cb-1.8.2 -vvv add-node storage tsc-hv02 --ip 10.251.2.12
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv02' in cluster 'storage' having ip '10.251.2.12', port '-1' and number '-1'
Validated IP address '10.251.2.12'
Validated node number '1'
Added node 'tsc-hv02' in cluster 'storage' having ip '10.251.2.12', port '7777' and number '1'

5) Checking conf for nodes 
# cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = local
        node_count = 2
        name = storage

node:
        number = 0
        cluster = storage
        ip_port = 7777
        ip_address = 10.251.2.11
        name = tsc-hv01

6) Let's try 1.8.0
# /tmp/o2cb-1.8.0 -vvv add-node storage tsc-hv03 --ip 10.251.2.13
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv03' in cluster 'storage' having ip '10.251.2.13', port '-1' and number '-1'
Validated IP address '10.251.2.13'
Validated node number '1'
Added node 'tsc-hv03' in cluster 'storage' having ip '10.251.2.13', port '7777' and number '1'

 # /tmp/o2cb-1.8.0 -vvv add-node storage tsc-hv04 --ip 10.251.2.14
Using config file '/etc/ocfs2/cluster.conf'
Add node 'tsc-hv04' in cluster 'storage' having ip '10.251.2.14', port '-1' and number '-1'
Validated IP address '10.251.2.14'
Validated node number '2'
Added node 'tsc-hv04' in cluster 'storage' having ip '10.251.2.14', port '7777' and number '2'

7) Checking conf for nodes 
 # cat /etc/ocfs2/cluster.conf
cluster:
        heartbeat_mode = local
        node_count = 4
        name = storage

node:
        number = 0
        cluster = storage
        ip_port = 7777
        ip_address = 10.251.2.11
        name = tsc-hv01

node:
        number = 1
        cluster = storage
        ip_port = 7777
        ip_address = 10.251.2.13
        name = tsc-hv03

node:
        number = 2
        cluster = storage
        ip_port = 7777
        ip_address = 10.251.2.14
        name = tsc-hv04
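
The mismatch visible in steps 5 and 7 (node_count is larger than the number of node: stanzas actually written) can be checked mechanically. Below is a minimal Python sketch, assuming only the stanza layout shown in the cluster.conf dumps above. The names parse_cluster_conf and check_node_count are hypothetical helpers and are not part of ocfs2-tools.

```python
def parse_cluster_conf(text):
    """Parse 'cluster:' / 'node:' stanzas into a list of (kind, attrs) pairs.

    Assumes the layout shown in this thread: a 'kind:' header line
    followed by indented 'key = value' lines.
    """
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Start of a new stanza, e.g. "cluster:" or "node:".
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def check_node_count(text):
    """Return (declared node_count, number of node stanzas found)."""
    stanzas = parse_cluster_conf(text)
    cluster = next(attrs for kind, attrs in stanzas if kind == "cluster")
    actual = sum(1 for kind, _ in stanzas if kind == "node")
    return int(cluster["node_count"]), actual


# The file as written by o2cb 1.8.2 in step 5.
STEP5_CONF = """\
cluster:
        heartbeat_mode = local
        node_count = 2
        name = storage

node:
        number = 0
        cluster = storage
        ip_port = 7777
        ip_address = 10.251.2.11
        name = tsc-hv01
"""

declared, actual = check_node_count(STEP5_CONF)
print(f"node_count = {declared}, node stanzas found = {actual}")
# prints: node_count = 2, node stanzas found = 1
```

A result where the two numbers differ corresponds to the broken state shown in steps 5 and 7.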


-- 
Best regards,
Eugene Istomin
Senior System Administrator
EDS Systems
E.Istomin at edss.ee
Work: +372-640-96-01


On Thursday 14 March 2013 20:22:42 Sunil Mushran wrote:

So you are saying 1.8.2 is broken. Enable verbose tracing. That may tell us more.
Do "o2cb -vvv add-node ..." to enable verbose tracing.



On Thu, Mar 14, 2013 at 2:49 PM, Eugene Istomin <E.Istomin at edss.ee> wrote:

OK, let me explain the problem.
 
 
1) # cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 1
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
 
2)o2cb - 1.8.0
 
#./o2cb-1.8.0 -V
o2cb-1.8.0 1.8.0
 
# ./o2cb-1.8.0 add-node storage tsc-hv02 --ip 10.251.2.12
# ./o2cb-1.8.0 add-node storage tsc-hv03 --ip 10.251.2.13
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 3
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 0
cluster = storage
ip_port = 7777
ip_address = 10.251.2.12
name = tsc-hv02
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.13
name = tsc-hv03
 
 
Seems ok
 
 
 
3) o2cb 1.8.2
# ./o2cb-1.8.2 add-node storage tsc-hv04 --ip 10.251.2.14
# ./o2cb-1.8.2 add-node storage tsc-hv05 --ip 10.251.2.15
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 5
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
 
 
All of the other node records have disappeared. strace shows that the valid 
config is opened, but the new config consists of only the first node in the 
list (although node_count is incremented). 
 
I tried the latest git; the results are the same.
 
 

On Thursday 14 March 2013 14:41:18 Sunil Mushran wrote:

add-node only adds to the local config file. You have to do add-node on all 
nodes... followed by register-cluster on all nodes.

Until that is done, the cluster will refuse to mount new volumes on any node.
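
A minimal Python sketch of the per-node command sequence this implies. It only builds and prints the command lines and runs nothing. The add-node syntax is the one used elsewhere in this thread; passing the cluster name as the only argument to register-cluster is an assumption, and commands_for_host is a hypothetical helper name.

```python
import shlex

# Node names and addresses as used in this thread.
NODES = [
    ("tsc-hv01", "10.251.2.11"),
    ("tsc-hv02", "10.251.2.12"),
    ("tsc-hv03", "10.251.2.13"),
]


def commands_for_host(cluster, nodes, already_present=()):
    """Build the o2cb command lines to run on ONE host.

    Nodes already listed in that host's local cluster.conf are skipped.
    The final register-cluster call (argument form assumed) follows the
    add-node calls, as described in the message above.
    """
    cmds = []
    for name, ip in nodes:
        if name in already_present:
            continue
        cmds.append(["o2cb", "add-node", cluster, name, "--ip", ip])
    cmds.append(["o2cb", "register-cluster", cluster])
    return cmds


# Example: a host whose local config so far contains only tsc-hv01.
for argv in commands_for_host("storage", NODES, already_present={"tsc-hv01"}):
    print(shlex.join(argv))
```

The same list would have to be produced and run on every node in the cluster, since add-node changes only the local file.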




On Thu, Mar 14, 2013 at 2:37 PM, Eugene Istomin <E.Istomin at edss.ee> wrote:

Additional log:
 
tsc-hv01:/tmp # o2cb add-node storage tsc-hv03 --ip 10.251.2.13
tsc-hv01:/tmp # /tmp/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
node: 2 tsc-hv02 10.251.2.12:7777 storage 
node: 0 tsc-hv03 10.251.2.13:7777 storage 
 
tsc-hv01:/tmp # o2cb add-node storage tsc-hv04 --ip 10.251.2.14
tsc-hv01:/tmp # /tmp/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
node: 2 tsc-hv02 10.251.2.12:7777 storage 
node: 0 tsc-hv03 10.251.2.13:7777 storage 
node: 3 tsc-hv04 10.251.2.14:7777 storage 
 
 
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 4
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.12
name = tsc-hv02
 
node:
number = 0
cluster = storage
ip_port = 7777
ip_address = 10.251.2.13
name = tsc-hv03
 
node:
number = 3
cluster = storage
ip_port = 7777
ip_address = 10.251.2.14
name = tsc-hv04
 
 
but
 
 
tsc-hv01:/tmp # /sbin/o2cb add-node storage tsc-hv05 --ip 10.251.2.15
tsc-hv01:/tmp # cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 5
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
 
 
 
 

On Thursday 14 March 2013 23:33:58 Eugene Istomin wrote:

Thanks for the answer,
 
 
# /sbin/o2cb -V
o2cb.old 1.8.2
 
# /sbin/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
 
 
but
 
 
#/tmp/o2cb -V
o2cb 1.8.0
 
# /tmp/o2cb list-nodes --oneline storage
node: 1 tsc-hv01 10.251.2.11:7777 storage 
node: 2 tsc-hv02 10.251.2.12:7777 storage 
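
The difference between the two listings above can be computed rather than read by eye. This is a minimal Python sketch, assuming the "node: <number> <name> <ip>:<port> <cluster>" line layout printed by list-nodes --oneline in this thread. parse_oneline and missing_nodes are hypothetical helper names, not part of o2cb.

```python
def parse_oneline(output):
    """Parse 'node: <number> <name> <ip>:<port> <cluster>' lines.

    Returns a dict keyed by node name. Lines that do not match the
    assumed five-field layout are ignored.
    """
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] != "node:":
            continue
        number, name, endpoint, cluster = fields[1:]
        ip, _, port = endpoint.rpartition(":")
        nodes[name] = {
            "number": int(number),
            "ip": ip,
            "port": int(port),
            "cluster": cluster,
        }
    return nodes


def missing_nodes(expected, reported):
    """Names present in `expected` but absent from `reported`."""
    return sorted(set(expected) - set(reported))


# Output of /tmp/o2cb (1.8.0) and /sbin/o2cb (1.8.2) as quoted above.
OUT_180 = (
    "node: 1 tsc-hv01 10.251.2.11:7777 storage \n"
    "node: 2 tsc-hv02 10.251.2.12:7777 storage \n"
)
OUT_182 = "node: 1 tsc-hv01 10.251.2.11:7777 storage \n"

print(missing_nodes(parse_oneline(OUT_180), parse_oneline(OUT_182)))
# prints: ['tsc-hv02']
```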
 
 
 
 

On Thursday 14 March 2013 14:20:53 Sunil Mushran wrote:

strace is hard to read.


list-nodes --online prints the nodes that have been registered. If a node 
shows fewer nodes than the config file lists, then the cluster needs to be 
(re)registered on that node.




On Thu, Mar 14, 2013 at 12:21 PM, Eugene Istomin <E.Istomin at edss.ee> wrote:

Hello Sunil,
 
We have a critical issue in the o2cb part of ocfs2-tools 1.8.2: listing nodes 
or adding a node does not work correctly against the cluster config 
(/etc/ocfs2/cluster.conf). 
 
We see this issue on 3 different Linux systems (kernels 3.2 to 3.8), so I think 
this may be a general o2cb problem.
 
 
#####
 
Here is some debug info
 
# cat /etc/ocfs2/cluster.conf
cluster:
heartbeat_mode = global
node_count = 2
name = storage
 
node:
number = 1
cluster = storage
ip_port = 7777
ip_address = 10.251.2.11
name = tsc-hv01
 
node:
number = 2
cluster = storage
ip_port = 7777
ip_address = 10.251.2.12
name = tsc-hv02
 
 
 
In ocfs2 1.8.0 (returns 2 nodes):
# strace -s 2048 ./o2cb list-nodes --oneline storage
 
 
stat("/etc/ocfs2/cluster.conf", {st_mode=S_IFREG|0644, st_size=261, ...}) = 0
open("/etc/ocfs2/cluster.conf", O_RDONLY) = 3
read(3, "cluster:\n\theartbeat_mode = global\n\tnode_count = 2\n\tname = storage\n\nnode:\n\tnumber = 1\n\tcluster = storage\n\tip_port = 7777\n\tip_address = 10.251.2.11\n\tname = tsc-hv01\n\nnode:\n\tnumber = 2\n\tcluster = storage\n\tip_port = 7777\n\tip_address = 10.251.2.12\n\tname = tsc-hv02\n\n", 4000) = 261
read(3, "", 4000) = 0
close(3) = 0
write(1, "node: 1 tsc-hv01 10.251.2.11:7777 storage\n", 42node: 1 tsc-hv01 
10.251.2.11:7777 storage 
) = 42
write(1, "node: 2 tsc-hv02 10.251.2.12:7777 storage\n", 42node: 2 tsc-hv02 
10.251.2.12:7777 storage 
) = 42
exit_group(0) = ?
 
 
 
In ocfs2 1.8.2 (returns 1 node, but the config has 2 nodes):
#strace -s 2048 /sbin/o2cb list-nodes --oneline storage
 
stat("/etc/ocfs2/cluster.conf", {st_mode=S_IFREG|0644, st_size=261, ...}) = 0
open("/etc/ocfs2/cluster.conf", O_RDONLY) = 3
read(3, "cluster:\n\theartbeat_mode = global\n\tnode_count = 2\n\tname = storage\n\nnode:\n\tnumber = 1\n\tcluster = storage\n\tip_port = 7777\n\tip_address = 10.251.2.11\n\tname = tsc-hv01\n\nnode:\n\tnumber = 2\n\tcluster = storage\n\tip_port = 7777\n\tip_address = 10.251.2.12\n\tname = tsc-hv02\n\n", 4000) = 261
read(3, "", 4000) = 0
close(3) = 0
write(1, "node: 1 tsc-hv01 10.251.2.11:7777 storage\n", 42node: 1 tsc-hv01 
10.251.2.11:7777 storage 
) = 42
exit_group(0) = ?
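
Both traces show the same read() buffer, so the input file is identical for the two versions. As a cross-check, here is a minimal Python sketch that rebuilds that buffer from the strace output above and confirms it matches the reported size and holds two node stanzas. READ_BUFFER is copied from the trace, and count_node_stanzas is a hypothetical helper name.

```python
# The string argument of read(3, ...) from both strace runs above.
READ_BUFFER = (
    "cluster:\n\theartbeat_mode = global\n\tnode_count = 2\n\tname = storage\n\n"
    "node:\n\tnumber = 1\n\tcluster = storage\n\tip_port = 7777\n"
    "\tip_address = 10.251.2.11\n\tname = tsc-hv01\n\n"
    "node:\n\tnumber = 2\n\tcluster = storage\n\tip_port = 7777\n"
    "\tip_address = 10.251.2.12\n\tname = tsc-hv02\n\n"
)


def count_node_stanzas(text):
    """Count lines that consist solely of the 'node:' stanza header."""
    return sum(1 for line in text.splitlines() if line.strip() == "node:")


size = len(READ_BUFFER.encode("ascii"))
print(size)                             # 261, matching st_size and read() = 261
print(count_node_stanzas(READ_BUFFER))  # 2
```

Since the whole 261-byte file with both stanzas reaches the process in both runs, the missing second node in the 1.8.2 output would have to come from how the tool handles the data after reading it, not from the file on disk.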
 
 
 
 
I can mail you any info you need. Please help resolve this issue.

_______________________________________________
Ocfs2-tools-devel mailing list
Ocfs2-tools-devel at oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-tools-devel









