[Ocfs2-users] ocfs2/o2cb problem with openais/pacemaker

Jürgen Herrmann Juergen.Herrmann at XLhost.de
Mon Apr 12 05:53:37 PDT 2010


Hi!

I'm on Debian Lenny and trying to run OCFS2 on a dual-primary
DRBD device. The DRBD device is already set up as the master/slave
resource msDRBD0.

To get dlm_controld.pcmk, I built it from source (from
cluster-suite-3.0.10).
Then I configured the resource "resDLM" as a clone (2 instances):
  primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
  clone cloneDLM resDLM meta globally-unique="false" interleave="true"
  colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
  order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
This seems to work.


To get ocfs2_controld.pcmk, I built ocfs2-tools-1.4.3 from source.
After adding the following resource configuration:
  primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
  clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
  colocation colO2CB_DLM inf: cloneO2CB cloneDLM
  order ordDLM_O2CB inf: cloneDLM cloneO2CB

I get the following errors in crm_mon:
======================================
Failed actions:
    resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1, status=complete): unknown error
    resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1, status=complete): unknown error


The relevant syslog entries:
============================
Apr 12 13:15:18 app1a corosync[4638]:   [pcmk  ] info: pcmk_notify: Enabling node  notifications for child 8311 (0xd83090)
Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device: Unable to access cluster service



If I start "ocfs2_controld.pcmk -D", I get:
===========================================
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: Creating connection to our AIS plugin
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS connection established
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
1271072439 setup_stack at 168: Cluster connection established.  Local node id: 569559765
1271072439 setup_stack at 172: Added Pacemaker as client 1 with fd 5
1271072439 setup_ckpt at 609: Initializing CKPT service (try 1)
1271072439 setup_ckpt at 615: Connected to CKPT service with handle 0x327b23c600000000
1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld:21f2cad5" with handle 0x6633487300000000
1271072439 call_section_write at 340: Writing to section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 292: Creating section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 300: Created section "daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
1271072439 call_section_write at 340: Writing to section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 292: Creating section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 300: Created section "ocfs2_max_protocol" on checkpoint "ocfs2:controld:21f2cad5"
1271072439 start_join at 588: Starting join for group "ocfs2:controld"
1271072439 start_join at 592: cpg_join succeeded
1271072439 loop at 975: setup done
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch: Membership 156: quorum acquired
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161)  (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162)  votes=1 born=148 seen=156 proc=00000000000000000000000000013312
1271072439 confchg_cb at 495: confchg called
1271072439 daemon_change at 398: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 0, joined 1
1271072439 cpg_joined at 909: CPG is live, we are the first daemon
1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld" (try 1)
1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld" with handle 0x2ae8944a00000001
1271072439 call_section_write at 340: Writing to section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 292: Creating section "daemon_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 300: Created section "daemon_protocol" on checkpoint "ocfs2:controld"
1271072439 call_section_write at 340: Writing to section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 292: Creating section "ocfs2_protocol" on checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 300: Created section "ocfs2_protocol" on checkpoint "ocfs2:controld"
1271072439 cpg_joined at 923: Daemon protocol is 1.0
1271072439 cpg_joined at 925: fs protocol is 1.0
1271072439 cpg_joined at 927: Connecting to dlm_controld
1271072439 cpg_joined at 934: Opening control device
1271072439 cpg_joined at 938: Error opening control device: Unable to access cluster service
1271072439 exit_dlmcontrol at 363: Closing dlm_controld connection
1271072439 start_leave at 613: leaving group "ocfs2:controld"
1271072439 start_leave at 626: cpg_leave succeeded
1271072439 exit_cpg at 760: closing cpg connection
1271072439 call_ckpt_close at 240: Closing checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_close at 246: Closed checkpoint "ocfs2:controld:21f2cad5"
1271072439 exit_ckpt at 643: Disconnecting from CKPT service (try 1)
1271072439 exit_ckpt at 647: Disconnected from CKPT service
1271072439 exit_stack at 144: closing pacemaker connection
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: terminate_ais_connection: Disconnected from AIS
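A side note for reading these logs: the checkpoint name "ocfs2:controld:21f2cad5" is the local node id in hex (0x21f2cad5 = 569559765), and both node ids can be reproduced from the ring0 addresses by reading the four IPv4 bytes as a little-endian 32-bit integer and clearing the top bit. That corosync assigns the ids this way here (automatic node ids with something like clear_node_high_bit in effect) is an assumption; node_id_from_ipv4 below is a hypothetical helper that only checks the arithmetic against the logged values:

```python
import ipaddress


def node_id_from_ipv4(addr: str) -> int:
    """Hypothetical helper: reproduce a node id from an IPv4 address.

    Assumption (not confirmed by the logs themselves): the 4 address bytes,
    taken in network order, are read as a little-endian 32-bit integer and
    the high bit is cleared, as corosync's clear_node_high_bit would do.
    """
    raw = ipaddress.IPv4Address(addr).packed  # 4 bytes, network byte order
    value = int.from_bytes(raw, "little")     # interpret as little-endian u32
    return value & 0x7FFFFFFF                 # clear the top (sign) bit


if __name__ == "__main__":
    for ip in ("213.202.242.161", "213.202.242.162"):
        nid = node_id_from_ipv4(ip)
        # Print the decimal id (as logged) and the hex form used in the
        # checkpoint name "ocfs2:controld:<hex id>".
        print(ip, nid, format(nid, "08x"))
```

This prints 569559765 (21f2cad5) for 213.202.242.161 and 586336981 (22f2cad5) for 213.202.242.162, matching the ids reported by both ocfs2_controld and dlm_controld.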


Evidently ocfs2_controld.pcmk can connect to the openais CKPT service and
to dlm_controld.pcmk, but after failing to open the control device it
closes the dlm_controld.pcmk connection again.
Here is the output of "dlm_controld.pcmk -q 0 -D"
(the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk):
=======================================================================
1271072755 dlm_controld 3.0.10 started
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: Creating connection to our AIS plugin
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS connection established
cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server details: id=569559765 uname=app1a.xlhost.de cname=pcmk
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1a.xlhost.de now has id: 569559765
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765 is now known as app1a.xlhost.de
1271072755 found /dev/misc/dlm-control minor 58
1271072755 found /dev/misc/dlm-monitor minor 57
1271072755 found /dev/misc/dlm_plock minor 56
1271072755 /dev/misc/dlm-monitor fd 9
1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1271072755 confdb_key_get error 11
1271072755 group_mode 3 compat 0
1271072755 setup_cpg_daemon 11
1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765 left
1271072755 run protocol from nodeid 586336981
1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1271072755 plocks 13
1271072755 plock cpg message size: 104 bytes
cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership 156: quorum acquired
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1a.xlhost.de: id=569559765 state=member (new) addr=r(0) ip(213.202.242.161)  (new) votes=1 (new) born=156 seen=156 proc=00000000000000000000000000013312 (new)
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node app1b.xlhost.de now has id: 586336981
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981 is now known as app1b.xlhost.de
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node app1b.xlhost.de: id=586336981 state=member (new) addr=r(0) ip(213.202.242.162)  votes=1 born=148 seen=156 proc=00000000000000000000000000013312
1271072755 Processing membership 156
1271072755 Adding address ip(213.202.242.161) to configfs for node 569559765
1271072755 set_configfs_node 569559765 213.202.242.161 local 1
1271072755 Added active node 569559765: born-on=156, last-seen=156, this-event=156, last-event=0
1271072755 Adding address ip(213.202.242.162) to configfs for node 586336981
1271072755 set_configfs_node 586336981 213.202.242.162 local 0
1271072755 Added active node 586336981: born-on=148, last-seen=156, this-event=156, last-event=0
1271072763 client connection 5 fd 14
1271072763 connection 5 read error -1
1271072776 client connection 5 fd 14
1271072776 connection 5 read error -1
1271072779 client connection 5 fd 14
1271072779 connection 5 read error -1
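The debug lines from both daemons are stamped with Unix epoch seconds, while the AIS/crm lines use local wall-clock time, which makes the two outputs hard to line up. A minimal sketch for converting the stamps, assuming the "<epoch> <function> at <line>: <message>" layout shown above (the regex also accepts "@", on the assumption that " at " may be the list archive's rendering of it); parse_debug_line is a hypothetical helper, not part of either tool:

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the debug lines quoted above:
#   ocfs2_controld: "<epoch> <function> at <line>: <message>"
#   dlm_controld:   "<epoch> <message>"
# " at " is accepted alongside "@" in case the archive rewrote "@".
DEBUG_RE = re.compile(r"^(\d{10}) (?:(\w+)(?: at |@)(\d+): )?(.*)$")


def parse_debug_line(line):
    """Return (utc_datetime, function, source_line, message), or None
    if the line is not an epoch-stamped debug line."""
    m = DEBUG_RE.match(line.strip())
    if not m:
        return None
    epoch, func, src_line, msg = m.groups()
    ts = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return ts, func, int(src_line) if src_line else None, msg


if __name__ == "__main__":
    samples = [
        "1271072439 cpg_joined at 938: Error opening control device: "
        "Unable to access cluster service",
        "1271072763 client connection 5 fd 14",
        "1271072763 connection 5 read error -1",
    ]
    for s in samples:
        ts, func, src, msg = parse_debug_line(s)
        print(ts.isoformat(), func or "-", src or "-", msg)
```

Converted this way, 1271072755 is 2010-04-12 11:45:55 UTC, which lines up with the 13:45:55 stamps on the cluster-dlm lines if the nodes presumably run on CEST (UTC+2), and the three read errors fall at 11:46:03, 11:46:16 and 11:46:19 UTC.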



I'm pretty lost at the moment, as I can't find anything via Google
regarding the core problem:
1271072439 cpg_joined at 934: Opening control device
1271072439 cpg_joined at 938: Error opening control device: Unable to access cluster service


Any help would be greatly appreciated.

Best regards,
Jürgen Herrmann
-- 
>> XLhost.de - eXperts in Linux hosting ® <<

XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany

Managing Directors: Volker Geith, Jürgen Herrmann
Registered under: HRB9918
VAT identification number: DE245931218

Phone: +49 (0)800 XLHOSTDE [0800 95467833]
Fax:  +49 (0)800 95467830

WEB:  http://www.XLhost.de
IRC:  #XLhost at irc.quakenet.org



