[Ocfs2-users] ocfs2/o2cb problem with openais/pacemaker

Jürgen Herrmann Juergen.Herrmann at XLhost.de
Mon Apr 12 05:53:37 PDT 2010


i'm on debian lenny and trying to run ocfs2 on a dual-primary
drbd device. the drbd device is already set up as the master/slave
resource msDRBD0.

to get dlm_controld.pcmk i installed it from source (from
now i configured a resource "resDLM" with 2 clones:
  primitive resDLM ocf:pacemaker:controld op monitor interval="120s"
  clone cloneDLM resDLM meta globally-unique="false" interleave="true"
  colocation colDLM_DRBD0 inf: cloneDLM msDRBD0:Master
  order ordDRBD0_DLM inf: msDRBD0:promote cloneDLM:start
-> seems to work.

to get ocfs2_controld.pcmk i installed ocfs2-tools-1.4.3 from source.
after adding the resource:
  primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
  clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
  colocation colO2CB_DLM inf: cloneO2CB cloneDLM
  order ordDLM_O2CB inf: cloneDLM cloneO2CB

i get the following errors in crm_mon:
Failed actions:
    resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1,
status=complete): unknown error
    resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1,
status=complete): unknown error
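
For reference, rc=1 is the generic OCF error code, which crm_mon prints as "unknown error", so the entries above only say that the start operation failed on both nodes, not why. A minimal Python sketch for pulling the fields out of such entries follows. The function name parse_failed_actions and the regular expression are illustrative assumptions modelled on the output quoted above, not part of any pacemaker tool.

```python
import re

# Entry layout as it appears in the crm_mon output quoted above (assumption:
# other pacemaker versions may print these entries differently).
FAILED_ACTION = re.compile(
    r"(?P<op>\S+) \(node=(?P<node>[^,]+), call=(?P<call>\d+), "
    r"rc=(?P<rc>\d+), status=(?P<status>\w+)\): "
    r"(?P<reason>.*?)(?=\s+\S+ \(node=|\s*$)"
)


def parse_failed_actions(text):
    """Return one dict per failed action found in crm_mon output."""
    # Collapse all whitespace first so entries wrapped across lines still match.
    flat = " ".join(text.split())
    actions = []
    for match in FAILED_ACTION.finditer(flat):
        entry = match.groupdict()
        entry["call"] = int(entry["call"])
        entry["rc"] = int(entry["rc"])
        actions.append(entry)
    return actions


CRM_MON_OUTPUT = """\
Failed actions:
    resO2CB:0_start_0 (node=app1b.xlhost.de, call=28, rc=1,
status=complete): unknown error
    resO2CB:0_start_0 (node=app1a.xlhost.de, call=38, rc=1,
status=complete): unknown error
"""

for action in parse_failed_actions(CRM_MON_OUTPUT):
    print(action["node"], action["op"], "rc=%d" % action["rc"], action["reason"])
```

Both nodes report the same operation and return code, which points at a common cause rather than a node-specific one.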

the relevant syslog entries:
Apr 12 13:15:18 app1a corosync[4638]:   [pcmk  ] info: pcmk_notify:
Enabling node  notifications for child 8311 (0xd83090)
Apr 12 13:15:18 app1a ocfs2_controld.pcmk: Error opening control device:
Unable to access cluster service

if i start "ocfs2_controld.pcmk -D" i get:
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection:
Creating connection to our AIS plugin
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: init_ais_connection: AIS
connection established
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: get_ais_nodeid: Server
details: id=569559765 uname=app1a.xlhost.de cname=pcmk
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
app1a.xlhost.de now has id: 569559765
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
569559765 is now known as app1a.xlhost.de
1271072439 setup_stack at 168: Cluster connection established.  Local node
id: 569559765
1271072439 setup_stack at 172: Added Pacemaker as client 1 with fd 5
1271072439 setup_ckpt at 609: Initializing CKPT service (try 1)
1271072439 setup_ckpt at 615: Connected to CKPT service with handle
1271072439 call_ckpt_open at 160: Opening checkpoint
"ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld:21f2cad5"
with handle 0x6633487300000000
1271072439 call_section_write at 340: Writing to section
"daemon_max_protocol" on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 292: Creating section "daemon_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 300: Created section "daemon_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5"
1271072439 call_section_write at 340: Writing to section "ocfs2_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 292: Creating section "ocfs2_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5" (try 1)
1271072439 call_section_create at 300: Created section "ocfs2_max_protocol"
on checkpoint "ocfs2:controld:21f2cad5"
1271072439 start_join at 588: Starting join for group "ocfs2:controld"
1271072439 start_join at 592: cpg_join succeeded
1271072439 loop at 975: setup done
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice: ais_dispatch:
Membership 156: quorum acquired
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node
app1a.xlhost.de: id=569559765 state=member (new) addr=r(0)
ip(  (new) votes=1 (new) born=156 seen=156
proc=00000000000000000000000000013312 (new)
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
app1b.xlhost.de now has id: 586336981
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_new_peer: Node
586336981 is now known as app1b.xlhost.de
ocfs2_controld[18489]: 2010/04/12_13:40:39 info: crm_update_peer: Node
app1b.xlhost.de: id=586336981 state=member (new) addr=r(0)
ip(  votes=1 born=148 seen=156
1271072439 confchg_cb at 495: confchg called
1271072439 daemon_change at 398: ocfs2_controld (group "ocfs2:controld")
confchg: members 1, left 0, joined 1
1271072439 cpg_joined at 909: CPG is live, we are the first daemon
1271072439 call_ckpt_open at 160: Opening checkpoint "ocfs2:controld" (try 1)
1271072439 call_ckpt_open at 170: Opened checkpoint "ocfs2:controld" with
handle 0x2ae8944a00000001
1271072439 call_section_write at 340: Writing to section "daemon_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 292: Creating section "daemon_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 300: Created section "daemon_protocol" on
checkpoint "ocfs2:controld"
1271072439 call_section_write at 340: Writing to section "ocfs2_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 292: Creating section "ocfs2_protocol" on
checkpoint "ocfs2:controld" (try 1)
1271072439 call_section_create at 300: Created section "ocfs2_protocol" on
checkpoint "ocfs2:controld"
1271072439 cpg_joined at 923: Daemon protocol is 1.0
1271072439 cpg_joined at 925: fs protocol is 1.0
1271072439 cpg_joined at 927: Connecting to dlm_controld
1271072439 cpg_joined at 934: Opening control device
1271072439 cpg_joined at 938: Error opening control device: Unable to access
cluster service
1271072439 exit_dlmcontrol at 363: Closing dlm_controld connection
1271072439 start_leave at 613: leaving group "ocfs2:controld"
1271072439 start_leave at 626: cpg_leave succeeded
1271072439 exit_cpg at 760: closing cpg connection
1271072439 call_ckpt_close at 240: Closing checkpoint
"ocfs2:controld:21f2cad5" (try 1)
1271072439 call_ckpt_close at 246: Closed checkpoint
1271072439 exit_ckpt at 643: Disconnecting from CKPT service (try 1)
1271072439 exit_ckpt at 647: Disconnected from CKPT service
1271072439 exit_stack at 144: closing pacemaker connection
ocfs2_controld[18489]: 2010/04/12_13:40:39 notice:
terminate_ais_connection: Disconnected from AIS
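
The log above mixes two timestamp formats: lines from the cluster library carry local wall-clock time (2010/04/12_13:40:39), while the daemon's own debug lines carry seconds since the Unix epoch (1271072439). A small Python sketch for converting the latter follows, so that both kinds of line can be placed on one timeline. The helper name is hypothetical, and the +2 hour offset (CEST) is an assumption inferred from the two formats appearing in the same run.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the nodes log in CEST (UTC+2); adjust for other hosts.
LOCAL_TZ = timezone(timedelta(hours=2))


def epoch_to_log_time(stamp, tz=LOCAL_TZ):
    """Format epoch seconds like the cluster library's timestamps."""
    return datetime.fromtimestamp(stamp, tz=tz).strftime("%Y/%m/%d_%H:%M:%S")


# Timestamp of the "Error opening control device" line in the log above.
print(epoch_to_log_time(1271072439))  # 2010/04/12_13:40:39
```

With this conversion the whole run, from connecting to the AIS plugin to the disconnect, falls within the same second.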

obviously ocfs2_controld.pcmk can connect to the openais CKPT service and
to dlm_controld.pcmk, which then terminates the connection.
here's the output from dlm_controld.pcmk -q 0 -D:
(the last 6 lines show 3 connection attempts from ocfs2_controld.pcmk!)
1271072755 dlm_controld 3.0.10 started
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection:
Creating connection to our AIS plugin
cluster-dlm[20608]: 2010/04/12_13:45:55 info: init_ais_connection: AIS
connection established
cluster-dlm[20608]: 2010/04/12_13:45:55 info: get_ais_nodeid: Server
details: id=569559765 uname=app1a.xlhost.de cname=pcmk
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node
app1a.xlhost.de now has id: 569559765
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 569559765
is now known as app1a.xlhost.de
1271072755 found /dev/misc/dlm-control minor 58
1271072755 found /dev/misc/dlm-monitor minor 57
1271072755 found /dev/misc/dlm_plock minor 56
1271072755 /dev/misc/dlm-monitor fd 9
1271072755 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
1271072755 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
1271072755 confdb_key_get error 11
1271072755 group_mode 3 compat 0
1271072755 setup_cpg_daemon 11
1271072755 dlm:controld conf 2 1 0 memb 569559765 586336981 join 569559765
1271072755 run protocol from nodeid 586336981
1271072755 daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
1271072755 plocks 13
1271072755 plock cpg message size: 104 bytes
cluster-dlm[20608]: 2010/04/12_13:45:55 notice: ais_dispatch: Membership
156: quorum acquired
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node
app1a.xlhost.de: id=569559765 state=member (new) addr=r(0)
ip(  (new) votes=1 (new) born=156 seen=156
proc=00000000000000000000000000013312 (new)
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node
app1b.xlhost.de now has id: 586336981
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_new_peer: Node 586336981
is now known as app1b.xlhost.de
cluster-dlm[20608]: 2010/04/12_13:45:55 info: crm_update_peer: Node
app1b.xlhost.de: id=586336981 state=member (new) addr=r(0)
ip(  votes=1 born=148 seen=156
1271072755 Processing membership 156
1271072755 Adding address ip( to configfs for node
1271072755 set_configfs_node 569559765 local 1
1271072755 Added active node 569559765: born-on=156, last-seen=156,
this-event=156, last-event=0
1271072755 Adding address ip( to configfs for node
1271072755 set_configfs_node 586336981 local 0
1271072755 Added active node 586336981: born-on=148, last-seen=156,
this-event=156, last-event=0
1271072763 client connection 5 fd 14
1271072763 connection 5 read error -1
1271072776 client connection 5 fd 14
1271072776 connection 5 read error -1
1271072779 client connection 5 fd 14
1271072779 connection 5 read error -1
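
Each attempt shows up as a "client connection" line followed by a "read error" line on the same connection number. A rough Python sketch that pairs those lines and reports how many seconds after daemon start each attempt arrived is given here. The helper name and the regular expressions are assumptions based only on the lines quoted above, not on the dlm_controld source.

```python
import re

# First line and last six lines of the dlm_controld.pcmk output quoted above.
DLM_LOG = """\
1271072755 dlm_controld 3.0.10 started
1271072763 client connection 5 fd 14
1271072763 connection 5 read error -1
1271072776 client connection 5 fd 14
1271072776 connection 5 read error -1
1271072779 client connection 5 fd 14
1271072779 connection 5 read error -1
"""

STARTED = re.compile(r"^(\d+) dlm_controld \S+ started$")
OPENED = re.compile(r"^(\d+) client connection (\d+) fd (\d+)$")
READ_ERROR = re.compile(r"^(\d+) connection (\d+) read error (-?\d+)$")


def connection_attempts(log):
    """Return (seconds_since_start, connection, failed) for each attempt."""
    start = None
    attempts = []
    for line in log.splitlines():
        started = STARTED.match(line)
        opened = OPENED.match(line)
        failed = READ_ERROR.match(line)
        if started:
            start = int(started.group(1))
        elif opened:
            attempts.append([int(opened.group(1)), int(opened.group(2)), False])
        elif failed:
            # Mark the most recent attempt on this connection number as failed.
            for attempt in reversed(attempts):
                if attempt[1] == int(failed.group(2)):
                    attempt[2] = True
                    break
    return [
        (stamp - start if start is not None else None, conn, bad)
        for stamp, conn, bad in attempts
    ]


for offset, conn, bad in connection_attempts(DLM_LOG):
    print("+%ss connection %d %s" % (offset, conn, "read error" if bad else "ok"))
```

On the lines quoted above this reports three attempts, at 8, 21 and 24 seconds after dlm_controld started, each ending in a read error.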

i'm pretty lost at the moment, as there's nothing i can find via google
regarding the "core" problem:
1271072439 cpg_joined at 934: Opening control device
1271072439 cpg_joined at 938: Error opening control device: Unable to access
cluster service

any help would be greatly appreciated.

best regards,
jürgen herrmann
>> XLhost.de - eXperts in Linux hosting ® <<

XLhost.de GmbH
Jürgen Herrmann, Managing Director
Boelckestrasse 21, 93051 Regensburg, Germany

Managing Directors: Volker Geith, Jürgen Herrmann
Registered under: HRB9918
VAT identification number: DE245931218

Tel:  +49 (0)800 XLHOSTDE [0800 95467833]
Fax:  +49 (0)800 95467830

WEB:  http://www.XLhost.de
IRC:  #XLhost at irc.quakenet.org
