[Ocfs2-users] Problem with mdadm + lvm + drbd + ocfs ( sounds obvious, eh ? :) )

Fabricio Cannini fcannini at gmail.com
Mon Aug 6 13:04:08 PDT 2012


Hi there

First of all, apologies for the lengthy message, but it's been a long weekend.
I'm trying to set up a two-node cluster with the following configuration:

OS: Debian 6.0 amd64
ocfs: 1.4.4-3 (Debian package)
drbd: 8.3.7-2.1
lvm2: 2.02.66-5
kernel: 2.6.32-45
mdadm: 3.1.4-1+8efb9d1+squeeze1


layout:

0- 2 36GB SCSI disks in a RAID1 array, with mdadm.
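
For completeness, the array was created with something along these
lines (the disk device names /dev/sda and /dev/sdb are illustrative):

----

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

----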

1- 1 lvm2 VG on top of the RAID1, and 4 LVs from it, one of them being
'/dev/mapper/vg-lv_opt', our target.
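
The VG and LV were created more or less like this (the LV size is
illustrative, and the other 3 LVs are omitted):

----

pvcreate /dev/md0
vgcreate vg /dev/md0
lvcreate -n lv_opt -L 20G vg

----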

2- drbd on top of the lvm2 VG (the lv_opt LV), configured as below:

----

resource opt {
    device         /dev/drbd0 ;
    disk            /dev/vg/lv_opt ;
    meta-disk   internal ;

    net {
        allow-two-primaries ;
    }

    startup {
        become-primary-on both ;
        wfc-timeout 120 ;
        outdated-wfc-timeout 120 ;
        degr-wfc-timeout 120 ;
    }

    syncer {
        rate 100M ;
    }

    disk {
        fencing resource-and-stonith ;
    }

    on admin1-drbd admin1 {
        address     192.168.251.11:7789 ;
    }

    on admin2-drbd admin2 {
        address     192.168.251.12:7789 ;
    }

}

----
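
To bring the resource up, I used more or less the standard 8.3 sequence
(on both nodes, except where noted):

----

drbdadm create-md opt
drbdadm up opt

# one node only, the first time, to kick off the initial sync:
drbdadm -- --overwrite-data-of-peer primary opt

----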

3- ocfs2 on top of drbd, configured as below:

----

node:
        name = admin1
        cluster = opt
        number = 1
        ip_address = 192.168.251.11
        ip_port = 7777

node:
        name = admin2
        cluster = opt
        number = 2
        ip_address = 192.168.251.12
        ip_port = 7777

cluster:
        name = opt
        node_count = 2

----
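
The cluster was brought online and the filesystem created roughly like
this (the /opt mount point is shown for illustration):

----

/etc/init.d/o2cb online opt
mkfs.ocfs2 -N 2 -L opt /dev/drbd0    # one node only
mount -t ocfs2 /dev/drbd0 /opt       # both nodes

----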

- the 192.168.251.0/24 network is a crossover cable connecting one
machine's network card directly to the other's.
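
Each node has that interface configured statically, e.g. on admin1
(the eth1 interface name is illustrative):

----

# /etc/network/interfaces
auto eth1
iface eth1 inet static
    address 192.168.251.11
    netmask 255.255.255.0

----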

Everything went OK until OCFS2 was enabled on the nodes. Then messages
like this began to appear in syslog:

----

Aug  5 14:40:11 admin1 kernel: [266728.868283] OCFS2: ERROR (device
drbd0): ocfs2_validate_inode_block: Invalid dinode #922073:
OCFS2_VALID_FL not set
Aug  5 14:40:11 admin1 kernel: [266728.868443] File system is now
read-only due to the potential of on-disk corruption. Please run
fsck.ocfs2 once the file system is unmounted.
Aug  5 14:40:11 admin1 kernel: [266728.868571]
(7023,1):ocfs2_read_locked_inode:496 ERROR: status = -22

----

I ran fsck.ocfs2 as instructed; fsck reported the filesystem clean, but
the errors continue to appear.
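
In case it matters, the check was run roughly like this:

----

umount /opt                  # on both nodes first
fsck.ocfs2 -fy /dev/drbd0    # then on one node only

----
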
I've also found mentions of problems with lvm2, like this one:

http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg03227.html

- Does this problem manifest even without direct "contact" with lvm2
(i.e., with drbd above lvm and below ocfs2)?

- Is there anything obviously wrong with my configuration?


Thanks for your time,
Fabricio


