[Ocfs2-users] Unstable Cluster

Tony Rios tony at tonyrios.com
Fri Dec 9 00:31:44 PST 2011


Having major OCFS2 blues here...

I'm still having trouble keeping the cluster stable.  I've tried to isolate the problem by putting the OCFS2 cluster on its own dedicated Ethernet switch.
I've also tried shutting off all the machines and slowly bringing them back online.  This sort of works: so far I have been able to get 2 nodes back online, but I don't know how long it will be before they mysteriously reboot again for no apparent reason.

What I'm currently up against is this error from mount: "Unknown code B o while mounting /dev/sdc on /raid2005.  Check 'dmesg' for more information on this error."

When I check the syslog, I get:  [(mount.ocfs2,1388,0):dlm_join_domain:1857 Timed out joining dlm domain ]

So I'm stuck, because the server that won't mount is a key player in the cluster.

Any suggestions would be greatly appreciated.

Tony

===
[   16.390699] Loading iSCSI transport class v2.0-870.
[   16.476589] iscsi: registered transport (tcp)
[   16.534914] OCFS2 Node Manager 1.5.0
[   16.562053] FS-Cache: Netfs 'nfs' registered for caching
[   16.596508] OCFS2 DLM 1.5.0
[   16.623089] Installing knfsd (copyright (C) 1996 okir at monad.swb.de).
[   16.654034] ocfs2: Registered cluster interface o2cb
[   16.721418] OCFS2 DLMFS 1.5.0
[   16.722765] OCFS2 User DLM kernel interface loaded
[   16.743879] iscsi: registered transport (iser)
[   16.753559] iscsid (715): /proc/715/oom_adj is deprecated, please use /proc/715/oom_score_adj instead.
[   17.867235] tg3 0000:04:00.0: eth0: Link is up at 1000 Mbps, full duplex
[   17.889970] tg3 0000:04:00.0: eth0: Flow control is off for TX and off for RX
[   17.912306] console [netcon0] enabled
[   17.913397] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   17.956188] netconsole: network logging started
[   18.610037] floppy0: no floppy controllers found
[   18.658230] tg3 0000:05:00.0: eth1: Link is up at 1000 Mbps, full duplex
[   18.658998] tg3 0000:05:00.0: eth1: Flow control is off for TX and off for RX
[   18.661089] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   28.100007] eth0: no IPv6 routers present
[   28.890006] eth1: no IPv6 routers present
[   51.003765] scsi4 : iSCSI Initiator over TCP/IP
[   51.516361] scsi 4:0:0:0: Direct-Access     IFT      DS S16E-G2240    386C PQ: 0 ANSI: 5
[   51.517630] sd 4:0:0:0: Attached scsi generic sg2 type 0
[   51.518447] sd 4:0:0:0: [sdc] 70315401216 512-byte logical blocks: (36.0 TB/32.7 TiB)
[   51.543439] scsi 4:0:0:1: Enclosure         IFT      DS S16E-G2240    386C PQ: 0 ANSI: 4
[   51.565949] scsi 4:0:0:1: Attached scsi generic sg3 type 13
[   51.566039] sd 4:0:0:0: [sdc] Write Protect is off
[   51.566045] sd 4:0:0:0: [sdc] Mode Sense: 83 00 00 08
[   51.566566] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   51.574645]  sdc: unknown partition table
[   51.700297] sd 4:0:0:0: [sdc] Attached SCSI disk
[   51.727783] ses 4:0:0:1: Attached Enclosure device
[  113.902499] o2net: accepted connection from node pedge38 (num 4) at 10.88.0.38:7777
[  117.812160] OCFS2 1.5.0
[  127.520176] (mount.ocfs2,1388,0):dlm_join_domain:1857 Timed out joining dlm domain A3AA504BE42E4D3D8A15248D8FCD49BB after 94000 msecs
[  127.543603] ocfs2: Unmounting device (8,32) on (node 0)
[  127.780023] o2net: no longer connected to node pedge38 (num 4) at 10.88.0.38:777

===
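
For anyone triaging a log like the one above, here is a minimal Python sketch that pulls out the two pieces that matter: which peers o2net accepted connections from, and which DLM domain join timed out. The function name `summarize` and both regular expressions are hypothetical; they are written only against the line formats visible in this excerpt and are not part of any OCFS2 tool.

```python
import re

# Assumed patterns, derived only from the dmesg lines quoted in this post.
JOIN_TIMEOUT = re.compile(
    r"dlm_join_domain:\d+ Timed out joining dlm domain "
    r"(?P<domain>[0-9A-F]+) after (?P<msecs>\d+) msecs"
)
O2NET_EVENT = re.compile(
    r"o2net: (?P<event>accepted connection from|no longer connected to) "
    r"node (?P<node>\S+) \(num (?P<num>\d+)\) at (?P<addr>[\d.]+):(?P<port>\d+)"
)


def summarize(lines):
    """Return (peers that connected, DLM join timeouts) from kernel log lines."""
    peers = set()
    timeouts = []
    for line in lines:
        m = O2NET_EVENT.search(line)
        if m and m.group("event").startswith("accepted"):
            peers.add((m.group("node"), int(m.group("num"))))
        m = JOIN_TIMEOUT.search(line)
        if m:
            timeouts.append((m.group("domain"), int(m.group("msecs"))))
    return peers, timeouts


# Two lines copied from the excerpt above.
sample = [
    "[  113.902499] o2net: accepted connection from node pedge38 (num 4) at 10.88.0.38:7777",
    "[  127.520176] (mount.ocfs2,1388,0):dlm_join_domain:1857 Timed out joining dlm domain "
    "A3AA504BE42E4D3D8A15248D8FCD49BB after 94000 msecs",
]
peers, timeouts = summarize(sample)
print(peers)
print(timeouts)
```

In this excerpt only one peer (pedge38, node 4) shows up before the join times out, so comparing that set against the full node list in the cluster configuration would be one way to see which nodes never connected.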

