[Ocfs2-users] OCFS2 Error in the filesystem after of some weeks running ocfs2

Eduardo Diaz - Gmail ediazrod at gmail.com
Thu Feb 9 02:08:22 PST 2012


Hi all, I am running a very simple DRBD primary/primary
configuration. I ran all my tests some weeks ago and everything
worked very well (shutting down the nodes, etc.).

I repeated the tests yesterday, and now :(...

I don't know what is happening, again! Every time I stop one node
(shutdown, not poweroff), the cluster breaks :-(...

I unmounted the filesystem and ran fsck.ocfs2, and there were many
errors in the cluster filesystem. Is there no way to test that the
OCFS2 filesystem is OK? I can stop it at night, but to me this is
crazy, because every two months the filesystem is broken, and if I
stop one node, the node that is still running goes down...

All systems run Debian squeeze with ocfs2 1.6.3.

Any ideas?

 Feb  7 13:58:33 servidoradantra2 kernel: [1864496.744051] block
drbd0: conn( Unconnected -> WFConnection )
Feb  7 13:59:24 servidoradantra2 kernel: [1864547.064015] o2net:
connection to node servidoradantra1 (num 0) at 192.168.2.1:7777 has
been idle for 60.0 seconds, shutting it down.
Feb  7 13:59:24 servidoradantra2 kernel: [1864547.064025]
(0,0):o2net_idle_timer:1495 here are some times that might help debug
the situation: (tmr 1328619504.71832 now 1328619564.71605 dr
1328619504.71815 adv 1328619504.71839:1328619504.71840 func
(18797194:507) 1328619488.80748:1328619488.80749)
Feb  7 13:59:24 servidoradantra2 kernel: [1864547.064048] o2net: no
longer connected to node servidoradantra1 (num 0) at 192.168.2.1:7777
Feb  7 13:59:31 servidoradantra2 kernel: [1864554.860190]
(2950,0):o2dlm_eviction_cb:269 o2dlm has evicted node 0 from group
F0E244E5687046DBAAF6A928CCDEEEF1
Feb  7 13:59:31 servidoradantra2 kernel: [1864554.874012]
(28219,0):dlm_get_lock_resource:839
F0E244E5687046DBAAF6A928CCDEEEF1:M00000000000000000000120766ee68: at
least one node (0) to recover before lock mastery can begin
Feb  7 13:59:32 servidoradantra2 kernel: [1864555.876011]
(28219,0):dlm_get_lock_resource:893
F0E244E5687046DBAAF6A928CCDEEEF1:M00000000000000000000120766ee68: at
least one node (0) to recover before lock mastery can begin
Feb  7 13:59:35 servidoradantra2 kernel: [1864558.309527]
(3132,3):dlm_get_lock_resource:839
F0E244E5687046DBAAF6A928CCDEEEF1:$RECOVERY: at least one node (0) to
recover before lock mastery can begin
Feb  7 13:59:35 servidoradantra2 kernel: [1864558.309533]
(3132,3):dlm_get_lock_resource:873 F0E244E5687046DBAAF6A928CCDEEEF1:
recovery map is not empty, but must master $RECOVERY lock now
Feb  7 13:59:35 servidoradantra2 kernel: [1864558.309549]
(3132,3):dlm_do_recovery:523 (3132) Node 1 is the Recovery Master for
the Dead Node 0 for Domain F0E244E5687046DBAAF6A928CCDEEEF1
Feb  7 13:59:43 servidoradantra2 kernel: [1864566.880235]
(28219,0):ocfs2_replay_journal:1607 Recovering node 0 from slot 0 on
device (147,0)
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884880]
------------[ cut here ]------------
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884902] kernel BUG
at /build/buildd-linux-2.6_2.6.32-39squeeze1-i386-F5tMlP/linux-2.6-2.6.32/debian/build/source_i386_none/fs/ocfs2/journal.c:1702!
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884938] invalid
opcode: 0000 [#1] SMP
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884960] last sysfs
file: /sys/devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/model
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.884991] Modules
linked in: ocfs2 jbd2 quota_tree crc32c drbd lru_cache cn pci_stub
vboxpci vboxnetadp vboxnetflt vboxdrv cls_u32 sch_htb sch_ingress
sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_TPROXY
nf_tproxy_core xt_hashlimit xt_comment xt_owner xt_recent xt_iprange
xt_policy xt_multiport ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP
ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah
ipt_addrtype xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE xt_MARK
xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack
xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat
nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle
nfnetlink iptable_filter ip_tables x_tables ocfs2_dlmfs
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
xfs exportfs it87 hwmon_vid coretemp loop firewire_sbp2
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nouveau
ttm drm_kms_helper snd_pcm drm snd_timer snd soundcore i2c_i801 i2c_
Feb  7 13:59:47 servidoradantra2 kernel: algo_bit parport_pc i2c_core
snd_page_alloc parport psmouse evdev button pcspkr serio_raw processor
ext3 jbd mbcache dm_mod sg usbhid hid sr_mod cdrom ata_generic sd_mod
crc_t10dif uhci_hcd pata_jmicron firewire_ohci thermal ahci
firewire_core floppy crc_itu_t libata r8169 mii ehci_hcd scsi_mod
thermal_sys sky2 usbcore nls_base [last unloaded: scsi_wait_scan]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886462]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886477] Pid: 28219,
comm: ocfs2rec Not tainted (2.6.32-5-686-bigmem #1) 965P-DS4
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886505] EIP:
0060:[<fd01d47a>] EFLAGS: 00010246 CPU: 0
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886532] EIP is at
__ocfs2_recovery_thread+0x3af/0x146d [ocfs2]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886550] EAX:
00000001 EBX: f5da6800 ECX: 00000001 EDX: 00000001
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886569] ESI:
00000001 EDI: f6ade038 EBP: 00000000 ESP: e0cb9ed4
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886587]  DS: 007b
ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886605] Process
ocfs2rec (pid: 28219, ti=e0cb8000 task=c91c0440 task.ti=e0cb8000)
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886633] Stack:
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886647]  c91c0440
c91c0440 f5da689c 00000001 00000001 f5da6800 f6ade038 f6b21930
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886682] <0> 00000002
00010000 00000000 00010000 00000000 e6f91000 d2baa848 00000000
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886731] <0> 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886790] Call Trace:
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886814]
[<fd01d0cb>] ? __ocfs2_recovery_thread+0x0/0x146d [ocfs2]
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886835]
[<c104a420>] ? kthread+0x61/0x66
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886853]
[<c104a3bf>] ? kthread+0x0/0x66
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886871]
[<c1008d87>] ? kernel_thread_helper+0x7/0x10
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.886888] Code: 00 00
68 24 b7 05 fd 50 ff b2 2c 01 00 00 68 c9 47 06 fd e8 99 10 26 c4 83
c4 20 8b 5c 24 14 8b 44 24 0c 39 83 bc 00 00 00 75 04 <0f> 0b eb fe 8d
84 24 d0 00 00 00 c7 84 24 d0 00 00 00 00 00 00
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.887102] EIP:
[<fd01d47a>] __ocfs2_recovery_thread+0x3af/0x146d [ocfs2] SS:ESP
0068:e0cb9ed4
Feb  7 13:59:47 servidoradantra2 kernel: [1864570.887413] ---[ end
trace 22961f2e1f624b7d ]---
Feb  7 14:07:19 servidoradantra2 kernel: imklog 4.6.4, log source =
/proc/kmsg started.


