[Ocfs2-users] kernel BUG on OCFS2 over DRBD

Vladimir Kuklin v.kuklin at smm.ru
Thu Nov 25 08:40:55 PST 2010


Hi,

I'm experiencing the following problem while using OCFS2 over a DRBD 
partition.

My configuration is as follows:

2 servers with the pacemaker+corosync stack configured

Debian Lenny/Squeeze mixed:

kernel - linux-image-2.6.32-bpo.5-amd64 (2.6.32-26~bpo50+1)
kernel modules - drbd = 8.3.7 (api:88/proto:86-91), ocfs2 = 1.5.0

packages:

pacemaker = 1.0.9.1
corosync = 1.2.1-2
dlm-pcmk = 3.0.12-2
ocfs2-tools-pacemaker (contains the ocfs2_controld.pcmk binary) = 1.4.4-3
ocfs2-tools = 1.4.4-3

Kernel trace follows here:


[ 3128.804789] block drbd0: Handshake successful: Agreed network 
protocol version 91
[ 3128.805094] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
[ 3128.805176] block drbd0: conn( WFConnection -> WFReportParams )
[ 3128.805274] block drbd0: Starting asender thread (from drbd0_receiver 
[4776])
[ 3128.805533] block drbd0: data-integrity-alg: <not-used>
[ 3128.805626] block drbd0: drbd_sync_handshake:
[ 3128.805695] block drbd0: self 
B4F22E41814A97AB:ADC1DEC415E06ACD:0E1A98B5C70EAE0E:578A64518662F9CF 
bits:202 flags:0
[ 3128.805788] block drbd0: peer 
ADC1DEC415E06ACC:0000000000000000:0E1A98B5C70EAE0E:578A64518662F9CF 
bits:0 flags:0
[ 3128.805880] block drbd0: uuid_compare()=1 by rule 70
[ 3128.805953] block drbd0: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapS )
[ 3129.365716] block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( 
Outdated -> Inconsistent )
[ 3129.365816] block drbd0: Began resync as SyncSource (will sync 808 KB 
[202 bits set]).
[ 3129.441670] block drbd0: Resync done (total 1 sec; paused 0 sec; 808 
K/sec)
[ 3129.441746] block drbd0: conn( SyncSource -> Connected ) pdsk( 
Inconsistent -> UpToDate )
[ 3154.019560] block drbd0: peer( Secondary -> Primary )
[ 3156.462341] dlm: got connection from 1191233546
[ 3162.458368] (5378,4):ocfs2_truncate_file:465 ERROR: bug expression: 
le64_to_cpu(fe->i_size) != i_size_read(inode)
[ 3162.458466] (5378,4):ocfs2_truncate_file:465 ERROR: Inode 1714687, 
inode i_size = 556 != di i_size = 604, i_flags = 0x1
[ 3162.458586] ------------[ cut here ]------------
[ 3162.458654] kernel BUG at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/file.c:465!
[ 3162.458745] invalid opcode: 0000 [#1] SMP
[ 3162.458901] last sysfs file: 
/sys/kernel/dlm/D9348641B1E04D0E907EFF8D978F348A/control
[ 3162.458988] CPU 4
[ 3162.459095] Modules linked in: ocfs2 jbd2 ocfs2_nodemanager 
quota_tree ocfs2_stack_user ocfs2_stackglue sha1_generic hmac drbd 
lru_cache cn dlm configfs ip_vs_rr ip_vs sctp crc32c libcrc32c nfsd 
exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipip tunnel4 8021q 
garp stp xt_MARK iptable_mangle xt_tcpudp iptable_filter ip_tables 
x_tables coretemp w83627hf w83793 hwmon_vid loop snd_pcsp snd_pcm_oss 
snd_mixer_oss snd_pcm radeon ttm drm_kms_helper snd_timer drm snd 
i5k_amb soundcore i2c_algo_bit container i5000_edac rng_core 
snd_page_alloc edac_core evdev button processor ioatdma dca shpchp 
pci_hotplug i2c_i801 i2c_core ext3 jbd mbcache dm_mod ses enclosure 
sd_mod crc_t10dif sg sr_mod cdrom ata_piix ata_generic libata aacraid 
ehci_hcd uhci_hcd scsi_mod thermal thermal_sys usbcore e1000e nls_base 
[last unloaded: scsi_wait_scan]
[ 3162.462354] Pid: 5378, comm: apache2 Not tainted 2.6.32-bpo.5-amd64 
#1 X7DBU
[ 3162.462354] RIP: 0010:[<ffffffffa05e006f>]  [<ffffffffa05e006f>] 
ocfs2_setattr+0x631/0x172a [ocfs2]
[ 3162.462354] RSP: 0018:ffff8801fa71bc28  EFLAGS: 00010292
[ 3162.462354] RAX: 0000000000000081 RBX: ffff8801d5afb000 RCX: 
0000000000001977
[ 3162.462354] RDX: 0000000000000000 RSI: 0000000000000092 RDI: 
0000000000000246
[ 3162.462354] RBP: 0000000000000000 R08: 000000000000f71d R09: 
000000000000000a
[ 3162.462354] R10: 0000000000000000 R11: ffffffff811b7371 R12: 
0000000000000000
[ 3162.462354] R13: ffff8801f8fc5ec8 R14: ffff8801f8fc5ec8 R15: 
ffff8801f8f752a0
[ 3162.462354] FS:  00007fb03993b710(0000) GS:ffff880008d00000(0000) 
knlGS:0000000000000000
[ 3162.462354] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3162.462354] CR2: 00000000010ccbc8 CR3: 00000001fd16e000 CR4: 
00000000000006e0
[ 3162.462354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[ 3162.462354] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[ 3162.462354] Process apache2 (pid: 5378, threadinfo ffff8801fa71a000, 
task ffff8801f9d0b880)
[ 3162.462354] Stack:
[ 3162.462354]  000000000000022c 000000000000025c 0000000000000001 
ffff880227649000
[ 3162.462354] <0> ffff8801f8fc5b60 ffff8801fa71bd68 ffff880227649000 
0000000100000292
[ 3162.462354] <0> ffff8801fa4bc800 ffff8801f8fc5b78 000000004cee87ac 
ffff880227649000
[ 3162.462354] Call Trace:
[ 3162.462354]  [<ffffffff81051f59>] ? current_fs_time+0x1e/0x24
[ 3162.462354]  [<ffffffff81100bbb>] ? notify_change+0x180/0x2c5
[ 3162.462354]  [<ffffffff810ed880>] ? do_truncate+0x63/0x7e
[ 3162.462354]  [<ffffffff810f5a18>] ? get_write_access+0x18/0x4b
[ 3162.462354]  [<ffffffff810f7c17>] ? may_open+0x191/0x1c8
[ 3162.462354]  [<ffffffff810f84fa>] ? do_filp_open+0x4bf/0x94b
[ 3162.462354]  [<ffffffff810f1833>] ? cp_new_stat+0xe9/0xfc
[ 3162.462354]  [<ffffffff810ecb5f>] ? do_sys_open+0x55/0xfc
[ 3162.462354]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[ 3162.462354] Code: 89 fb 62 a0 65 8b 14 25 a8 e3 00 00 89 44 24 10 48 
8b 43 20 48 63 d2 48 89 44 24 08 49 8b 46 68 48 89 04 24 31 c0 e8 0e 92 
d1 e0 <0f> 0b eb fe 49 39 cc 48 8b 05 c3 7b f8 ff 0f 86 b1 00 00 00 a9
[ 3162.462354] RIP  [<ffffffffa05e006f>] ocfs2_setattr+0x631/0x172a [ocfs2]
[ 3162.462354]  RSP <ffff8801fa71bc28>
[ 3162.469653] ---[ end trace 3a74db6ea3c5066f ]---
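
For reference, the BUG expression compares the size stored in the on-disk dinode (fe->i_size, reported as "di i_size") with the size held in the in-memory inode (i_size_read(inode), reported as "inode i_size"). A minimal Python sketch for pulling those numbers out of the log follows; the helper names parse_size_mismatch and first_stack_words are hypothetical, and the sample text is the two relevant lines from the trace above with the mail line wraps joined. It also shows that the first two words of the stack dump appear to hold the same two values in hex.

```python
import re

# Two lines copied from the trace above, with mail-client line wraps joined.
LOG = """\
[ 3162.458466] (5378,4):ocfs2_truncate_file:465 ERROR: Inode 1714687, inode i_size = 556 != di i_size = 604, i_flags = 0x1
[ 3162.462354]  000000000000022c 000000000000025c 0000000000000001 ffff880227649000
"""

SIZE_RE = re.compile(r"Inode (\d+), inode i_size = (\d+) != di i_size = (\d+)")
PREFIX_RE = re.compile(r"^\[\s*[\d.]+\]\s*(<\d+>)?")  # timestamp and optional <0> marker
HEX_WORD_RE = re.compile(r"[0-9a-f]{16}")


def parse_size_mismatch(log):
    """Return inode number and both sizes from the ocfs2 ERROR line, or None."""
    m = SIZE_RE.search(log)
    if m is None:
        return None
    inode, in_memory, on_disk = (int(g) for g in m.groups())
    return {
        "inode": inode,
        "in_memory": in_memory,  # i_size_read(inode)
        "on_disk": on_disk,      # le64_to_cpu(fe->i_size)
        "delta": on_disk - in_memory,
    }


def first_stack_words(log, count=2):
    """Return the first `count` words of the first all-hex stack line as integers."""
    for line in log.splitlines():
        words = PREFIX_RE.sub("", line).split()
        if words and all(HEX_WORD_RE.fullmatch(w) for w in words):
            return [int(w, 16) for w in words[:count]]
    return []


if __name__ == "__main__":
    print(parse_size_mismatch(LOG))
    print(first_stack_words(LOG))
```

Running this against the lines above shows the on-disk size is 48 bytes larger than the size this node has cached for inode 1714687.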


I don't know exactly how to reproduce this bug. The kernel doesn't stall 
after hitting it, but it is rather annoying and I am worried about 
file system consistency.

Any help would be appreciated.


-- 
Yours Faithfully

Vladimir Kuklin

Network Services Specialist
JSC "SMM"
51/4 build. 1, Shepkina str.
Moscow, 129110
Russia

phone +74952296363 ext. 1514
fax +74952296365
cell +79197848963

e-mail v.kuklin at smm.ru
site http://smm.ru
