[Ocfs2-devel] Can we disable BUG() assert by default in ocfs2_read_blocks_sync()c

Gang He ghe at suse.com
Mon Jun 20 23:30:59 PDT 2016


Hello Guys,

Here is a BUG() assert in ocfs2_read_blocks_sync() function in buffer_head_io.c,
101 int ocfs2_read_blocks_sync(struct ocfs2_super *osb, u64 block,
102                            unsigned int nr, struct buffer_head *bhs[])
103 {
	...

140                 lock_buffer(bh);
141                 if (buffer_jbd(bh)) {
142                         mlog(ML_ERROR,
143                              "block %llu had the JBD bit set "
144                              "while I was in lock_buffer!",
145                              (unsigned long long)bh->b_blocknr);
146                         BUG();  <<== here
147                 }
148
149                 clear_buffer_uptodate(bh);

	...
}

By default, the code is enable, but the customer is encountering this BUG() kernel panic under a high-load NAS server testing case on SLES12SP1.
the panic back trace is as below,
[ 2164.268524]  sdwt: unknown partition table
[ 2164.270279]  sdwj: unknown partition table
[ 2207.030241] bio: create slab <bio-1> at 1
[ 2285.378597] OCFS2 Node Manager 1.5.0
[ 2285.455203] OCFS2 1.5.0
[ 2285.464157] JBD2: Ignoring recovery information on journal
[ 2285.468099] ocfs2: Mounting device (254,0) on (node local, slot 0) with ordered data mode.
[ 4855.333080] (nfsd,15682,5):ocfs2_read_blocks_sync:145 ERROR: block 86441055 had the JBD bit set while I was in lock_buffer!
[ 4855.333109] ------------[ cut here ]------------
[ 4855.333240] kernel BUG at /home/abuild/rpmbuild/BUILD/cluster-fs-1.0/default/fs/ocfs2/buffer_head_io.c:146!
[ 4855.333484] invalid opcode: 0000 [#1] SMP
[ 4855.333591] Modules linked in: ocfs2(O) ocfs2_stackglue(O) ocfs2_nodemanager(O) quota_tree nfsd auth_rpcgss nfs_acl lockd sunrpc dlm(O) sctp configfs tg3 libphy bnx2x mdio libcrc32c af_packet iscsi_ibft iscsi_boot_sysfs softdog msr emcpdm(POEX) emcpgpx(POEX) emcpmpx(POEX) emcp(POEX) scsi_transport_fc scsi_tgt coretemp kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel(X) aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr dm_mod e1000e shpchp ptp pps_core button processor ext4 crc16 mbcache jbd2 sd_mod emcctd(OEN) sg scsi_mod autofs4 [last unloaded: emcpioc]
[ 4855.335085] Supported: No, Proprietary and Unsupported modules are loaded
[ 4855.335229] CPU: 5 PID: 15682 Comm: nfsd Tainted: P        W  OE NX 3.12.49-11-default #1
[ 4855.335412] task: ffff8807bef54e80 ti: ffff8807bef52000 task.ti: ffff8807bef52000
[ 4855.335581] RIP: 0010:[<ffffffffa07618b0>]  [<ffffffffa07618b0>] ocfs2_read_blocks_sync+0x360/0x370 [ocfs2]
[ 4855.335818] RSP: 0018:ffff8807bef53a70  EFLAGS: 00010282
[ 4855.335937] RAX: 000000000000006f RBX: ffff8807bef53b10 RCX: 0000000000003ce2
[ 4855.336072] RDX: 0000000000003ce2 RSI: 0000000000000206 RDI: 0000000000000206
[ 4855.336072] RBP: 000000000526fc60 R08: 000000000000006f R09: 0000000000000000
[ 4855.336072] R10: 0000000000000000 R11: ffff8807bef5380e R12: 0000000000000000
[ 4855.336072] R13: ffff8807bef53b18 R14: 1000000000000000 R15: 000000000526fc60
[ 4855.336072] FS:  0000000000000000(0000) GS:ffff88083dd40000(0000) knlGS:0000000000000000
[ 4855.336072] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4855.336072] CR2: 00007f3a64014440 CR3: 00000007629d2000 CR4: 00000000001006e0
[ 4855.336072] Stack:
[ 4855.336072]  000000000526fc5f ffff8805ebe0ea88 ffff8807c0391000 ffff8807bef53b10
[ 4855.336072]  000000005b964280 ffff8807bef54e80 ffff8807bef55748 000000000526fc5f
[ 4855.336072]  ffff8807c0391000 ffff8807bef53b98 0000000000000000 ffff8807c0391000
[ 4855.336072] Call Trace:
[ 4855.336072]  [<ffffffffa07acc3f>] ocfs2_test_inode_bit+0x4f/0x8e0 [ocfs2]
[ 4855.336072]  [<ffffffffa07748c4>] ocfs2_get_dentry+0x224/0x4a0 [ocfs2]
[ 4855.336072]  [<ffffffffa0774bc2>] ocfs2_fh_to_dentry+0x32/0x50 [ocfs2]
[ 4855.336072]  [<ffffffff8121b453>] exportfs_decode_fh+0x63/0x2c0
[ 4855.336072]  [<ffffffffa0688425>] fh_verify+0x2f5/0x5e0 [nfsd]
[ 4855.336072]  [<ffffffffa0689b6c>] nfsd_open+0x2c/0x180 [nfsd]
[ 4855.336072]  [<ffffffffa068c694>] nfsd_write+0xb4/0x110 [nfsd]
[ 4855.336072]  [<ffffffffa069277c>] nfsd3_proc_write+0xac/0x130 [nfsd]
[ 4855.336072]  [<ffffffffa0684cc2>] nfsd_dispatch+0xb2/0x200 [nfsd]
[ 4855.336072]  [<ffffffffa0633cab>] svc_process_common+0x43b/0x680 [sunrpc]
[ 4855.336072]  [<ffffffffa0633ffc>] svc_process+0x10c/0x160 [sunrpc]
[ 4855.336072]  [<ffffffffa06846cf>] nfsd+0xaf/0x120 [nfsd]
[ 4855.336072]  [<ffffffff8107ad94>] kthread+0xb4/0xc0
[ 4855.336072]  [<ffffffff8152a518>] ret_from_fork+0x58/0x90
[ 4855.336072] Code: 00 00 48 8d b0 c8 08 00 00 48 89 3c 24 48 63 c9 41 b9 91 00 00 00 49 c7 c0 00 de 7d a0 48 c7 c7 58 66 7e a0 31 c0 e8 79 5b db e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 4855.336072] RIP  [<ffffffffa07618b0>] ocfs2_read_blocks_sync+0x360/0x370 [ocfs2]
[ 4855.336072]  RSP <ffff8807bef53a70>
[ 4855.341947] ---[ end trace a7919e7f17c0a727 ]---


>From the past discussions, I think that this panic is not the first time, someone ever tried to remove this kind of BUG() in ocfs2_read_blocks_sync() and ocfs2_read_blocks(), but the code is still here.
the previous discussions, you can refer to the links,
https://oss.oracle.com/pipermail/ocfs2-users/2013-September/006115.html 
https://oss.oracle.com/bugzilla/show_bug.cgi?id=1235 

Then, How do we address this problem?
My suggestion is this bug() line can be wrapped by macro CATCH_BH_JBD_RACES, but this macro should be disable by default in Makefile.
Any comments is welcome, it is time to finish this problem.

 
Thanks
Gang






More information about the Ocfs2-devel mailing list