[Ocfs2-devel] vfs-scale, nd->inode after __do_follow_link()

Nick Piggin npiggin at gmail.com
Sat Jan 15 10:16:40 PST 2011


On Sat, Jan 15, 2011 at 12:20 PM, J.H. <warthog9 at kernel.org> wrote:
> Nick,
>
> Just thought I'd let you know - with, or without, the vfs-scale code
> that you've got I'm getting this:
>
> [  472.666054] ------------[ cut here ]------------
> [  472.670724] kernel BUG at fs/dcache.c:1358!
> [  472.674944] invalid opcode: 0000 [#1] SMP
> [  472.679112] last sysfs file:
> /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
> [  472.687105] last /proc..net open:  /proc/7687/net/route
> [  472.695829] last /proc..net close: /proc/7687/net/route
> [  472.704490] CPU 0
> [  472.706361] Modules linked in: ocfs2 mptctl mptbase ipmi_devintf drbd
> lru_cache nfsd lockd nfs_acl auth_rpcgss sunrpc ocfs2_dlmfs
> ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs
> 8021q garp stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> ip6table_filter ip6_tables ipv6 xfs exportfs serio_raw iTCO_wdt bnx2
> microcode hpwdt iTCO_vendor_support ipmi_si power_meter ipmi_msghandler
> pcspkr hpilo i7core_edac edac_core shpchp hpsa radeon ttm drm_kms_helper
> drm i2c_algo_bit i2c_core [last unloaded: speedstep_lib]
> [  472.776643]
> [  472.782039] Pid: 2716, comm: httpd Tainted: G        W   2.6.37+ #4
> /ProLiant DL380 G6
> [  472.793922] RIP: 0010:[<ffffffff8113ed85>]  [<ffffffff8113ed85>]
> d_set_d_op+0x13/0x5e
> [  472.793931] RSP: 0018:ffff8807d4f87c08  EFLAGS: 00010282
> [  472.793933] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 0000000000000000
> [  472.793936] RDX: 0000000000000246 RSI: ffffffffa04dc880 RDI:
> ffff8803cf7bcbd0
> [  472.793939] RBP: ffff8807d4f87c08 R08: ffffffffa0491049 R09:
> 0000000000000001
> [  472.793942] R10: ffff8803fc70c778 R11: ffff880700000000 R12:
> ffff8803cf737000
> [  472.793945] R13: ffff8803cb822120 R14: ffff8803cb821460 R15:
> ffff8803cf7bcbd0
> [  472.793949] FS:  00002b86adeee660(0000) GS:ffff8800dd400000(0000)
> knlGS:0000000000000000
> [  472.793952] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  472.793955] CR2: 00002b86ad2c4888 CR3: 00000007d4f75000 CR4:
> 00000000000006f0
> [  472.793958] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  472.793961] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  472.793964] Process httpd (pid: 2716, threadinfo ffff8807d4f86000,
> task ffff8807d4e723b0)
> [  472.793967] Stack:
> [  472.793968]  ffff8807d4f87d28 ffffffffa04aac62 ffff8803ddabfe30
> ffff8803cb820e78
> [  472.793973]  ffff8803fbc38000 ffff8803cb820ee8 00000000a0491177
> ffff8803f36cd000
> [  472.793978]  00008000d4f87c98 ffff8803fbc38000 ffff8803cf727d90
> 0000000000000000
> [  472.793983] Call Trace:
> [  472.794010]  [<ffffffffa04aac62>] ocfs2_mknod+0xb0f/0xd3e [ocfs2]
> [  472.794032]  [<ffffffffa04aaeb9>] ocfs2_create+0x13/0x15 [ocfs2]
> [  472.794036]  [<ffffffff811392b7>] vfs_create+0x70/0x92
> [  472.794041]  [<ffffffff81139fdc>] do_last+0x163/0x2e0
> [  472.794045]  [<ffffffff8113a460>] do_filp_open+0x307/0x6f1
> [  472.794050]  [<ffffffff81145394>] ? alloc_fd+0x3b/0x193
> [  472.794055]  [<ffffffff81082e33>] ? lock_release+0x19a/0x1a6
> [  472.794059]  [<ffffffff811454da>] ? alloc_fd+0x181/0x193
> [  472.794063]  [<ffffffff8112d1f6>] do_sys_open+0x60/0xf2
> [  472.794068]  [<ffffffff814a7aef>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  472.794072]  [<ffffffff8112d2a8>] sys_open+0x20/0x22
> [  472.794077]  [<ffffffff8100ac42>] system_call_fastpath+0x16/0x1b
> [  472.794079] Code: a9 ff 03 00 00 74 08 81 0b 80 00 00 00 eb 06 81 23
> 7f ff ff ff 5b c9 c3 55 48 89 e5 0f 1f 44 00 00 48 83 bf a8 00 00 00 00
> 74 02 <0f> 0b 8b 07 f6 c4 f0 74 02 0f 0b 48 85 f6 48 89 b7 a8 00 00 00
> [  472.794112] RIP  [<ffffffff8113ed85>] d_set_d_op+0x13/0x5e
> [  472.794116]  RSP <ffff8807d4f87c08>
> [  472.794387] ---[ end trace 04b2ab2cb7dc3150 ]---
>
> I only mention this as the ocfs2 folks suggested running your code might
> solve that problem.  That said I'm going to punt this back over to the
> ocfs2 folks for further review, as the bug makes ocfs2 completely
> unusable on 2.6.37+

Oh, this is the d_set_d_op thing again. Linus has changed that check
from a BUG_ON to a WARN_ON_ONCE upstream (which in hindsight is how it
should have looked from the start), so that will get you going again.
Thanks for testing and reporting it.

The underlying problem is not a new one, but the race window is very
small, so a warning rather than a BUG is appropriate.
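
To make the difference concrete, here is a rough userspace sketch of
what WARN_ON_ONCE semantics mean for a check like the one in
d_set_d_op. This is not the actual fs/dcache.c code: every sketch_*
name is a hypothetical stand-in, and the only thing taken from the
report is that the check fires when d_op is already set on the dentry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sketch_dentry_ops {
    int unused;
};

struct sketch_dentry {
    const struct sketch_dentry_ops *d_op;
};

/* Number of warnings actually printed so far. */
static int sketch_warnings;

/*
 * Report a true condition the first time it is seen, then stay quiet.
 * Simplification: one global "warned" flag, which is enough here because
 * there is a single call site. The caller always continues afterwards.
 */
static bool sketch_warn_on_once(bool cond, const char *what)
{
    static bool warned;

    if (cond && !warned) {
        warned = true;
        sketch_warnings++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

static void sketch_d_set_d_op(struct sketch_dentry *dentry,
                              const struct sketch_dentry_ops *op)
{
    assert(dentry != NULL);

    /*
     * With BUG_ON semantics the task would be killed right here when
     * d_op is already set. With warn-once semantics we log it and go on.
     */
    sketch_warn_on_once(dentry->d_op != NULL, "d_op already set");
    dentry->d_op = op;
}
```

Calling sketch_d_set_d_op() twice on the same dentry prints one warning
and still installs the new ops; a third call is silent. That is the
trade-off: the rare race gets logged once instead of taking down the
process that hit it.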


