[Ocfs2-users] ocfs or configfs bug ?
Welterlen Benoit
benoit.welterlen at bull.net
Tue Apr 19 07:54:32 PDT 2011
Hi all,

I have hit a bug in OCFS2 through configfs. To reproduce it, run:

while true ; do ls -l /sys/kernel/config/cluster/ocfs2/heartbeat ; done &
while true ; do echo 31 > /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold ; done &

(Note the space before the redirection; without it, "echo 31>" is parsed as "echo 3 1>".) This produces a kernel crash:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
PGD 467bea067 PUD 46d4d9067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/fs/o2cb/interface_revision
CPU 36
Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm nls_utf8 nfs lockd
fscache nfs_acl auth_rpcgss ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue
ipmi_devintf ipmi_si ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4
nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 i2c_i801 i2c_core
sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core igb dca ext4 jbd2
sd_mod crc_t10dif usbhid hid ahci ehci_hcd uhci_hcd dm_mod [last unloaded:
scsi_wait_scan]
Pid: 59850, comm: ls Tainted: G M ----------------
2.6.32-71.24.1.el6.Bull.23.x86_64 #1 bullx super-node
RIP: 0010:[<ffffffffa01fd214>] [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230
[configfs]
RSP: 0018:ffff880c6c8b3e78 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88086c4b23a8 RCX: ffff88086c4b23a0
RDX: 000000000000000e RSI: ffff88086c4b2410 RDI: ffffffffa02946e1
RBP: ffff880c6c8b3ed8 R08: ffff88086c4b23a8 R09: 0000000000000004
R10: 00007fff59ce4cf0 R11: 0000000000000246 R12: ffff88046bfbe0c0
R13: ffffffffa02946e1 R14: ffff88046c687608 R15: ffff88046c687610
FS: 00007fdf806017a0(0000) GS:ffff880036840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000467ffc000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 59850, threadinfo ffff880c6c8b2000, task ffff880c6aeeeea0)
Stack:
ffff880c6c8b3ee8 0000000002347078 ffff88086c4b23a0 ffffffff8116bea0
<0> ffff880c6c8b3f38 ffff88086c4b2410 ffff880c6c8b3ef8 ffff88046bfbe0c0
<0> ffff880c6c8b3f38 ffffffff8116bea0 ffff88086e109720 ffff88086e109668
Call Trace:
[<ffffffff8116bea0>] ? filldir+0x0/0xe0
[<ffffffff8116bea0>] ? filldir+0x0/0xe0
[<ffffffff8116c120>] vfs_readdir+0xc0/0xe0
[<ffffffff8116c2a9>] sys_getdents+0x89/0xf0
[<ffffffff8100c172>] system_call_fastpath+0x16/0x1b
Code: 48 83 f8 02 4d 8d 7e 08 48 89 55 c8 0f 84 15 01 00 00 49 8b 5e 08 48 3b 5d
c8 0f 85 7c 00 00 00 e9 da 00 00 00 66 90 48 8b 40 10 <4c> 8b 40 40 44 0f b7 49
44 4c 89 ee 49 8b 4c 24 40 48 8b 7d c0
RIP [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
RSP <ffff880c6c8b3e78>
CR2: 0000000000000040
crash> bt ffff880c6aeeeea0
PID: 59850 TASK: ffff880c6aeeeea0 CPU: 36 COMMAND: "ls"
#0 [ffff880c6c8b3b40] machine_kexec at ffffffff8102e77b
#1 [ffff880c6c8b3ba0] crash_kexec at ffffffff810a6cd8
#2 [ffff880c6c8b3c70] oops_end at ffffffff8146aad0
#3 [ffff880c6c8b3ca0] no_context at ffffffff8103789b
#4 [ffff880c6c8b3cf0] __bad_area_nosemaphore at ffffffff81037b25
#5 [ffff880c6c8b3d40] bad_area at ffffffff81037c4e
#6 [ffff880c6c8b3d70] do_page_fault at ffffffff8146c648
#7 [ffff880c6c8b3dc0] page_fault at ffffffff81469e45
[exception RIP: configfs_readdir+244]
RIP: ffffffffa01fd214 RSP: ffff880c6c8b3e78 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88086c4b23a8 RCX: ffff88086c4b23a0
RDX: 000000000000000e RSI: ffff88086c4b2410 RDI: ffffffffa02946e1
RBP: ffff880c6c8b3ed8 R8: ffff88086c4b23a8 R9: 0000000000000004
R10: 00007fff59ce4cf0 R11: 0000000000000246 R12: ffff88046bfbe0c0
R13: ffffffffa02946e1 R14: ffff88046c687608 R15: ffff88046c687610
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff880c6c8b3ee0] vfs_readdir at ffffffff8116c120
#9 [ffff880c6c8b3f30] sys_getdents at ffffffff8116c2a9
#10 [ffff880c6c8b3f80] system_call_fastpath at ffffffff8100c172
RIP: 00007fdf7f8dcec5 RSP: 00007fff59ce4e70 RFLAGS: 00010202
RAX: 000000000000004e RBX: ffffffff8100c172 RCX: 0000000002347070
RDX: 0000000000008000 RSI: 000000000233f078 RDI: 0000000000000003
RBP: ffffffffffffff08 R8: 000000000233f078 R9: 0000000000800000
R10: 00007fff59ce4cf0 R11: 0000000000000246 R12: 000000000233f010
R13: 000000000233f078 R14: 0000000000000000 R15: 000000000233f050
ORIG_RAX: 000000000000004e CS: 0033 SS: 002b
I have a crash dump if you want more information.
I've looked into the source code, and the comment on the setter claims that no locking is needed on the read side:
/* Only sets a new threshold if there are no active regions.
 *
 * No locking or otherwise interesting code is required for reading
 * o2hb_dead_threshold as it can't change once regions are active and
 * it's not interesting to anyone until then anyway. */
static void o2hb_dead_threshold_set(unsigned int threshold)
{
	if (threshold > O2HB_MIN_DEAD_THRESHOLD) {
		spin_lock(&o2hb_live_lock);
		if (list_empty(&o2hb_all_regions))
			o2hb_dead_threshold = threshold;
		spin_unlock(&o2hb_live_lock);
	}
}
So, is this a configfs problem or an OCFS2 problem? Who is responsible for locking configfs accesses?
Thanks!
Regards,
Benoit
--
Benoit Welterlen
Open Software R&D
Bull, Architect of an Open World TM
Tel : +33 4 76 29 73 90
http://www.bull-world.com/
www.bull.com