[Ocfs2-users] ocfs or configfs bug ?

Welterlen Benoit benoit.welterlen at bull.net
Tue Apr 19 07:54:32 PDT 2011


Hi all,

I am hitting a bug with OCFS2 through configfs. To reproduce it, run these two loops at the same time:

while true ; do ls -l /sys/kernel/config/cluster/ocfs2/heartbeat ; done &

while true ; do echo 31 > /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold ; done &
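
If you want to try this on a test machine without leaving two endless loops behind, the same reproducer can be bounded. This is only a minimal sketch, assuming configfs is mounted at the usual place; HB_DIR, ITERATIONS and the function names are made up for this example and are not part of OCFS2 or configfs:

```shell
#!/bin/sh
# Bounded variant of the two loops above.
# HB_DIR and ITERATIONS are hypothetical knobs for this sketch only.
HB_DIR="${HB_DIR:-/sys/kernel/config/cluster/ocfs2/heartbeat}"
ITERATIONS="${ITERATIONS:-1000}"

# Repeatedly list the heartbeat directory (the readdir side).
list_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        ls -l "$HB_DIR" > /dev/null
        i=$((i + 1))
    done
}

# Repeatedly write the threshold attribute (the write side).
write_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        echo 31 > "$HB_DIR/dead_threshold"
        i=$((i + 1))
    done
}

# Run both loops concurrently and wait for both to finish.
run_repro() {
    list_loop &
    reader=$!
    write_loop &
    writer=$!
    wait "$reader" "$writer"
}
```

Source the file as root and call run_repro. On an affected kernel this may oops the machine, so it should only be run on a test node.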


This leads to a kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
PGD 467bea067 PUD 46d4d9067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/fs/o2cb/interface_revision
CPU 36
Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm nls_utf8 nfs lockd 
fscache nfs_acl auth_rpcgss ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue 
ipmi_devintf ipmi_si ipmi_msghandler sunrpc ipt_REJECT nf_conntrack_ipv4 
nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 
xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 i2c_i801 i2c_core 
sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core igb dca ext4 jbd2 
sd_mod crc_t10dif usbhid hid ahci ehci_hcd uhci_hcd dm_mod [last unloaded: 
scsi_wait_scan]
Pid: 59850, comm: ls Tainted: G   M       ----------------  
2.6.32-71.24.1.el6.Bull.23.x86_64 #1 bullx super-node
RIP: 0010:[<ffffffffa01fd214>]  [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 
[configfs]
RSP: 0018:ffff880c6c8b3e78  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88086c4b23a8 RCX: ffff88086c4b23a0
RDX: 000000000000000e RSI: ffff88086c4b2410 RDI: ffffffffa02946e1
RBP: ffff880c6c8b3ed8 R08: ffff88086c4b23a8 R09: 0000000000000004
R10: 00007fff59ce4cf0 R11: 0000000000000246 R12: ffff88046bfbe0c0
R13: ffffffffa02946e1 R14: ffff88046c687608 R15: ffff88046c687610
FS:  00007fdf806017a0(0000) GS:ffff880036840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 0000000467ffc000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 59850, threadinfo ffff880c6c8b2000, task ffff880c6aeeeea0)
Stack:
  ffff880c6c8b3ee8 0000000002347078 ffff88086c4b23a0 ffffffff8116bea0
<0> ffff880c6c8b3f38 ffff88086c4b2410 ffff880c6c8b3ef8 ffff88046bfbe0c0
<0> ffff880c6c8b3f38 ffffffff8116bea0 ffff88086e109720 ffff88086e109668
Call Trace:
  [<ffffffff8116bea0>] ? filldir+0x0/0xe0
  [<ffffffff8116bea0>] ? filldir+0x0/0xe0
  [<ffffffff8116c120>] vfs_readdir+0xc0/0xe0
  [<ffffffff8116c2a9>] sys_getdents+0x89/0xf0
  [<ffffffff8100c172>] system_call_fastpath+0x16/0x1b
Code: 48 83 f8 02 4d 8d 7e 08 48 89 55 c8 0f 84 15 01 00 00 49 8b 5e 08 48 3b 5d 
c8 0f 85 7c 00 00 00 e9 da 00 00 00 66 90 48 8b 40 10 <4c> 8b 40 40 44 0f b7 49 
44 4c 89 ee 49 8b 4c 24 40 48 8b 7d c0
RIP  [<ffffffffa01fd214>] configfs_readdir+0xf4/0x230 [configfs]
  RSP <ffff880c6c8b3e78>
CR2: 0000000000000040
crash> bt ffff880c6aeeeea0
PID: 59850  TASK: ffff880c6aeeeea0  CPU: 36  COMMAND: "ls"
  #0 [ffff880c6c8b3b40] machine_kexec at ffffffff8102e77b
  #1 [ffff880c6c8b3ba0] crash_kexec at ffffffff810a6cd8
  #2 [ffff880c6c8b3c70] oops_end at ffffffff8146aad0
  #3 [ffff880c6c8b3ca0] no_context at ffffffff8103789b
  #4 [ffff880c6c8b3cf0] __bad_area_nosemaphore at ffffffff81037b25
  #5 [ffff880c6c8b3d40] bad_area at ffffffff81037c4e
  #6 [ffff880c6c8b3d70] do_page_fault at ffffffff8146c648
  #7 [ffff880c6c8b3dc0] page_fault at ffffffff81469e45
     [exception RIP: configfs_readdir+244]
     RIP: ffffffffa01fd214  RSP: ffff880c6c8b3e78  RFLAGS: 00010282
     RAX: 0000000000000000  RBX: ffff88086c4b23a8  RCX: ffff88086c4b23a0
     RDX: 000000000000000e  RSI: ffff88086c4b2410  RDI: ffffffffa02946e1
     RBP: ffff880c6c8b3ed8   R8: ffff88086c4b23a8   R9: 0000000000000004
     R10: 00007fff59ce4cf0  R11: 0000000000000246  R12: ffff88046bfbe0c0
     R13: ffffffffa02946e1  R14: ffff88046c687608  R15: ffff88046c687610
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #8 [ffff880c6c8b3ee0] vfs_readdir at ffffffff8116c120
  #9 [ffff880c6c8b3f30] sys_getdents at ffffffff8116c2a9
#10 [ffff880c6c8b3f80] system_call_fastpath at ffffffff8100c172
     RIP: 00007fdf7f8dcec5  RSP: 00007fff59ce4e70  RFLAGS: 00010202
     RAX: 000000000000004e  RBX: ffffffff8100c172  RCX: 0000000002347070
     RDX: 0000000000008000  RSI: 000000000233f078  RDI: 0000000000000003
     RBP: ffffffffffffff08   R8: 000000000233f078   R9: 0000000000800000
     R10: 00007fff59ce4cf0  R11: 0000000000000246  R12: 000000000233f010
     R13: 000000000233f078  R14: 0000000000000000  R15: 000000000233f050
     ORIG_RAX: 000000000000004e  CS: 0033  SS: 002b
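
The faulting address can be cross-checked against the register dump. In the Code: line, the bytes just before the marked instruction (48 8b 40 10) decode to a load from 0x10(%rax) into %rax, and the marked bytes (4c 8b 40 40) decode to a load from 0x40(%rax) into %r8. With RAX = 0, the second load touches address 0x40, which is the value in CR2. This suggests that a pointer read from offset 0x10 of some structure was NULL; which field that is would have to be checked against the structure layouts of this kernel build. A small check of the arithmetic (the variable names are only for illustration):

```shell
#!/bin/sh
# Values copied from the oops above.
rax=0x0     # RAX at the time of the fault
disp=0x40   # displacement in "mov 0x40(%rax),%r8"
cr2=0x40    # faulting address reported in CR2

# Effective address of the faulting load: base register plus displacement.
fault_addr=$(($rax + $disp))
printf 'computed fault address: 0x%x\n' "$fault_addr"

if [ "$fault_addr" -eq $(($cr2)) ]; then
    echo "matches CR2"
fi
```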


I have a crash dump available if you need more information.

I looked into the source code, and the locking there does not seem to help in this case:
/* Only sets a new threshold if there are no active regions.
 *
 * No locking or otherwise interesting code is required for reading
 * o2hb_dead_threshold as it can't change once regions are active and
 * it's not interesting to anyone until then anyway. */
static void o2hb_dead_threshold_set(unsigned int threshold)
{
        if (threshold > O2HB_MIN_DEAD_THRESHOLD) {
                spin_lock(&o2hb_live_lock);
                if (list_empty(&o2hb_all_regions))
                        o2hb_dead_threshold = threshold;
                spin_unlock(&o2hb_live_lock);
        }
}

So, is this a configfs problem or an OCFS2 problem? Which layer is responsible for serializing access to configfs?

Thanks!

Regards,

Benoit


-- 
Benoit Welterlen
Open Software R&D
Bull, Architect of an Open World TM
Tel : +33 4 76 29 73 90
http://www.bull-world.com/
www.bull.com
