[Ocfs2-users] one node kernel panic

Hideyasu Kojima hid.kojima at ms.scsk.jp
Thu Oct 6 22:33:05 PDT 2011


Thank you for responding.

I think UEK5 is based on the RHEL5 kernel.
Does the same problem also occur on UEK5?

(2011/10/05 1:45), Sunil Mushran wrote:
> int sigprocmask(int how, sigset_t *set, sigset_t *oldset)
> {
>     int error;
>
>     spin_lock_irq(&current->sighand->siglock); <==== CRASH
>     if (oldset)
>         *oldset = current->blocked;
>     ...
> }
>
> current->sighand is NULL. So definitely a race. Generic kernel issue.
> Ping your kernel vendor.
>
> On 10/03/2011 07:49 PM, Hideyasu Kojima wrote:
>> Hi,
>>
>> I run an ocfs2/drbd active-active two-node cluster.
>>
>> ocfs2 version is 1.4.7-1
>> ocfs2-tools version is 1.4.4
>> Linux version is RHEL 5.4 (2.6.18-164.el5 x86_64)
>>
>> One node crashed once with a kernel panic.
>>
>> What is the cause?
>>
>> The analysis of the vmcore is below.
>>
>> ========================================================
>>
>> Unable to handle kernel NULL pointer dereference at 0000000000000808 RIP:
>> [<ffffffff80064ae6>] _spin_lock_irq+0x1/0xb
>> PGD 187e15067 PUD 187e16067 PMD 0
>> Oops: 0002 [1] SMP
>> last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:06:00.0/0000:07:00.0/irq
>> CPU 1
>> Modules linked in: mptctl mptbase softdog autofs4 ipmi_devintf ipmi_si
>> ipmi_msghandler ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U)
>> ocfs2_nodemanager(U) configfs drbd(U) bonding ipv6 xfrm_nalgo crypto_api
>> bnx2i(U) libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cnic(U)
>> dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core
>> button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev
>> sr_mod cdrom sg pcspkr serio_raw hpilo bnx2(U) dm_raid45 dm_message
>> dm_region_hash dm_log dm_mod dm_mem_cache hpahcisr(PU) ata_piix libata
>> shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>> Pid: 21924, comm: res Tainted: P 2.6.18-164.el5 #1
>> RIP: 0010:[<ffffffff80064ae6>] [<ffffffff80064ae6>] _spin_lock_irq+0x1/0xb
>> RSP: 0018:ffff81008b1cfae0 EFLAGS: 00010002
>> RAX: ffff810187af4040 RBX: 0000000000000000 RCX: ffff8101342b7b80
>> RDX: ffff81008b1cfb98 RSI: ffff81008b1cfba8 RDI: 0000000000000808
>> RBP: ffff81008b1cfb98 R08: 0000000000000000 R09: 0000000000000000
>> R10: ffff810075463090 R11: ffffffff88595b95 R12: ffff81008b1cfba8
>> R13: ffff81007f070520 R14: 0000000000000001 R15: ffff81008b1cfce8
>> FS: 0000000000000000(0000) GS:ffff810105d51840(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 0000000000000808 CR3: 0000000187e14000 CR4: 00000000000006e0
>> Process res (pid: 21924, threadinfo ffff81008b1ce000, task ffff810187af4040)
>> Stack: ffffffff8001db30 ffff81007f070520 ffffffff885961f3
>> ffff810105d39400
>> ffffffff88596323 06ff813231393234 ffff810075463018 ffff810075463018
>> 0000000000000297 ffff81007f070520 ffff810075463028 0000000000000246
>> Call Trace:
>> [<ffffffff8001db30>] sigprocmask+0x28/0xdb
>> [<ffffffff885961f3>] :ocfs2:ocfs2_delete_inode+0x0/0x1691
>> [<ffffffff88596323>] :ocfs2:ocfs2_delete_inode+0x130/0x1691
>> [<ffffffff88581f16>] :ocfs2:ocfs2_drop_lock+0x67a/0x77b
>> [<ffffffff8858026a>] :ocfs2:ocfs2_remove_lockres_tracking+0x10/0x45
>> [<ffffffff885961f3>] :ocfs2:ocfs2_delete_inode+0x0/0x1691
>> [<ffffffff8002f49e>] generic_delete_inode+0xc6/0x143
>> [<ffffffff88595c85>] :ocfs2:ocfs2_drop_inode+0xf0/0x161
>> [<ffffffff8000d46e>] dput+0xf6/0x114
>> [<ffffffff800e9c44>] prune_one_dentry+0x66/0x76
>> [<ffffffff8002e958>] prune_dcache+0x10f/0x149
>> [<ffffffff8004d66e>] shrink_dcache_parent+0x1c/0xe1
>> [<ffffffff80104f8b>] proc_flush_task+0x17c/0x1f6
>> [<ffffffff8008fa2c>] sched_exit+0x27/0xb5
>> [<ffffffff80018024>] release_task+0x387/0x3cb
>> [<ffffffff80015c50>] do_exit+0x865/0x911
>> [<ffffffff80049281>] cpuset_exit+0x0/0x88
>> [<ffffffff8002b080>] get_signal_to_deliver+0x42c/0x45a
>> [<ffffffff8005ae7b>] do_notify_resume+0x9c/0x7af
>> [<ffffffff8008b6a2>] deactivate_task+0x28/0x5f
>> [<ffffffff80021f3f>] __up_read+0x19/0x7f
>> [<ffffffff80066b58>] do_page_fault+0x4fe/0x830
>> [<ffffffff800b65b2>] audit_syscall_exit+0x336/0x362
>> [<ffffffff8005d32e>] int_signal+0x12/0x17
>>
>>
>> Code: f0 ff 0f 0f 88 f3 00 00 00 c3 53 48 89 fb e8 33 f5 02 00 f0
>> RIP [<ffffffff80064ae6>] _spin_lock_irq+0x1/0xb
>> RSP <ffff81008b1cfae0>
>> crash> bt
>> PID: 21924 TASK: ffff810187af4040 CPU: 1 COMMAND: "res"
>> #0 [ffff81008b1cf840] crash_kexec at ffffffff800ac5b9
>> #1 [ffff81008b1cf900] __die at ffffffff80065127
>> #2 [ffff81008b1cf940] do_page_fault at ffffffff80066da7
>> #3 [ffff81008b1cfa30] error_exit at ffffffff8005dde9
>> [exception RIP: _spin_lock_irq+1]
>> RIP: ffffffff80064ae6 RSP: ffff81008b1cfae0 RFLAGS: 00010002
>> RAX: ffff810187af4040 RBX: 0000000000000000 RCX: ffff8101342b7b80
>> RDX: ffff81008b1cfb98 RSI: ffff81008b1cfba8 RDI: 0000000000000808
>> RBP: ffff81008b1cfb98 R8: 0000000000000000 R9: 0000000000000000
>> R10: ffff810075463090 R11: ffffffff88595b95 R12: ffff81008b1cfba8
>> R13: ffff81007f070520 R14: 0000000000000001 R15: ffff81008b1cfce8
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #4 [ffff81008b1cfae0] sigprocmask at ffffffff8001db30
>> #5 [ffff81008b1cfb00] ocfs2_delete_inode at ffffffff88596323
>> #6 [ffff81008b1cfbf0] generic_delete_inode at ffffffff8002f49e
>> #7 [ffff81008b1cfc10] ocfs2_drop_inode at ffffffff88595c85
>> #8 [ffff81008b1cfc30] dput at ffffffff8000d46e
>> #9 [ffff81008b1cfc50] prune_one_dentry at ffffffff800e9c44
>> #10 [ffff81008b1cfc70] prune_dcache at ffffffff8002e958
>> #11 [ffff81008b1cfca0] shrink_dcache_parent at ffffffff8004d66e
>> #12 [ffff81008b1cfcd0] proc_flush_task at ffffffff80104f8b
>> #13 [ffff81008b1cfd30] release_task at ffffffff80018024
>> #14 [ffff81008b1cfd60] do_exit at ffffffff80015c50
>> #15 [ffff81008b1cfdc0] get_signal_to_deliver at ffffffff8002b080
>> #16 [ffff81008b1cfe00] do_notify_resume at ffffffff8005ae7b
>> #17 [ffff81008b1cff50] int_signal at ffffffff8005d32e
>> RIP: 0000003e4becced2 RSP: 000000004124afd0 RFLAGS: 00000202
>> RAX: fffffffffffffdfe RBX: 0000000000000000 RCX: ffffffffffffffff
>> RDX: 0000000000000000 RSI: 000000004124b040 RDI: 0000000000000006
>> RBP: 000000004124b0e0 R8: 000000004124b110 R9: 00000000000055a4
>> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
>> R13: 000000004124c000 R14: 000000004124b940 R15: 0000000000001000
>> ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
>> crash>
>
>
>


-- 
Our company name and e-mail address have changed.

SCSK Corporation, SCS Company
Chubu Branch, Sales Department
Hideyasu Kojima

E-Mail: hid.kojima at ms.scsk.jp
TEL 052-951-0398 FAX 052-951-0397




