[Ocfs2-users] kernel BUG at fs/dlm/lowcomms.c:647!

Welterlen Benoit Benoit.Welterlen at bull.net
Mon Nov 22 07:31:30 PST 2010


Hi,

I have a new question ;-)
Our cluster is now working with SCTP and the pcmk stack.
When a node is lost or its Ethernet link goes down, pcmk fences the other
node, a recovery is performed, and all pending operations complete correctly.

Now we are trying a dual-Ethernet setup.
When one link is down, pacemaker doesn't fence the other node, which is
expected. But all pending operations on OCFS directories are frozen:

[root at chili0 ~]# cat /proc/25695/stack
[<ffffffffa0425f65>] ocfs2_wait_for_recovery+0x85/0xd0 [ocfs2]
[<ffffffffa041883c>] ocfs2_inode_lock_full_nested+0x19c/0x920 [ocfs2]
[<ffffffffa0422566>] ocfs2_inode_revalidate+0x56/0x110 [ocfs2]
[<ffffffffa041d70b>] ocfs2_getattr+0x3b/0x100 [ocfs2]
[<ffffffff8115cac1>] vfs_getattr+0x51/0x80
[<ffffffff8115cb50>] vfs_fstatat+0x60/0x80
[<ffffffff8115cbde>] vfs_lstat+0x1e/0x20
[<ffffffff8115cc04>] sys_newlstat+0x24/0x50
[<ffffffff8100c172>] system_call_fastpath+0x16/0x1b

It's the same whether we shut down eth0 or eth1, so I think OCFS is using
both interfaces but can't handle the loss of one link.
Maybe pcmk should proceed as it does just after a fence and ask OCFS for a
recovery.
Should OCFS work with two links and tolerate the loss of one?
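As a side note, here is how we check which transport the DLM picked, via
configfs (just an inspection sketch, not a fix; it assumes the dlm module is
loaded and dlm_controld has already populated configfs, as discussed in the
quoted messages below):

```shell
# Protocol currently configured for dlm lowcomms:
#   0 = TCP, 1 = SCTP
# The file only appears once dlm_controld has set up the cluster.
cat /sys/kernel/config/dlm/cluster/protocol

# The kernel also logs the choice when the first lockspace is created,
# e.g. "dlm: Using SCTP for communications"
dmesg | grep -i dlm
```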

Thanks

Regards,

Benoit


On 08/11/2010 17:06, Welterlen Benoit wrote:
> Hi,
>
> Thanks for your answer.
> I tried to force TCP instead of SCTP.
> There is no crash anymore, but OCFS becomes unavailable when I stop one node:
> the OCFS directory is not accessible:
>
> ls /BCM/conf
> [root at chili0 ~]# cat /proc/23921/stack
> [<ffffffffa02c5d53>] ocfs2_wait_for_recovery+0x77/0x8f [ocfs2]
> [<ffffffffa02b08a8>] ocfs2_inode_lock_full_nested+0x160/0xb8d [ocfs2]
> [<ffffffffa02c35e2>] ocfs2_inode_revalidate+0x163/0x25c [ocfs2]
> [<ffffffffa02bd9f4>] ocfs2_getattr+0x8b/0x19c [ocfs2]
> [<ffffffff8111c30f>] vfs_getattr+0x4c/0x69
> [<ffffffff8111c37c>] vfs_fstatat+0x50/0x67
> [<ffffffff8111c479>] vfs_stat+0x1b/0x1d
> [<ffffffff8111c49a>] sys_newstat+0x1f/0x39
> [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Any access to the filesystem hangs.
>
> Regards,
>
> Benoit
>
> On 21/10/2010 19:13, David Teigland wrote:
>> On Thu, Oct 21, 2010 at 06:11:36PM +0200, Welterlen Benoit wrote:
>>
>>> Hi,
>>>
>>> Thanks for your answer.
>>> SCTP is automatically selected ("dlm: Using SCTP for
>>> communications") and I have no option to change that
>>> (/sys/kernel/config/dlm/cluster/protocol is set to 1 and is only
>>> available once the service is started; can I modify it at that
>>> point?)
>>>
>> Yes, there's a dlm_controld command line option (or cluster.conf which I
>> don't suppose you're using with pacemaker.)  You can set that to get TCP,
>> but that obviates corosync redundant ring also (which is what is used to
>> auto-select SCTP).
>>
>>> dlm: closing connection to node 33554699
>>>
>>>> Here, OCFS is down on M2, then restarts:
>>>>
>>> dlm: 77410678764B4782BDAE3E888E0C8C4D: recover 3
>>> dlm: 77410678764B4782BDAE3E888E0C8C4D: add member 33554699
>>> cm->sockaddr_storage  ffff8804796268a0
>>> dlm_nodeid_to_addr -EEXIST;
>>> dlm: no address for nodeid 33554699
>>>
>> I suspect the pacemaker version of dlm_controld is doing an unusual
>> sequence of node addition/removal.  The stuff above eventually leads to
>> the oops below:
>>
>>> sctp_packet_config: packet:ffff88107cf36598 vtag:0x2b186924
>>> sctp_packet_config: packet:ffff880c8e6037d0 vtag:0x2b186924
>>> sctp_packet_config: packet:ffff88107cf36598 vtag:0x2b186924
>>> process_sctp_notification SCTP_RESTART
>>> get_comm  nodeid : 0, sockaddr_storage : ffff88047a357c54
>>> get_comm cm->addr_count : 0000000000000001, cm->addr[0] : ffff880479626740
>>> addr : ffff88047a357c54
>>> dlm: reject connect from unknown addr
>>> 02 00 00 00 0b 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> in __nodeid2con , nodeid : 0
>>> sctp_packet_config: packet:ffff88107cf36598 vtag:0x2b186924
>>> ------------[ cut here ]------------
>>> kernel BUG at fs/dlm/lowcomms.c:661!
>>> invalid opcode: 0000 [#1] SMP
>>> last sysfs file: /sys/kernel/dlm/77410678764B4782BDAE3E888E0C8C4D/control
>>> CPU 35
>>> Modules linked in: sctp(U) libcrc32c(U) ocfs2(U)
>>> ocfs2_nodemanager(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U)
>>> configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U)
>>> ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U)
>>> auth_rpcgss(U) ext3(U) jbd(U) sunrpc(U) ipv6(U) dm_mirror(U)
>>> dm_region_hash(U) dm_log(U) scsi_dh_emc(U) dm_round_robin(U)
>>> dm_multipath(U) ioatdma(U) i7core_edac(U) igb(U) edac_core(U) sg(U)
>>> i2c_i801(U) i2c_core(U) iTCO_wdt(U) dca(U) iTCO_vendor_support(U)
>>> ext4(U) mbcache(U) jbd2(U) usbhid(U) hid(U) sd_mod(U) crc_t10dif(U)
>>> lpfc(U) ahci(U) scsi_transport_fc(U) ehci_hcd(U) uhci_hcd(U)
>>> scsi_tgt(U) dm_mod(U) [last unloaded: ipmi_msghandler]
>>>
>>> Modules linked in: sctp(U) libcrc32c(U) ocfs2(U)
>>> ocfs2_nodemanager(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U)
>>> configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U)
>>> ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U)
>>> auth_rpcgss(U) ext3(U) jbd(U) sunrpc(U) ipv6(U) dm_mirror(U)
>>> dm_region_hash(U) dm_log(U) scsi_dh_emc(U) dm_round_robin(U)
>>> dm_multipath(U) ioatdma(U) i7core_edac(U) igb(U) edac_core(U) sg(U)
>>> i2c_i801(U) i2c_core(U) iTCO_wdt(U) dca(U) iTCO_vendor_support(U)
>>> ext4(U) mbcache(U) jbd2(U) usbhid(U) hid(U) sd_mod(U) crc_t10dif(U)
>>> lpfc(U) ahci(U) scsi_transport_fc(U) ehci_hcd(U) uhci_hcd(U)
>>> scsi_tgt(U) dm_mod(U) [last unloaded: ipmi_msghandler]
>>> Pid: 4306, comm: dlm_recv/35 Not tainted
>>> 2.6.32-30.el6.Bull.14.x86_64 #2 bullx super-node
>>> RIP: 0010:[<ffffffffa039edeb>]  [<ffffffffa039edeb>]
>>> receive_from_sock+0x38b/0x430 [dlm]
>>> RSP: 0018:ffff88047a357d10  EFLAGS: 00010246
>>> RAX: 0000000000000095 RBX: ffff88087c5aad20 RCX: 0000000000001b62
>>> RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
>>> RBP: ffff88047a357e10 R08: 00000000ffffffff R09: 0000000000000000
>>> R10: ffff881079423480 R11: 0000000000000000 R12: ffff88047a357da0
>>> R13: 0000000000000030 R14: ffff88087c5aad30 R15: ffff88047a357d40
>>> FS:  0000000000000000(0000) GS:ffff880c8e600000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>> CR2: 000000359bea6df0 CR3: 0000000001001000 CR4: 00000000000006e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process dlm_recv/35 (pid: 4306, threadinfo ffff88047a356000, task
>>> ffff88047aeaf2e0)
>>> Stack:
>>>    ffff88047a357db0 0000000000000246 ffff880c7da288c0 0000000000000246
>>> <0>   ffff88047a357d70 0000100081049472 0000000000000000 0000000000000000
>>> <0>   ffff88047a357d80 0000000000000002 ffff88047a357dd0 0000000000000000
>>> Call Trace:
>>>    [<ffffffffa039dd70>] ? process_recv_sockets+0x0/0x50 [dlm]
>>>    [<ffffffffa039dda6>] process_recv_sockets+0x36/0x50 [dlm]
>>>    [<ffffffffa039dd70>] ? process_recv_sockets+0x0/0x50 [dlm]
>>>    [<ffffffff8107ce9d>] worker_thread+0x16d/0x290
>>>    [<ffffffff81082430>] ? autoremove_wake_function+0x0/0x40
>>>    [<ffffffff8107cd30>] ? worker_thread+0x0/0x290
>>>    [<ffffffff810820d6>] kthread+0x96/0xa0
>>>    [<ffffffff8100d1aa>] child_rip+0xa/0x20
>>>    [<ffffffff81082040>] ? kthread+0x0/0xa0
>>>    [<ffffffff8100d1a0>] ? child_rip+0x0/0x20
>>> Code: f1 4c 8b 8d 70 ff ff ff 48 c1 e6 0c 4c 01 d6 e8 f1 33 0b e1 4c
>>> 89 f7 e8 14 4b 0b e1 31 f6 48 89 df e8 6a f4 ff ff e9 cd fd ff ff
>>> <0f>   0b 0f 1f 00 eb fb 48 c7 c7 e8 9a 3a a0 31 c0 e8 c5 33 0b e1
>>> RIP  [<ffffffffa039edeb>] receive_from_sock+0x38b/0x430 [dlm]
>>>    RSP<ffff88047a357d10>
>>> crash>
>>>
>>> On 21/10/2010 17:13, David Teigland wrote:
>>>
>>>>> kernel BUG at fs/dlm/lowcomms.c:647!
>>>>>
>>>> That looks like an interesting one, I haven't seen it before.
>>>> First ensure dlm is not configured to use sctp (that code is
>>>> not widely tested.)  Other than that, if you'd like to start
>>>> debugging this before I get around to it, replace the BUG_ON
>>>> with some printk's and return error.  The conn with nodeid 0 is
>>>> the listening socket, for which tcp_accept_from_sock() should
>>>> be called rather than receive_from_sock().
>>>>
>>>>>          KERNEL: /usr/lib/debug/lib/modules/2.6.32-100.0.19.el5/vmlinux
>>>>>        DUMPFILE: /var/var/crash/127.0.0.1-2010-10-18-16:42:07/vmcore
>>>>> [PARTIAL DUMP]
>>>>>            CPUS: 64
>>>>>            DATE: Mon Oct 18 16:41:48 2010
>>>>>          UPTIME: 00:15:00
>>>>> LOAD AVERAGE: 1.06, 1.22, 1.65
>>>>>           TASKS: 1594
>>>>>        NODENAME: chili0
>>>>>         RELEASE: 2.6.32-100.0.19.el5
>>>>>         VERSION: #1 SMP Fri Sep 17 17:51:41 EDT 2010
>>>>>         MACHINE: x86_64  (1999 Mhz)
>>>>>          MEMORY: 64 GB
>>>>>           PANIC: "kernel BUG at fs/dlm/lowcomms.c:647!"
>>>>>             PID: 27062
>>>>>         COMMAND: "dlm_recv/34"
>>>>>            TASK: ffff880c7caa00c0  [THREAD_INFO: ffff880c77c6a000]
>>>>>             CPU: 34
>>>>>           STATE: TASK_RUNNING (PANIC)
>>>>>
>>>>> crash>    bt
>>>>> PID: 27062  TASK: ffff880c7caa00c0  CPU: 34  COMMAND: "dlm_recv/34"
>>>>>     #0 [ffff880c77c6b910] machine_kexec at ffffffff8102cc9b
>>>>>     #1 [ffff880c77c6b990] crash_kexec at ffffffff810964d4
>>>>>     #2 [ffff880c77c6ba60] oops_end at ffffffff81439bd9
>>>>>     #3 [ffff880c77c6ba90] die at ffffffff81015639
>>>>>     #4 [ffff880c77c6bac0] do_trap at ffffffff8143952c
>>>>>     #5 [ffff880c77c6bb10] do_invalid_op at ffffffff81013902
>>>>>     #6 [ffff880c77c6bbb0] invalid_op at ffffffff81012b7b
>>>>>        [exception RIP: receive_from_sock+1364]
>>>>>        RIP: ffffffffa02406c3  RSP: ffff880c77c6bc60  RFLAGS: 00010246
>>>>>        RAX: 0000000000000030  RBX: ffff8810774b8d30  RCX: ffff88087c4548f8
>>>>>        RDX: 0000000000000030  RSI: ffff880876dce000  RDI: ffffffff81398045
>>>>>        RBP: ffff880c77c6be50   R8: ffff000000000000   R9: ffff880c77c6b900
>>>>>        R10: ffff880c77c6b8f0  R11: 0000000000000030  R12: 0000000000000030
>>>>>        R13: ffff8810774b8d20  R14: ffff880c7caa00c0  R15: ffffffffa023ecca
>>>>>        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>>     #7 [ffff880c77c6be58] process_recv_sockets at ffffffffa023ecea
>>>>>     #8 [ffff880c77c6be78] worker_thread at ffffffff81071802
>>>>>     #9 [ffff880c77c6bee8] kthread at ffffffff810756d3
>>>>> #10 [ffff880c77c6bf48] kernel_thread at ffffffff81012dea
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users



