[Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.

Xiaowei.hu xiaowei.hu at oracle.com
Tue Feb 21 16:58:26 PST 2012


Here is the whole lockres dump and the backtrace:

KERNEL: vmlinux
DUMPFILE: vmcore1
CPUS: 16
DATE: Tue Jan 17 20:48:22 2012
UPTIME: 100 days, 21:11:24
LOAD AVERAGE: 7.11, 7.46, 7.85
TASKS: 1210
NODENAME: sgi-not-efped05
RELEASE: 2.6.18-92.el5
VERSION: #1 SMP Tue Apr 29 13:16:15 EDT 2008
MACHINE: x86_64 (2400 Mhz)
MEMORY: 63.1 GB
PANIC: ""
PID: 6252
COMMAND: "ocfs2dc"
TASK: ffff810c7de5b860 [THREAD_INFO: ffff810c53928000]
CPU: 6
STATE: TASK_RUNNING (PANIC)

crash64> bt
PID: 6252 TASK: ffff810c7de5b860 CPU: 6 COMMAND: "ocfs2dc"
#0 [ffff810c53929bb0] crash_kexec at ffffffff800aa977
#1 [ffff810c53929c70] __die at ffffffff800650af
#2 [ffff810c53929cb0] die at ffffffff8006b6ae
#3 [ffff810c53929ce0] do_invalid_op at ffffffff8006bc6e
#4 [ffff810c53929da0] error_exit at ffffffff8005dde9
[exception RIP: ocfs2_prepare_cancel_convert+263]
RIP: ffffffff8840649f RSP: ffff810c53929e50 RFLAGS: 00010082
RAX: 00000000ffffffff RBX: ffff8105d2a28280 RCX: ffff8105d2a28298
RDX: ffff810c3f1526a8 RSI: ffff8105d2a28280 RDI: ffff810c3f152000
RBP: ffff8105d2a28290 R8: ffff810c53928000 R9: 000000000000003a
R10: ffff81102fd84038 R11: ffff81102fb0c860 R12: 0000000000000246
R13: ffff810c3f152000 R14: 0000000000000000 R15: 0000000000000246
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffff810c53929ee8] kthread at ffffffff8003253d
#6 [ffff810c53929f48] kernel_thread at ffffffff8005dfb1
crash64> mod -s ocfs2 
/home/xiaowhu/usr/lib/debug/lib/modules/2.6.18-92.el5/kernel/fs/ocfs2/ocfs2.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffff8845bb80 ocfs2 492072 
/home/xiaowhu/usr/lib/debug/lib/modules/2.6.18-92.el5/kernel/fs/ocfs2/ocfs2.ko.debug


crash64> struct ocfs2_lock_res ffff8105d2a28280
struct ocfs2_lock_res {
l_priv = 0xffff8105d2a285a0,
l_ops = 0xffffffff8845b360,
l_lock = {
raw_lock = {
slock = 4294967295
}
},
l_blocked_list = {
next = 0xffff8105d2a28298,
prev = 0xffff8105d2a28298
},
l_mask_waiters = {
next = 0xffff8104b2979a98,
prev = 0xffff8104b2979a98
},
l_type = OCFS2_LOCK_TYPE_META,
l_flags = 326,
l_name = "M0000000000000000a1faac00000000",
l_level = -1,
l_ro_holders = 0,
l_ex_holders = 0,
l_lksb = {
status = DLM_NORMAL,
flags = 0,
lockid = 0xffff810a606c0740,
lvb = 
"\005\000\000\001\000\000\000\000\000\000)\v\000\000\000e\023ő\253\224\207i\023\022\333i\204\312\001@\277\021f\373\316\000\000\000\000\000\000\000\000\000\000\017\070A\355\000\002\000\000\000\000\257L\302\331\000\000\000"
},
l_action = OCFS2_AST_ATTACH,
l_unlock_action = OCFS2_UNLOCK_INVALID,
l_requested = 3,
l_blocking = 5,
l_event = {
lock = {
raw_lock = {
slock = 1
}
},
task_list = {
next = 0xffff8105d2a28360,
prev = 0xffff8105d2a28360
}
},
l_debug_list = {
next = 0xffff81033cabad08,
prev = 0xffff8105d2a284a8
},
l_lock_num_prmode = 0,
l_lock_num_exmode = 0,
l_lock_num_prmode_failed = 0,
l_lock_num_exmode_failed = 0,
l_lock_total_prmode = 0,
l_lock_total_exmode = 0,
l_lock_max_prmode = 0,
l_lock_max_exmode = 0,
l_lock_refresh = 0
}
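
For reference, the state above (l_action = OCFS2_AST_ATTACH while
OCFS2_LOCK_BUSY and OCFS2_LOCK_BLOCKED are still set) is exactly what the
assertion that fired refuses to accept. Roughly, paraphrased from memory of
mainline dlmglue.c (the exact RHEL5 module source may differ in detail):

    /* Called by the downconvert thread with lockres->l_lock held, when
     * the lock it wants to downconvert is still marked OCFS2_LOCK_BUSY. */
    static int ocfs2_prepare_cancel_convert(struct ocfs2_super *osb,
                                            struct ocfs2_lock_res *lockres)
    {
            assert_spin_locked(&lockres->l_lock);

            if (lockres->l_unlock_action == OCFS2_UNLOCK_CANCEL_CONVERT) {
                    /* A cancel is already in flight; just let the caller
                     * requeue this lock. */
                    return 0;
            }

            /* BUSY is assumed to mean "a convert is in flight".  In this
             * vmcore l_action is OCFS2_AST_ATTACH, so this fires. */
            BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
                   lockres->l_action != OCFS2_AST_DOWNCONVERT);

            /* set things up for the unlock (cancel) callback */
            lockres->l_unlock_action = OCFS2_UNLOCK_CANCEL_CONVERT;

            return 1;
    }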




On 02/22/2012 08:45 AM, Sunil Mushran wrote:
> Both the AST and the BAST can only be sent by the master. And we ensure 
> the master sends the ASTs before the BASTs.
>
> Do you have the full lockres dump?
>
> On 02/21/2012 04:36 PM, Xiaowei.hu wrote:
>> Hi Sunil,
>>
>> I mean it executes in this way:
>>
>> NodeA has called ocfs2_dlm_lock() and has already released the lockres
>> spin lock, so at this point A holds no spin locks. It then starts to
>> execute the proxy ast handler and processes the bast request from nodeB.
>> dlmthread then flushes the bast, and only after that does node A get to
>> queue its own ast from within the ocfs2_dlm_lock() path.
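
To make the window concrete, the stretch of ocfs2_cluster_lock() I am talking
about looks roughly like this (a simplified sketch, not the exact RHEL5
source; error handling and the mask-waiter setup are left out):

    spin_lock_irqsave(&lockres->l_lock, flags);
    gen = lockres_set_pending(lockres);        /* sets OCFS2_LOCK_PENDING */
    /* brand-new lock, so we request an attach rather than a convert */
    lockres->l_action = OCFS2_AST_ATTACH;
    lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
    spin_unlock_irqrestore(&lockres->l_lock, flags);

    /* No spinlock held here: the proxy ast handler and the downconvert
     * thread are free to run against this lockres. */
    ret = ocfs2_dlm_lock(...);                 /* lock request to the dlm */

    /* ** the call the analysis quoted below refers to */
    lockres_clear_pending(lockres, gen, osb);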
>>
>> Thanks,
>> Xiaowei
>> On 02/22/2012 01:48 AM, Sunil Mushran wrote:
>>> > bast queued and flushed,before the ast was queued
>>>
>>> Unlikely with o2dlm. dlmthread always sends ASTs before BASTs.
>>>
>>> Can you recreate the entire lockres? A full dump may yield more
>>> information.
>>>
>>> Sunil
>>>
>>> On 02/20/2012 10:12 PM, xiaowei.hu at oracle.com wrote:
>>>> I am trying to fix bug13611997. CT's machine ran into a BUG() in the
>>>> ocfs2dc thread:
>>>>
>>>>     BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>>>>            lockres->l_action != OCFS2_AST_DOWNCONVERT);
>>>>
>>>> I analyzed the vmcore: lockres->l_action = OCFS2_AST_ATTACH and
>>>> l_flags = 326, which means
>>>> OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED.
>>>> After comparing this with the code, that state is only possible during
>>>> ocfs2_cluster_lock(). Here is the race:
>>>>
>>>> NodeA:
>>>>     ocfs2_cluster_lock() on a new lockres M:
>>>>         spin_lock_irqsave(&lockres->l_lock, flags);
>>>>         gen = lockres_set_pending(lockres);
>>>>         lockres->l_action = OCFS2_AST_ATTACH;
>>>>         lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
>>>>         spin_unlock_irqrestore(&lockres->l_lock, flags);
>>>>
>>>>     ocfs2_dlm_lock() finishes and returns,
>>>>     ** and lockres_clear_pending(lockres, gen, osb); is called.
>>>>
>>>> NodeB:
>>>>     requests a lock on the same lockres M; it is blocked by NodeA,
>>>>     and a bast is proxied to A.
>>>>
>>>> NodeA:
>>>>     The bast is queued and flushed before the ast is queued, then
>>>>     ocfs2dc gets scheduled, and there is a chance to execute this
>>>>     code path:
>>>>         ocfs2_downconvert_thread()
>>>>         ocfs2_downconvert_thread_do_work()
>>>>         ocfs2_blocking_ast()
>>>>         ocfs2_process_blocked_lock()
>>>>         ocfs2_unblock_lock()
>>>>             spin_lock_irqsave(&lockres->l_lock, flags);
>>>>             if (lockres->l_flags & OCFS2_LOCK_BUSY)
>>>>                 ret = ocfs2_prepare_cancel_convert(osb, lockres);
>>>>                     BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>>>>                            lockres->l_action != OCFS2_AST_DOWNCONVERT);
>>>>     which triggers the BUG().
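
The downconvert-thread side that allows this looks roughly like the following
(simplified from ocfs2_unblock_lock(); the OCFS2_LOCK_PENDING check is what
normally protects the window, but in the scenario above NodeA has already run
lockres_clear_pending() by the time ocfs2dc gets here):

    spin_lock_irqsave(&lockres->l_lock, flags);
    ...
    if (lockres->l_flags & OCFS2_LOCK_BUSY) {
            /* BUSY + PENDING means ocfs2_dlm_lock() has not actually been
             * issued yet, so the lock is just requeued for later. */
            if (lockres->l_flags & OCFS2_LOCK_PENDING)
                    goto leave_requeue;

            /* BUSY without PENDING is taken to mean a convert is in
             * flight, so we try to cancel it; but here the in-flight
             * operation is still the initial attach (l_action ==
             * OCFS2_AST_ATTACH), and the BUG_ON() fires. */
            ret = ocfs2_prepare_cancel_convert(osb, lockres);
            ...
    }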
>>>>
>>>> Solution:
>>>> One possible solution is to remove the lockres_clear_pending() call
>>>> marked with the two stars above and leave that clearing work to the
>>>> ast function. This makes the bast handling wait for the ast, so the
>>>> ast can clear OCFS2_LOCK_BUSY and set OCFS2_LOCK_ATTACHED first,
>>>> before the downconvert process is entered.
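
Turned into code, the idea would look roughly like the sketch below: keep
OCFS2_LOCK_PENDING set across the ocfs2_dlm_lock() call and clear it from the
fs-level ast handler once BUSY has been cleared and ATTACHED set. This is only
a sketch, not a tested patch, and it assumes the pending generation can be
reached from the ast handler (mainline keeps it in lockres->l_pending_gen; the
module here may differ):

    /*
     * In ocfs2_cluster_lock(): drop the lockres_clear_pending() call that
     * used to follow the dlm request (the line marked ** above):
     *
     *     ret = ocfs2_dlm_lock(...);
     * -   lockres_clear_pending(lockres, gen, osb);
     */

    /* In the fs-level ast handler (ocfs2_locking_ast()), once the attach
     * or convert has been handled and the lock state updated: */
    lockres_or_flags(lockres, OCFS2_LOCK_ATTACHED);
    lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
    /* only now let the downconvert thread act on this lockres */
    lockres_clear_pending(lockres, lockres->l_pending_gen, osb);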
>>>>
>>>>
>>>
>>
>



