[Ocfs2-devel] dlm stress test hangs OCFS2

Sunil Mushran sunil.mushran at oracle.com
Mon Sep 21 10:25:36 PDT 2009


The patch does not contain a fix, only tracing. We may have to disable
a printk for the hang to reproduce on the 2-node cluster.

For the BUG, can I have the full logs? I need the oops trace and the tracing
from all nodes.

Thanks
Sunil

Coly Li wrote:
> Hi Sunil,
>
> I tried this patch on a 2-node cluster and it works. No blocking observed so far.
> Then I ran it on a 4-node cluster, running make_panic on each node simultaneously,
> and the BUG() inside ocfs2_prepare_downconvert() (line 3224) triggered on one of
> the nodes (I observed the oops on node x4):
>
> 3214 static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
> 3215                                               int new_level)
> 3216 {
> 3217         assert_spin_locked(&lockres->l_lock);
> 3218
> 3219         BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
> 3220
> 3221         if (lockres->l_level <= new_level) {
> 3222                 mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n",
> 3223                      lockres->l_level, new_level);
> 3224                 BUG();
> 3225         }
> 3226
> 3227         mlog(ML_NOTICE, "lock %s, new_level = %d, l_blocking = %d\n",
> 3228              lockres->l_name, new_level, lockres->l_blocking);
> 3229
> 3230         lockres->l_action = OCFS2_AST_DOWNCONVERT;
> 3231         lockres->l_requested = new_level;
> 3232         lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
> 3233         return lockres_set_pending(lockres);
> 3234 }
>
> I am trying to understand what you did now :-)
>
> Sunil Mushran wrote:
>   
>> So originally my thinking was that the dc thread was not getting kicked.
>> That is not the case. The lock is getting downconverted. But it is getting
>> upconverted shortly thereafter. This could just be a case in which dlmglue
>> is slow to increment the holders count, which is what blocks the dc thread
>> from downconverting the lock. The snippet shows that the BAST is received
>> 16 usecs after the upconvert.
>>
>> Coly, I have another patch. Pop out the older patch before applying this one.
>> http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch
>>
>>     
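
For readers following the thread, the two checks in the quoted
ocfs2_prepare_downconvert() can be modelled in user space. The sketch below
is not the kernel code: struct model_lockres, model_prepare_downconvert() and
the MODEL_LOCK_* constants are hypothetical names, the level numbering
(NL=0, PR=3, EX=5) is assumed to follow the Linux DLM constants, and BUG() is
replaced by a return code so the conditions can be exercised safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed numbering of lock levels (mirrors the Linux DLM constants). */
enum { MODEL_LOCK_NL = 0, MODEL_LOCK_PR = 3, MODEL_LOCK_EX = 5 };

/* Hypothetical, stripped-down stand-in for struct ocfs2_lock_res. */
struct model_lockres {
    int l_level;     /* level currently held */
    int l_blocking;  /* level another node asked for via BAST */
    int l_requested; /* level we are converting to */
};

/*
 * Model of the sanity checks in the quoted function.
 * Returns 0 when a downconvert may be queued,
 * -1 where the kernel would hit BUG_ON(l_blocking <= NL),
 * -2 where the kernel would hit the BUG() at line 3224.
 */
static int model_prepare_downconvert(struct model_lockres *res, int new_level)
{
    assert(res != NULL);

    /* Nobody is blocked on us, so there is nothing to downconvert for. */
    if (res->l_blocking <= MODEL_LOCK_NL)
        return -1;

    /* A downconvert must strictly lower the held level. */
    if (res->l_level <= new_level) {
        fprintf(stderr, "l_level (%d) <= new_level (%d)\n",
                res->l_level, new_level);
        return -2;
    }

    res->l_requested = new_level;
    return 0;
}
```

With this model, a lock held at EX and blocked at EX can be taken down to NL
(return 0). A lock that is already at NL but still has l_blocking set returns
-2 for new_level NL, which is the condition the BUG() at line 3224 guards
against. Whether that is the state node x4 was in cannot be told from this
thread; the full logs requested above would be needed to confirm it.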