[Ocfs2-devel] dlm stress test hangs OCFS2

Coly Li coly.li at suse.de
Mon Sep 21 10:25:53 PDT 2009


Hi Sunil,

I tried this patch on a 2-node cluster and it works. No blocking observed so far.
Then I ran it on a 4-node cluster, running make_panic on each node simultaneously,
and the BUG() inside ocfs2_prepare_downconvert() (line 3224) triggered on one of
the nodes (I observed the oops on node x4):

3214 static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
3215                                               int new_level)
3216 {
3217         assert_spin_locked(&lockres->l_lock);
3218
3219         BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
3220
3221         if (lockres->l_level <= new_level) {
3222                 mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n",
3223                      lockres->l_level, new_level);
3224                 BUG();
3225         }
3226
3227         mlog(ML_NOTICE, "lock %s, new_level = %d, l_blocking = %d\n",
3228              lockres->l_name, new_level, lockres->l_blocking);
3229
3230         lockres->l_action = OCFS2_AST_DOWNCONVERT;
3231         lockres->l_requested = new_level;
3232         lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
3233         return lockres_set_pending(lockres);
3234 }

I am trying to understand what you did now :-)

Sunil Mushran wrote:
> So originally my thinking was that the dc thread was not getting kicked.
> That is not the case. The lock is getting downconverted. But it is getting
> upconverted shortly thereafter. This just could be the case in which dlmglue
> is slow to increment the holders to block the dc thread from downconverting
> the lock. The snippet shows that BAST is received 16 usecs after the upconvert.
> 
> Coly, I have another patch. Pop out the older patch before applying this one.
> http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch
> 
-- 
Coly Li
SuSE Labs
