[Ocfs2-devel] dlm stress test hangs OCFS2
Coly Li
coly.li at suse.de
Mon Sep 21 10:25:53 PDT 2009
Hi Sunil,
I tried this patch on a 2-node cluster and it works; no blocking has been observed so far.
Then I ran it on a 4-node cluster, running make_panic on each node simultaneously,
and the BUG() inside ocfs2_prepare_downconvert() (line 3224) triggered on one of
the nodes (I observed the oops on node x4):
3214 static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
3215                                               int new_level)
3216 {
3217         assert_spin_locked(&lockres->l_lock);
3218
3219         BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
3220
3221         if (lockres->l_level <= new_level) {
3222                 mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n",
3223                      lockres->l_level, new_level);
3224                 BUG();
3225         }
3226
3227         mlog(ML_NOTICE, "lock %s, new_level = %d, l_blocking = %d\n",
3228              lockres->l_name, new_level, lockres->l_blocking);
3229
3230         lockres->l_action = OCFS2_AST_DOWNCONVERT;
3231         lockres->l_requested = new_level;
3232         lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
3233         return lockres_set_pending(lockres);
3234 }
I am now trying to understand what you did :-)
Sunil Mushran wrote:
> So originally my thinking was that the dc thread was not getting kicked.
> That is not the case. The lock is getting downconverted, but it is getting
> upconverted again shortly thereafter. This could just be a case in which
> dlmglue is slow to increment the holders to block the dc thread from
> downconverting the lock. The snippet shows that the BAST is received
> 16 usecs after the upconvert.
>
> Coly, I have another patch. Pop out the older patch before applying this
> one.
> http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch
>
--
Coly Li
SuSE Labs