[Ocfs2-devel] [PATCH] Wakeup down-convert thread just after clearing OCFS2_LOCK_UPCONVERT_FINISHING -v3
Wengang Wang
wen.gang.wang at oracle.com
Thu Sep 15 17:42:00 PDT 2011
Hi Sunil,
On 11-09-15 10:21, Sunil Mushran wrote:
> http://people.redhat.com/~teigland/make_panic
>
> This test has been useful in exposing dlmglue issues.
>
> On 09/15/2011 10:15 AM, Sunil Mushran wrote:
> >I am fine with the kick in recover from dlm error. Not so in cluster lock.
> >We have to be very very sure before meddling with that function. It is
> >a state machine with many hidden gotchas.
> >
> >So is this patch for a bug you encountered, or just a code audit? Also, what
> >kind of testing has been done?
Yes, it is for a bug (a hang) that we encountered. After testing and analysis, I
found the root cause, and it is NOT the dlm error case. The original patch fixed
the hang.
I will email you the orabug and the test case (it is pretty easy to reproduce).
I will also try http://people.redhat.com/~teigland/make_panic to see whether it
turns up anything.
thanks,
wengang.
> >
> >On 09/14/2011 08:27 PM, Wengang Wang wrote:
> >>When the lockres state UPCONVERT_FINISHING is cleared,
> >>we should wake up the downconvert thread in case that lockres
> >>is in the blocked queue. Currently we are not doing so and thus
> >>are at the mercy of another event waking up the dc thread.
> >>
> >>Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
> >>---
> >> fs/ocfs2/dlmglue.c | 9 ++++++++-
> >> 1 files changed, 8 insertions(+), 1 deletions(-)
> >>
> >>diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> >>index 7642d7c..524bd88 100644
> >>--- a/fs/ocfs2/dlmglue.c
> >>+++ b/fs/ocfs2/dlmglue.c
> >>@@ -1195,6 +1195,7 @@ static inline void ocfs2_recover_from_dlm_error(struct ocfs2_lock_res *lockres,
> >> int convert)
> >> {
> >> unsigned long flags;
> >>+ int kick_dc;
> >>
> >> spin_lock_irqsave(&lockres->l_lock, flags);
> >> lockres_clear_flags(lockres, OCFS2_LOCK_BUSY);
> >>@@ -1203,9 +1204,12 @@ static inline void ocfs2_recover_from_dlm_error(struct ocfs2_lock_res *lockres,
> >> lockres->l_action = OCFS2_AST_INVALID;
> >> else
> >> lockres->l_unlock_action = OCFS2_UNLOCK_INVALID;
> >>+ kick_dc = (lockres->l_flags & OCFS2_LOCK_QUEUED);
> >> spin_unlock_irqrestore(&lockres->l_lock, flags);
> >>
> >> wake_up(&lockres->l_event);
> >>+ if (kick_dc)
> >>+ ocfs2_wake_downconvert_thread(ocfs2_get_lockres_osb(lockres));
> >> }
> >>
> >> /* Note: If we detect another process working on the lock (i.e.,
> >>@@ -1373,6 +1377,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
> >> unsigned long flags;
> >> unsigned int gen;
> >> int noqueue_attempted = 0;
> >>+ int kick_dc;
> >>
> >> ocfs2_init_mask_waiter(&mw);
> >>
> >>@@ -1500,8 +1505,10 @@ update_holders:
> >> ret = 0;
> >> unlock:
> >> lockres_clear_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
> >>-
> >>+ kick_dc = (lockres->l_flags & OCFS2_LOCK_QUEUED);
> >> spin_unlock_irqrestore(&lockres->l_lock, flags);
> >>+ if (kick_dc)
> >>+ ocfs2_wake_downconvert_thread(osb);
> >> out:
> >> /*
> >> * This is helping work around a lock inversion between the page lock
> >
>
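For readers following the thread, the pattern the quoted patch applies can be sketched in userspace C: clear the state flag and sample the "queued" bit while holding the lock, then wake the worker only after dropping the lock. This is only an illustration under stated assumptions. Every demo_* name and DEMO_* value below is hypothetical, a pthread mutex stands in for the l_lock spinlock, and a condition variable plus counter stands in for ocfs2_wake_downconvert_thread(). It is not the dlmglue code itself.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only; they are not the
 * real OCFS2_LOCK_* bit definitions. */
#define DEMO_LOCK_QUEUED              0x1u
#define DEMO_LOCK_UPCONVERT_FINISHING 0x2u

/* Stand-in for the lock resource: a lock plus a flags word. */
struct demo_lockres {
	pthread_mutex_t lock;   /* plays the role of lockres->l_lock */
	unsigned int flags;     /* plays the role of lockres->l_flags */
};

/* Stand-in for the downconvert thread's wakeup state. */
struct demo_dc_thread {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	unsigned int wake_count; /* number of wakeups requested so far */
};

/* Request a wakeup of the (hypothetical) downconvert worker. */
static void demo_wake_dc_thread(struct demo_dc_thread *dc)
{
	pthread_mutex_lock(&dc->lock);
	dc->wake_count++;
	pthread_cond_signal(&dc->wake);
	pthread_mutex_unlock(&dc->lock);
}

/* Clear UPCONVERT_FINISHING and decide, under the lock, whether the
 * worker needs a kick. The wakeup itself is issued after unlocking.
 * Returns true if a wakeup was requested. */
static bool demo_finish_upconvert(struct demo_lockres *res,
				  struct demo_dc_thread *dc)
{
	bool kick_dc;

	pthread_mutex_lock(&res->lock);
	res->flags &= ~DEMO_LOCK_UPCONVERT_FINISHING;
	kick_dc = (res->flags & DEMO_LOCK_QUEUED) != 0;
	pthread_mutex_unlock(&res->lock);

	if (kick_dc)
		demo_wake_dc_thread(dc);
	return kick_dc;
}
```

Sampling the flag into a local variable before unlocking means the decision is based on a consistent view of the flags, while the wakeup call runs without the resource lock held.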