[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking

Sunil Mushran sunil.mushran at oracle.com
Thu Dec 11 11:23:38 PST 2008


Coly Li wrote:
> Because I am not familiar with the code yet, I thought this was an oops triggered by my first
> modification. Therefore, I chose to use a loop which did not trigger the oops.
>
> From your reply, it seems the kernel BUG in __dlm_lockres_drop_inflight_ref at dlmmaster.c:680 is
> another bug? I saw a patch named "ocfs2/dlm: Fix race in adding/removing lockres' to/from the
> tracking list". Is it the fix for this bug? If yes, I should learn how you resolved it ;)

The oops in __dlm_lockres_drop_inflight_ref() is different from the
tracking list oops. The two are unrelated.

The inflight_ref oops happened because the "fix" was not taking the ref.
Hence the count was zero during the drop. And that was because the "patch
fix" was at the wrong location. Diff my first patch against the final one
and see where the inflight ref is taken.
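
To illustrate the take/drop pairing, here is a minimal userspace sketch.
It is not the dlm code: struct sketch_res, sketch_grab_inflight() and
sketch_drop_inflight() are hypothetical names, a pthread mutex stands in
for the lockres spinlock, and the drop returns -1 at the point where the
kernel would presumably hit the BUG instead of crashing.

```c
#include <pthread.h>

/* Hypothetical stand-in for a lock resource; not struct dlm_lock_resource. */
struct sketch_res {
	pthread_mutex_t lock;    /* plays the role of the lockres spinlock */
	unsigned int inflight;   /* plays the role of the inflight ref count */
};

static void sketch_res_init(struct sketch_res *r)
{
	pthread_mutex_init(&r->lock, NULL);
	r->inflight = 0;
}

/* Take the inflight ref. Every later drop must be preceded by one of these. */
static void sketch_grab_inflight(struct sketch_res *r)
{
	pthread_mutex_lock(&r->lock);
	r->inflight++;
	pthread_mutex_unlock(&r->lock);
}

/*
 * Drop the inflight ref. Returns 0 on success, or -1 if the count was
 * already zero, i.e. a drop with no matching take.
 */
static int sketch_drop_inflight(struct sketch_res *r)
{
	int ret = 0;

	pthread_mutex_lock(&r->lock);
	if (r->inflight == 0)
		ret = -1;
	else
		r->inflight--;
	pthread_mutex_unlock(&r->lock);
	return ret;
}
```

A drop on a freshly initialized sketch_res returns -1, while a grab
followed by a drop returns 0. A code path that reaches the drop without
passing through the grab fails in the same way.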

The tracking list bug has always been there. It was exposed during
forked flock() testing as explained in the patch.

> Here is what I thought; please comment on my mistake.
> The dlm associated with lockres is protected by dlm->spinlock. If we only protect lockres by
> lockres->spinlock, there *might* be a possibility of dlm->node_num being modified somewhere. Since
> we have quite a few places that compare lockres->owner with dlm->node_num, I suspect that
> manipulating lockres->owner without protecting dlm->owner might be problematic.

dlm->node_num can never be modified. It is the node number which
is fixed for the life of the dlm domain (and more).
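
As a rough userspace sketch of why the comparison needs only the lockres
lock: the struct and function names below are hypothetical, a pthread
mutex stands in for lockres->spinlock, and the const qualifier is used
only to model "never modified after setup".

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dlm domain; not struct dlm_ctxt. */
struct sketch_domain {
	const unsigned char node_num;  /* set once at creation, never rewritten */
};

/* Hypothetical stand-in for a lock resource. */
struct sketch_lockres {
	pthread_mutex_t lock;          /* plays the role of lockres->spinlock */
	unsigned char owner;           /* may change, so read it under the lock */
};

/*
 * Compare the mutable owner with the immutable node number. Only the
 * lockres lock is taken: node_num cannot change, so reading it needs
 * no domain-level lock.
 */
static bool sketch_owned_locally(const struct sketch_domain *d,
				 struct sketch_lockres *r)
{
	bool local;

	pthread_mutex_lock(&r->lock);
	local = (r->owner == d->node_num);
	pthread_mutex_unlock(&r->lock);
	return local;
}
```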

Sunil


