[Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking
Eric Ren
zren at suse.com
Mon Nov 14 02:03:44 PST 2016
Hi,
On 11/14/2016 01:42 PM, piaojun wrote:
> Hi Eric,
>
>
> The OCFS2_LOCK_BLOCKED flag of this lockres is set in the BAST (ocfs2_generic_handle_bast) when a downconvert is needed
> on behalf of a remote lock request.
>
> The recursive cluster lock (the second one) will be blocked in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED.
> But the downconvert cannot be done. Why? Because the first cluster lock on this node never gets a chance to be unlocked -
> we have blocked ourselves in the code path.
>
> Eric
> You cleared my doubt. I will look through your solution.
Thanks for your attention. Actually, I have tried several draft versions of the patch locally.
None of them satisfies me so far.
Some rules I'd like to follow:
1) Check for and avoid recursive cluster locking, rather than allowing it, which is what Junxiao
had tried before.
2) Only keep track of lock resources that meet the following requirements:
   a. normal inodes (not system files);
   b. the inode metadata lockres (not the open or rw lockres).
Why? To avoid the more special cluster locking use cases, such as the journal system file and the
"LOST+FOUND" open lockres, where the lock/unlock operations are performed by different
processes, which makes the tracking task trickier.
3) There is another problem if we follow the "check + avoid" pattern, which I have mentioned in
this thread:
"""
This is wrong. We also depend on ocfs2_inode_lock() passing out "bh" for later use.
So we may need another function, something like ocfs2_inode_getbh():
if (!oh)
        ocfs2_inode_lock();
else
        ocfs2_inode_getbh();
"""
Hope we can work out a nice solution for this tricky issue ;-)
Eric