[Ocfs2-devel] [PATCH] ocfs2/dlm: delay the migration when the lockres is in recovery

Wengang Wang wen.gang.wang at oracle.com
Fri Jun 11 03:25:51 PDT 2010


Any comments on this patch?

regards,
wengang.
On 10-05-25 15:59, Wengang Wang wrote:
> We shouldn't migrate a lockres that is in recovery state.
> Otherwise, the following problem can occur:
> 
> 1) Recovery happens with node A as the recovery master while node A is in
> umount, migrating all the lockres' it owns (master is node A) to other nodes,
> say node B.
> 2) So node A wants to take over all the lockres' that were mastered by the
> crashed node C.
> 3) On receiving the request_locks request from node A, node B sends mig_lockres
> requests (for recovery) to node A for all lockres' that were mastered by the
> crashed node C. It can also send the request for a lockres (lockres A) which is
> not in node A's hashtable.
> 4) On receiving the mig_lockres request for lockres A from node B, node A
> creates a new lockres object, lockres A', with the RECOVERING flag set, and
> inserts it into the hash table.
> 5) The recovery for lockres A' proceeds on node A, which finally masters
> lockres A'. At this point the RECOVERING flag has not yet been cleared from
> lockres A' nor from lockres A on node B.
> 6) The migration of lockres A' proceeds, since node A now masters lockres A'.
> The mig_lockres request (for migration) is sent to node B.
> 7) Node B responds with -EFAULT because lockres A is still in recovery state.
> 8) Node A hits BUG() on the -EFAULT.
> 
> Fix:
> The recovery state is cleared on node A (the recovery master) only after it is
> cleared on node B. So we wait until the recovery state is cleared on node A,
> and only then migrate the lockres to node B.
> 
> Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
> ---
>  fs/ocfs2/dlm/dlmmaster.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 9289b43..de9c128 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2371,6 +2371,9 @@ static int dlm_is_lockres_migrateable(struct dlm_ctxt *dlm,
>  		goto leave;
>  	}
>  
> +	if (unlikely(res->state & DLM_LOCK_RES_RECOVERING))
> +		goto leave;
> +
>  	ret = 0;
>  	queue = &res->granted;
>  	for (i = 0; i < 3; i++) {
> -- 
> 1.6.6.1
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
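
The race in steps 5) to 8) and the effect of the added check can be modelled
in a few lines of user-space C. This is only a rough sketch under stated
assumptions, not the kernel code: struct sketch_lockres, SKETCH_RES_RECOVERING,
sketch_is_migrateable() and sketch_handle_migration() are hypothetical names
made up for illustration, and the flag value is arbitrary. The sender side
refuses to offer a lockres for migration while it is still marked as
recovering, and the receiver side shows the -EFAULT answer the sender would
otherwise get.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in flag; the value is not taken from the kernel headers. */
#define SKETCH_RES_RECOVERING 0x1u

/* Hypothetical, heavily reduced model of a lock resource. */
struct sketch_lockres {
	unsigned int state;	/* bitmask of SKETCH_RES_* flags */
	unsigned int owner;	/* node number that masters this resource */
};

/*
 * Sender side (node A): mirrors the idea of the added check. A resource
 * that is still marked as recovering is not offered for migration yet.
 */
static bool sketch_is_migrateable(const struct sketch_lockres *res,
				  unsigned int self)
{
	assert(res != NULL);

	if (res->owner != self)
		return false;	/* assumed: only the current master migrates */
	if (res->state & SKETCH_RES_RECOVERING)
		return false;	/* defer: recovery has not finished */
	return true;
}

/*
 * Receiver side (node B): models step 7, where a migration request for
 * a resource that is still in recovery is rejected with -EFAULT.
 */
static int sketch_handle_migration(const struct sketch_lockres *res)
{
	assert(res != NULL);

	return (res->state & SKETCH_RES_RECOVERING) ? -EFAULT : 0;
}
```

In this model, a caller on node A would simply retry sketch_is_migrateable()
later instead of sending a request that node B has to reject, which is the
behaviour the commit message describes as waiting for the recovery state to
be cleared.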


