[Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending is failed to improve the reliability

Mark Fasheh mfasheh at versity.com
Mon Aug 7 13:19:59 PDT 2017


On Mon, Aug 7, 2017 at 2:13 AM, Changwei Ge <ge.changwei at h3c.com> wrote:
> Hi,
>
> In the current code, while flushing ASTs, we don't handle the case
> where sending an AST or BAST fails.
> But it is indeed possible for an AST or BAST to be lost due to some
> kind of network fault.
>
> If that happens, the requesting node will never get an AST back, so
> it will never acquire the lock or abort the current lock request.
>
> With this patch, I'd like to fix this issue by re-queuing the AST or
> BAST if sending fails due to a network fault.
>
> The re-queued AST or BAST will be dropped if the requesting node is
> dead.
>
> This should improve reliability considerably.

Can you detail your testing? Code-wise this looks fine to me, but as
you note, this is a pretty hard-to-hit corner case, so it'd be nice to
hear that you were able to exercise it.
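
For reference, the behaviour described in the quoted text (re-queue on
a failed send, drop once the requesting node is dead) can be modelled
in user space roughly as below. This is only a sketch and not the patch
code: struct pending_ast, flush_one(), flaky_send() and the node_alive
flag are all hypothetical names made up for illustration, and none of
them correspond to actual ocfs2/dlm symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of trying to send one AST/BAST over the network. */
enum send_result { SEND_OK, SEND_NET_ERROR };

/* Hypothetical outcome of one flush pass over a single pending item. */
enum flush_outcome { FLUSH_DELIVERED, FLUSH_REQUEUED, FLUSH_DROPPED };

/* Hypothetical stand-in for a queued AST or BAST. */
struct pending_ast {
	int node;        /* node that is waiting for this AST/BAST */
	bool node_alive; /* assumed liveness flag kept up to date elsewhere */
	int attempts;    /* number of send attempts made so far */
};

/* Hypothetical transport hook. */
typedef enum send_result (*send_fn)(struct pending_ast *ast);

/*
 * Try to deliver one pending item.
 * - Target node dead: drop the item without sending.
 * - Send succeeds:    the item is delivered.
 * - Send fails:       the caller should put the item back on its queue.
 */
enum flush_outcome flush_one(struct pending_ast *ast, send_fn send)
{
	assert(ast != NULL && send != NULL);

	if (!ast->node_alive)
		return FLUSH_DROPPED;

	ast->attempts++;
	if (send(ast) == SEND_OK)
		return FLUSH_DELIVERED;

	return FLUSH_REQUEUED;
}

/* Fake transport for illustration: fails on the first attempt only. */
static enum send_result flaky_send(struct pending_ast *ast)
{
	return ast->attempts >= 2 ? SEND_OK : SEND_NET_ERROR;
}
```

A fake transport like flaky_send() that fails a fixed number of times
is one way to drive the retry path deterministically in a model like
this; in a real cluster the failure would have to be injected at the
network layer instead.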

Thanks,
   --Mark


