[Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending is failed to improve the reliability
Mark Fasheh
mfasheh at versity.com
Tue Aug 22 13:49:59 PDT 2017
On Tue, Aug 8, 2017 at 5:56 AM, Changwei Ge <ge.changwei at h3c.com> wrote:
>>> It will improve the reliability a lot.
>> Can you detail your testing? Code-wise this looks fine to me but as
>> you note, this is a pretty hard to hit corner case so it'd be nice to
>> hear that you were able to exercise it.
>>
>> Thanks,
>> --Mark
> Hi Mark,
>
> My test is quite simple to perform.
> The test environment includes 7 hosts. The Ethernet devices in 6 of
> them are repeatedly brought down and back up.
> After several rounds of this, some file operations hang.
>
> Running the debugfs.ocfs2 tool on NODE 2, which was the owner of
> lock resource 'O000000000000000011150300000000',
> showed:
>
> debugfs: dlm_locks O000000000000000011150300000000
> Lockres: O000000000000000011150300000000 Owner: 2 State: 0x0
> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
> Refs: 4 Locks: 2 On Lists: None
> Reference Map: 3
> Lock-Queue  Node  Level  Conv  Cookie  Refs  AST  BAST  Pending-Action
> Granted     2     PR     -1    2:53    2     No   No    None
> Granted     3     PR     -1    3:48    2     No   No    None
>
> That means NODE 2 had granted the lock to NODE 3 and considered the
> AST sent to NODE 3.
>
> Meanwhile, running the debugfs.ocfs2 tool on NODE 3
> showed:
> debugfs: dlm_locks O000000000000000011150300000000
> Lockres: O000000000000000011150300000000 Owner: 2 State: 0x0
> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
> Refs: 3 Locks: 1 On Lists: None
> Reference Map:
> Lock-Queue  Node  Level  Conv  Cookie  Refs  AST  BAST  Pending-Action
> Blocked     3     PR     -1    3:48    2     No   No    None
>
> That means NODE 3 never received an AST to move its local lock from
> the blocked list to the granted list.
>
> This outcome makes sense, since sending the AST failed, as can be
> seen in the kernel log.
>
> As for BAST, it is more or less the same.
>
> Thanks
> Changwei
Thanks for the testing details. I think you got Andrew's e-mail wrong
so I'm CC'ing him now. It might be a good idea to re-send the patch
with the right CC's - add some of your testing details to the log.
You're free to use my
Reviewed-by: Mark Fasheh <mfasheh at versity.com>
as well.
Thanks again,
--Mark
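
The mismatch described in the quoted test report (lock 3:48 is Granted on the owner, NODE 2, but still Blocked on NODE 3) could be checked mechanically when comparing dumps from several nodes. The following is only a rough sketch: the function names are hypothetical, the row layout is assumed to match the `dlm_locks` output quoted above (queue name first, cookie in the fifth column), and only the two queue names that appear in that output are recognized.

```python
# Sample dumps copied from the quoted debugfs.ocfs2 output above.
OWNER_DUMP = """\
Lockres: O000000000000000011150300000000 Owner: 2 State: 0x0
Lock-Queue Node Level Conv Cookie Refs AST BAST Pending-Action
Granted 2 PR -1 2:53 2 No No None
Granted 3 PR -1 3:48 2 No No None
"""

NODE3_DUMP = """\
Lockres: O000000000000000011150300000000 Owner: 2 State: 0x0
Lock-Queue Node Level Conv Cookie Refs AST BAST Pending-Action
Blocked 3 PR -1 3:48 2 No No None
"""

# Assumption: only the queue names seen in the quoted output are handled.
QUEUES = {"Granted", "Blocked"}


def parse_lock_rows(dump: str) -> dict[str, str]:
    """Map each lock cookie (e.g. '3:48') to the queue it is listed on."""
    rows = {}
    for line in dump.splitlines():
        # Tolerate e-mail quoting ('> ') in front of each line.
        fields = line.lstrip("> ").split()
        if len(fields) >= 5 and fields[0] in QUEUES:
            rows[fields[4]] = fields[0]
    return rows


def stale_grants(owner_dump: str, node_dump: str) -> list[str]:
    """Cookies the owner lists as Granted but the other node lists as Blocked."""
    owner = parse_lock_rows(owner_dump)
    node = parse_lock_rows(node_dump)
    return sorted(
        cookie
        for cookie, queue in node.items()
        if queue == "Blocked" and owner.get(cookie) == "Granted"
    )


if __name__ == "__main__":
    print(stale_grants(OWNER_DUMP, NODE3_DUMP))
```

With the two dumps from the report, this flags cookie 3:48, which is the lock whose AST never reached NODE 3.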