[Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery

Sunil Mushran sunil.mushran at gmail.com
Wed May 30 18:18:12 PDT 2012


On Tue, May 29, 2012 at 5:41 PM, Xiaowei <xiaowei.hu at oracle.com> wrote:
> On 05/30/2012 06:09 AM, Sunil Mushran wrote:
> > I would suggest exploring adding this in dlm hb down event. Checking live
> > map all over the place is hacky. We do it more than we should right now.
> > Let's not add to the mess.
>
> Hi Sunil,
>
> Do you mean we should clear the bit in the domain map directly in the dlm hb
> down event when the node goes down, and check with dlm_is_node_dead here?
> Or how could we ensure the node is alive during the whole migrate process?
> One node could die even after it sends out one locks package and before the
> next, if there were too many locks on that lockres.

dlm hb down event is triggered when a node is declared dead. That's where we
clean up pending mles, etc. You can add a check for recovery and add logic to
change the reco state for that node there.
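
Very roughly, the idea can be sketched as below. This is only a toy model in
standalone C, not the actual dlm code: struct toy_dlm, the RECO_* states and
both functions are hypothetical names made up for illustration, and the real
handler does far more (locking, mle cleanup, etc.). The point is just that the
per-node reco state is flipped in one place, when the node is declared dead,
so the code waiting on recovery never needs to poll the live map itself.

```c
#define TOY_MAX_NODES 8

/* Hypothetical per-node recovery states (illustration only). */
enum toy_reco_state {
    RECO_NONE = 0,   /* node is not part of the current recovery */
    RECO_RECEIVING,  /* still waiting for lock data from this node */
    RECO_DONE,       /* node has sent everything */
    RECO_DEAD        /* node died while we were waiting on it */
};

struct toy_dlm {
    int recovery_active;                      /* nonzero while recovery runs */
    enum toy_reco_state reco[TOY_MAX_NODES];  /* state per node index */
};

/* Toy stand-in for the hb down event: runs once when node idx is declared dead. */
static void toy_hb_node_down(struct toy_dlm *dlm, int idx)
{
    if (idx < 0 || idx >= TOY_MAX_NODES)
        return;

    /* ... cleanup of pending mles etc. would happen here ... */

    /* The added check: if recovery is waiting on this node, mark it dead. */
    if (dlm->recovery_active && dlm->reco[idx] == RECO_RECEIVING)
        dlm->reco[idx] = RECO_DEAD;
}

/* The waiter only looks at reco state; returns 1 when nothing is pending. */
static int toy_all_nodes_settled(const struct toy_dlm *dlm)
{
    int i;

    for (i = 0; i < TOY_MAX_NODES; i++)
        if (dlm->reco[i] == RECO_RECEIVING)
            return 0;
    return 1;
}
```

With this shape, a node that dies between two lock messages is handled the
same way as one that dies before sending anything: the waiter sees RECO_DEAD
the next time it checks and can restart or abandon that node's part.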


