[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"

Mark Fasheh mark at fasheh.com
Fri Aug 5 04:11:53 UTC 2022


On Thu, Aug 4, 2022 at 4:53 PM Mark Fasheh <mark at fasheh.com> wrote:
> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via
> mount option too just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you
> noted in your followup, a single node
> config is entirely possible in pacemaker (I've run that config
> myself). Why not provide an easy way for the user to drop down to that
> sort of a config? I know that's kind
> of pushing responsibility for this to the cluster stack, but that's
> where it belongs in the first place.
>
> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

Thinking about this some more... The only way that this works without
potential corruptions is if we always write a periodic mmp sequence,
even in clustered mode (which might mean each node writes to its own
sector). That way tunefs can always check the disk for a mounted node,
even without a cluster stack up. If tunefs sees anyone writing
sequences to the disk, it can safely fail the operation. Tunefs also
would have to be writing an mmp sequence once it has determined that
the disk is not mounted. It could also write some flag alongside the
sequence that says 'tunefs is working on this disk'. If a cluster
mount comes up and sees a live sequence with that flag, it will know
to fail the mount request as the disk is being modified. Local mounts
can also use this to ensure that they are the only mounted node.
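
A minimal sketch of the check described above, in Python for brevity.
Every name here (Slot, TUNEFS_FLAG, live_slots and the may_* helpers)
is hypothetical and not taken from the ocfs2 or ocfs2-tools code. It
assumes one sequence slot per writer, read twice at least one heartbeat
interval apart, and treats any slot whose sequence moved as live.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical flag meaning 'tunefs is working on this disk'.
TUNEFS_FLAG = 0x1


@dataclass(frozen=True)
class Slot:
    seq: int        # sequence number, bumped on every periodic write
    flags: int = 0  # optional flags stored alongside the sequence


def live_slots(before: List[Slot], after: List[Slot]) -> List[int]:
    """Indexes of slots whose sequence changed between the two reads."""
    return [i for i, (b, a) in enumerate(zip(before, after)) if a.seq != b.seq]


def may_tunefs(before: List[Slot], after: List[Slot]) -> bool:
    """tunefs fails the operation if anyone is writing sequences."""
    return not live_slots(before, after)


def may_cluster_mount(before: List[Slot], after: List[Slot]) -> bool:
    """A cluster mount fails only if a live sequence carries the tunefs flag."""
    return not any(after[i].flags & TUNEFS_FLAG
                   for i in live_slots(before, after))


def may_local_mount(before: List[Slot], after: List[Slot]) -> bool:
    """A local mount must be the only mounted node, so any live slot blocks it."""
    return not live_slots(before, after)
```

In this sketch a cluster mount tolerates other live nodes, while tunefs
and local mounts require that nobody else is writing sequences.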

As it turns out, we already do pretty much all of the sequence writing
for the o2cb cluster stack - check out cluster/heartbeat.c.
If memory serves, tunefs.ocfs2 has code to write to this heartbeat
area as well. For o2cb, we use the disk heartbeat to detect node
liveness, and to kill our local node if we see disk timeouts. For
pcmk, we shouldn't take any of these actions as that is not our
responsibility. Under pcmk, the heartbeating would be purely for mount
protection checks.
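
The split in responsibilities could be sketched as follows. This is an
illustration only: the Stack enum, the duty names and heartbeat_duties
are hypothetical, and the sketch assumes that under the proposal both
stacks would use the disk heartbeat for mount protection.

```python
from enum import Enum
from typing import Set


class Stack(Enum):
    O2CB = "o2cb"
    PCMK = "pcmk"


def heartbeat_duties(stack: Stack) -> Set[str]:
    """What the disk heartbeat is used for under each cluster stack."""
    # Assumed: every stack heartbeats for mount protection checks.
    duties = {"mount_protection"}
    if stack is Stack.O2CB:
        # o2cb also uses the heartbeat to detect node liveness and to
        # kill the local node on disk timeouts.
        duties |= {"node_liveness", "kill_local_node_on_disk_timeout"}
    return duties
```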

The downside to this is that all nodes would be heartbeating to the
disk on a regular interval, not just one. To be fair, this is exactly
how o2cb works, and with the correct timeout choices we were able to
avoid a measurable performance impact. In any case, this might have
to be a small price the user pays for cluster-aware mount protection.

Let me know what you think.

Thanks,
  --Mark
