[Ocfs2-tools-devel] [PATCH] ocfs2_controld: Fix double-leave in complete_mount().

Sat Aug 16 09:14:29 PDT 2008

On Sat, Aug 16, 2008 at 02:54:50AM -0700, Joel Becker wrote:
> In commit f5032771bc41bf9ff31ed42f332d8ec8def39e55 (ocfs2_controld: Allow
> multiple real mounts.) we stopped tracking mountpoints.  Instead, we
> describe different applications as "services".  "tunefs.ocfs2" is one
> service.  "fsck.ocfs2" is another.  The actual filesystem uses the
> service "ocfs2".
> 
> Only one instance of a service is allowed, except for the filesystem.
> You can, of course, have one device mounted at multiple mountpoints:
> 
>  # mount /dev/sdb1 /ocfs2
>  # mount /dev/sdb1 /ocfs3
> 
> In the special case of the "ocfs2" service, ocfs2_controld will allow
> more than one MOUNT call (sent by o2cb_begin_group_join).
> The additional mounters get EALREADY, which libo2cb knows to interpret
> as success.  It goes like this:
> 
> mount.ocfs2 on /ocfs2 (first)		ocfs2_controld
> -----------------------------		--------------
> o2cb_begin_group_join(uuid)
> 					start_join(uuid)
> 					finish_join(uuid)
> 					dlmcontrol_register(uuid)
> 					notify_mount_client(0)
> err = mount(dev, mntpnt)
> o2cb_complete_group_join(uuid, err)
> 					complete_mount(uuid, err)
> 
> mount.ocfs2 on /ocfs3 (additional)	ocfs2_controld
> ----------------------------------	--------------
> o2cb_begin_group_join(uuid)
> 					notify_client(EALREADY)
> err = mount(dev, mntpnt)
> o2cb_complete_group_join(uuid, err)
> 					complete_mount(uuid, err)
> 
> Here's the crux of the problem.  If that first mounter gets an error
> from mount(2), the daemon's complete_mount() should leave the group.
> There's no filesystem mounted.
> 
> However, if the *second* mounter gets an error from mount(2) (say, a
> missing mountpoint), the daemon should not leave the group - that first
> mount is still going!  That's the bug.  The daemon didn't know the
> difference, and it would leave the group.  The first mount was left out
> in the lurch.
> 
> The fix is to mark the additional mounts as such.  complete_mount()
> notices the additional flag and does nothing beyond responding to
> mount.ocfs2(8).
> 
> dead_mounter() had the same problem.  If an additional mounter died, it
> was treated like a first mounter.  dead_mounter() now does what
> complete_mount() does.  It cleans up the additional state and nothing
> else.
> 
> While we're there, we've learned enough about our state to handle first
> mounts that died before sending their status to the daemon.  We've
> always known that a dead_mounter() during group join could just set
> leave_on_join.  But if the mount program has already been notified,
> there may be a mounted filesystem.  We pinned the filesystem as busy and
> basically locked out all other operations.
> 
> But as it turns out, a fully operational group is a good state.  We can
> clear the in-progress flag and allow new operations.  Additional mounts
> can happen cleanly, and umounts as well.  If the filesystem never got
> mounted, a cleanup with ocfs2_hb_ctl is safe.  It's up to the
> administrator to check safety, but it's a predictable environment.
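
For readers following along, the rules described above can be sketched as a
small state machine. This is only an illustration under stated assumptions,
not the code from the patch: the function names mirror the ones in the
description, but struct group_state, its fields, the bool "additional"
argument, and the EBUSY return are hypothetical simplifications.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified per-uuid state; not the daemon's real struct. */
struct group_state {
    bool joining;        /* group join started, not yet finished */
    bool joined;         /* daemon is a member of the group */
    bool in_progress;    /* first mounter has not reported its status */
    bool leave_on_join;  /* first mounter died while the join was running */
    int  additional;     /* additional mounters that have not reported yet */
};

/* MOUNT request: 0 for a first mounter, EALREADY for an additional one. */
int begin_mount(struct group_state *g)
{
    assert(g);
    if (g->joining || g->in_progress)
        return EBUSY;            /* assumption: busy groups refuse new work */
    if (g->joined) {
        g->additional++;         /* mark this mounter as additional */
        return EALREADY;
    }
    g->joining = true;
    g->in_progress = true;
    return 0;
}

/* The group join finished; honor a pending leave_on_join. */
void finish_join(struct group_state *g)
{
    assert(g && g->joining);
    g->joining = false;
    if (g->leave_on_join) {
        g->leave_on_join = false;
        g->in_progress = false;  /* nobody is waiting, so leave right away */
        return;
    }
    g->joined = true;
}

/* The mounter reports the result of mount(2). */
void complete_mount(struct group_state *g, bool additional, int err)
{
    assert(g);
    if (additional) {
        assert(g->additional > 0);
        g->additional--;         /* clean up additional state, nothing else */
        return;
    }
    g->in_progress = false;
    if (err)
        g->joined = false;       /* first mount failed: leave the group */
}

/* The mounter died before reporting its status. */
void dead_mounter(struct group_state *g, bool additional)
{
    assert(g);
    if (additional) {
        assert(g->additional > 0);
        g->additional--;         /* same as complete_mount() */
        return;
    }
    if (g->joining) {
        g->leave_on_join = true; /* still joining: leave once the join ends */
        return;
    }
    /* Already notified: the filesystem may be mounted, so stay in the
     * group and just clear the in-progress flag. */
    g->in_progress = false;
}
```

The point of the sketch is that only a first mounter can ever change group
membership. A failed or dead additional mounter touches its own counter and
nothing else, so the first mount keeps its group.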

Ok, that's good. I'm glad that we're erring on the side of caution. We can
worry about being extra-fancy later :)

The patch looks good - thanks for commenting the heck out of it too.

Signed-off-by: Mark Fasheh <mfasheh at suse.com>
	--Mark

--
Mark Fasheh