[Ocfs2-devel] [patch 1/5] ocfs2: ensure that dlm lockspace is created by kernel module
Gang He
ghe at suse.com
Thu Jul 28 19:54:45 PDT 2016
Hello Mark,
> On Thu, Jul 28, 2016 at 02:05:56PM -0700, Andrew Morton wrote:
>> From: Gang He <ghe at suse.com>
>> Subject: ocfs2: ensure that dlm lockspace is created by kernel module
>>
>> We encountered a customer bug: the user ran fsck.ocfs2 on the file
>> system and it exited abnormally, leaving its lockspace (LVB size = 32)
>> behind in the kernel.  When the user then mounted the file system, the
>> kernel module did not create a new lockspace (LVB size = 64) by calling
>> dlm_new_lockspace() at mount time; it simply reused the existing
>> lockspace created by the user space tool.  As a result, the user was
>> not able to mount the file system from the other nodes, and saw error
>> messages like:
>>
>> dlm: 032F5......: config mismatch: 64,0 nodeid 177127961: 32,0
>> (mount.ocfs2,26981,46):ocfs2_dlm_init:2995 ERROR: status = -71
>> ocfs2_mount_volume:1881 ERROR: status = -71
>> ocfs2_fill_super:1236 ERROR: status = -71
>>
>> The user found it very difficult to find the root cause, so we wrote
>> this patch to mitigate the problem.
>>
>> First, we pass one more flag when calling dlm_new_lockspace(), to make
>> sure the lockspace is created by the kernel module itself.  This change
>> does not affect backward compatibility.
>>
>> Second, a clear error message is reported in the kernel log, making it
>> easier for the user to find the root cause.
>>
>>
>>
>> This patch ensures that the dlm lockspace is created by the kernel
>> module when mounting an ocfs2 file system.  A lockspace can be created
>> from either user space or kernel space, but lockspaces with the same
>> name may have different lvblen lengths/flags.
>>
>> To avoid mixing the two, we pass one more flag, DLM_LSFL_NEWEXCL, which
>> makes sure the dlm lockspace is created by the kernel module when
>> mounting.  Secondly, if a user space program (ocfs2-tools) is running on
>> a file system and the user tries to mount that file system in the
>> cluster, the DLM module returns -EEXIST or -EPROTO.  We should give the
>> user a clear error message, so that the user can let the user space tool
>> exit before mounting the file system again.
>
> I really like that we're printing a clear message for the user. I'm
> concerned about a couple things though:
>
> Gang - did you check that *online* userspace tools can still work on a
> mounted cluster with this change? I ask because this isn't the first time
> this issue has come up and if my memory hasn't faded too much we had
> problems with userspace/kernel interactions when we tried to fix it. In
> particular if the kernel says the lockspace is now exclusive, does that mean
> userspace will not be allowed to join, even if it doesn't use the lvb?
>
> Actually, how does this interact with dlmfs? We won't be allowed to join
> domains from dlmfs effectively gutting the ocfs2-tools ability to query the
> cluster. In particular see this blurb in libocfs2/dlm.c:
>
> 	/*
> 	 * We want to use dlmfs if we can, as it provides the full feature
> 	 * set of libo2dlm. Any dlmfs with the 'stackglue' capability will
> 	 * support all cluster stacks. An empty cluster.c_stack means
> 	 * o2cb, which always supports dlmfs.
> 	 *
> 	 * If we're unlucky enough to have older userspace stack code,
> 	 * we pass NULL to avoid dlmfs.
> 	 */
>
Yes, we did a lot of testing; this code change does not affect the existing ocfs2-tools behavior.
As you said, this part of the code is very messy, and I cannot fix the problem directly based on the current code/design.
So the fix only gives the user an obvious error message and prevents the user from making matters worse.
That is all we can do for this issue.
Thanks
Gang
> Thanks,
> --Mark
>
>
>>
>> Link: http://lkml.kernel.org/r/1463731940-13044-2-git-send-email-ghe@suse.com
>> Signed-off-by: Gang He <ghe at suse.com>
>> Reviewed-by: Goldwyn Rodrigues <rgoldwyn at suse.com>
>> Cc: Mark Fasheh <mfasheh at suse.de>
>> Cc: Joel Becker <jlbec at evilplan.org>
>> Cc: Junxiao Bi <junxiao.bi at oracle.com>
>> Cc: Joseph Qi <joseph.qi at huawei.com>
>> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>> ---
>>
>> fs/ocfs2/stack_user.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff -puN fs/ocfs2/stack_user.c~ocfs2-insure-dlm-lockspace-is-created-by-kernel-module fs/ocfs2/stack_user.c
>> --- a/fs/ocfs2/stack_user.c~ocfs2-insure-dlm-lockspace-is-created-by-kernel-module
>> +++ a/fs/ocfs2/stack_user.c
>> @@ -1007,10 +1007,17 @@ static int user_cluster_connect(struct o
>>  	lc->oc_type = NO_CONTROLD;
>>
>>  	rc = dlm_new_lockspace(conn->cc_name, conn->cc_cluster_name,
>> -			       DLM_LSFL_FS, DLM_LVB_LEN,
>> +			       DLM_LSFL_FS | DLM_LSFL_NEWEXCL, DLM_LVB_LEN,
>>  			       &ocfs2_ls_ops, conn, &ops_rv, &fsdlm);
>> -	if (rc)
>> +	if (rc) {
>> +		if (rc == -EEXIST || rc == -EPROTO)
>> +			printk(KERN_ERR "ocfs2: Unable to create the "
>> +				"lockspace %s (%d), because a ocfs2-tools "
>> +				"program is running on this file system "
>> +				"with the same name lockspace\n",
>> +				conn->cc_name, rc);
>>  		goto out;
>> +	}
>>
>>  	if (ops_rv == -EOPNOTSUPP) {
>>  		lc->oc_type = WITH_CONTROLD;
>> _
> --
> Mark Fasheh
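
For readers following the thread, the behavior under discussion can be
sketched in user space C. This is only a toy model, not the kernel's DLM
code: the names toy_new_lockspace(), toy_lockspace_lvblen(),
toy_is_stale_lockspace_error() and TOY_LSFL_NEWEXCL are hypothetical, and
the in-memory table merely stands in for a node's list of lockspaces. It
illustrates the two points from the patch description: without an
exclusive-create flag, a second caller silently reuses an existing
lockspace of the same name (keeping its LVB length), while with the flag
the call fails with -EEXIST, which the caller can report clearly together
with -EPROTO.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag value for this toy model only; the kernel defines
 * the real DLM_LSFL_NEWEXCL in its own headers. */
#define TOY_LSFL_NEWEXCL   0x1u

#define TOY_MAX_LOCKSPACES 8
#define TOY_NAME_LEN       64

struct toy_lockspace {
	char name[TOY_NAME_LEN];
	int lvblen;
	int in_use;
};

/* Stand-in for the per-node list of existing lockspaces. */
static struct toy_lockspace toy_table[TOY_MAX_LOCKSPACES];

/*
 * Create a lockspace, or join an existing one with the same name.
 * Returns 0 on success, -EEXIST if the name already exists and
 * TOY_LSFL_NEWEXCL was requested, -ENOSPC if the table is full,
 * and -EINVAL for an unusable name.
 */
int toy_new_lockspace(const char *name, unsigned int flags, int lvblen)
{
	int i, free_slot = -1;

	if (!name || strlen(name) >= TOY_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < TOY_MAX_LOCKSPACES; i++) {
		if (!toy_table[i].in_use) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (strcmp(toy_table[i].name, name) == 0) {
			if (flags & TOY_LSFL_NEWEXCL)
				return -EEXIST;
			/* Reuse: the existing lvblen is kept as-is. */
			return 0;
		}
	}

	if (free_slot < 0)
		return -ENOSPC;

	snprintf(toy_table[free_slot].name, TOY_NAME_LEN, "%s", name);
	toy_table[free_slot].lvblen = lvblen;
	toy_table[free_slot].in_use = 1;
	return 0;
}

/* Returns the stored LVB length, or -ENOENT if the name is unknown. */
int toy_lockspace_lvblen(const char *name)
{
	int i;

	for (i = 0; i < TOY_MAX_LOCKSPACES; i++)
		if (toy_table[i].in_use &&
		    strcmp(toy_table[i].name, name) == 0)
			return toy_table[i].lvblen;
	return -ENOENT;
}

/* Same condition the patch tests before printing its message. */
int toy_is_stale_lockspace_error(int rc)
{
	return rc == -EEXIST || rc == -EPROTO;
}
```

In this model, a tool that creates "vol" with an LVB length of 32,
followed by a non-exclusive create asking for 64, leaves the length at
32, which is the kind of mismatch described in the commit message. An
exclusive create of the same name returns -EEXIST instead.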