[Ocfs2-users] flock errors in dmesg

Brian Kroth bpkroth at gmail.com
Fri Jan 16 18:36:09 PST 2009


Thanks for the info, everyone.  I ended up having to take everything
down, fsck, and remount with localflocks.  Hopefully that will
prevent it from happening again while we wait for the kernel fix to
reach a stable release.  Nothing should be doing flock anymore
anyway.
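
For reference, the remount boils down to the localflocks mount
option.  A rough sketch of the equivalent mount(2) call, with a
placeholder device and mountpoint (our actual ones differ):

    /* Equivalent of: mount -o remount,localflocks <dev> <mnt>
     * localflocks tells ocfs2 to handle flock() locally on each
     * node instead of going through the DLM. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* On MS_REMOUNT the source and fstype are mostly ignored;
         * the option string is what changes. */
        if (mount("/dev/mapper/mailfs", "/mail", "ocfs2",
                  MS_REMOUNT, "localflocks") != 0) {
            perror("remount with localflocks");
            return 1;
        }
        return 0;
    }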

Luckily we can take SAN-based snapshots from before and after the
fsck to see what changed.  That comparison script is still running;
I'll report back if I have questions about it.  So far it looks like
only the dovecot.index.cache files got nuked, which is to be expected
since those are what Dovecot was flocking.

Thanks again,
Brian

On Jan 15, 2009, at 11:58 AM, Coly Li <coly.li at suse.de> wrote:

>
>
> Brian Kroth wrote:
>> I've been working on creating a mail cluster using ocfs2.  Dovecot
>> was configured to use flock, since the kernel we're running is a
>> Debian-based 2.6.26, which supports cluster-aware flock.  User
>> space is 1.4.1.  During testing everything seemed fine, but when
>> we got a real load on things we got a whole bunch of these
>> messages in dmesg on the node that was hosting imap.  Note that
>> it's maildir and only one node is hosting imap, so we don't
>> actually need flock.
>>
>> I think we're going to switch back to dotlocking, but I was hoping
>> someone could interpret these error messages for me.  Are they
>> dangerous?
>>
>
> This is a known issue; the fix was merged in 2.6.29-rc1.  Here is
> the patch for your reference.
>
> Author: Sunil Mushran <sunil.mushran at oracle.com>
>    ocfs2/dlm: Fix race during lockres mastery
>
>    dlm_get_lock_resource() is supposed to return a lock resource
> with a proper master.  If multiple concurrent threads attempt to
> look up the lockres for the same lockid while the lock mastery is
> underway, one or more threads are likely to return a lockres
> without a proper master.
>
>    This patch makes the threads wait in dlm_get_lock_resource()
> while the mastery is underway, ensuring all threads return the
> lockres with a proper master.
>
>    This issue is known to be limited to users of the flock()
> syscall.  For all other fs operations, the ocfs2 dlmglue layer
> serializes the dlm op for each lockid.
>
>    Users encountering this bug will see flock() return EINVAL and
> dmesg will show the following error:
>    ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource <LOCKID>: bad api args
>
>    Reported-by: Coly Li <coyli at suse.de>
>    Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>
>    Signed-off-by: Mark Fasheh <mfasheh at suse.com>
> ---
> 7b791d68562e4ce5ab57cbacb10a1ad4ee33956e
> fs/ocfs2/dlm/dlmmaster.c |    9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index cbf3abe..54e182a 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -732,14 +732,21 @@ lookup:
>    if (tmpres) {
>        int dropping_ref = 0;
>
> +        spin_unlock(&dlm->spinlock);
> +
>        spin_lock(&tmpres->spinlock);
> +        /* We wait for the other thread that is mastering the resource */
> +        if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
> +            __dlm_wait_on_lockres(tmpres);
> +            BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
> +        }
> +
>        if (tmpres->owner == dlm->node_num) {
>            BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
>            dlm_lockres_grab_inflight_ref(dlm, tmpres);
>        } else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
>            dropping_ref = 1;
>        spin_unlock(&tmpres->spinlock);
> -        spin_unlock(&dlm->spinlock);
>
>        /* wait until done messaging the master, drop our ref to allow
>         * the lockres to be purged, start over. */
>
>
>> Thanks,
>> Brian
> [snip]
>
> -- 
> Coly Li
> SuSE Labs
>
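
PS: for anyone trying to confirm they're affected before the fix
lands, a crude check is to hammer flock() on the same file from a
few concurrent processes and watch for EINVAL plus the DLM_BADARGS
line in dmesg.  A minimal sketch, with a placeholder path (a single
quiet run will usually succeed, since the race only shows up under
concurrent lockres lookups):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        int i, fd = open("/mail/flock-test", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        for (i = 0; i < 10000; i++) {
            if (flock(fd, LOCK_EX) != 0) {
                /* EINVAL here is the user-visible side of the
                 * lockres mastery race fixed by the patch above. */
                fprintf(stderr, "flock: %s\n", strerror(errno));
                close(fd);
                return 1;
            }
            flock(fd, LOCK_UN);
        }
        close(fd);
        return 0;
    }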


