[Ocfs2-users] Avoid node fence and fail gracefully

Vineeth Thampi vineeth.thampi at gmail.com
Fri May 31 08:33:17 PDT 2013


Hi,

I have been trying to work around node fencing on a heartbeat failure or
network timeout. I modified o2quo_fence_self() in quorum.c to make all
ocfs2 filesystems read-only (RO) instead of resetting the node. In testing
this worked and the filesystems were made RO, but afterwards I am not able
to umount the filesystems or stop the O2CB service.

Is there any way by which I could ask O2CB to abort heartbeat and treat the
filesystem as LOCAL instead of GLOBAL?

The following is the code change that I made.

**************************************************
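/* Callback for iterate_supers_type(): flip one ocfs2 super block RO.
 * arg points to an int; non-zero requests a hard RO, zero a soft RO. */
static void make_fs_RO(struct super_block *sb, void *arg)
{
    struct ocfs2_super *osb = OCFS2_SB(sb);

    /* Mark the super block read-only at the VFS level. */
    sb->s_flags |= MS_RDONLY;
    /* Record the error state and the RO mode in the ocfs2 super. */
    ocfs2_set_osb_flag(osb, OCFS2_OSB_ERROR_FS);
    ocfs2_set_ro_flag(osb, *(int *)arg);
}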

/* this is horribly heavy-handed.  It should instead flip the file
 * system RO and call some userspace script. */
static void o2quo_fence_self(void)
{

...

        case O2NM_FENCE_RESET:
                printk(KERN_ERR "*** Hard failure in O2CB, all ocfs2 "
                       "filesystems made RO ***\n");

                /* Iterate through all ocfs2 super blocks and make each of
                   them RO */
                fs_type = get_fs_type("ocfs2");
                if (fs_type)
                        iterate_supers_type(fs_type, make_fs_RO,
                                            &hard_reset);

                break;
...

}
***************************************************************


The error from kern.log:

=======================================
May 31 16:08:18 localhost kernel: [ 5434.076126] (kworker/u:2,577,3):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0xcfe4a084) to node 0
May 31 16:08:18 localhost kernel: [ 5434.076178] o2dlm: Waiting on the death of node 0 in domain A4E98618A3744717A65AF04E943D035A
=======================================

Any pointers would be much appreciated.

Thanks,

Vineeth