[Ocfs2-users] NFS Failover

Luis Freitas lfreitas34 at yahoo.com
Wed Dec 10 06:27:22 PST 2008


   I found some IBM papers on GPFS that indicate they have a working solution for this.

   They seem to use a standard feature of the NFS client, NFS lock recovery, but the server was modified to initiate this process when one of the nodes dies, and also to maintain lock coherence.

   I am pasting text from one of the papers below:

http://www.redbooks.ibm.com/redpapers/pdfs/redp4400.pdf

The following system prerequisites must be met before you begin the installation and configuration:

 A Linux 2.6 kernel

Distributions currently supported are Red Hat Enterprise Linux (RHEL) versions 4 and 5 and SUSE Linux Enterprise Server (SLES) versions 9 and 10.

 Operating system patches

– If NLM locking is required, a kernel patch that updates the lockd daemon to propagate locks to the clustered file system must be applied. This patch is currently available at:

http://sourceforge.net/tracker/?atid=719124&group_id=130828&func=browse/

Depending on the version of SLES you are using, this patch might exist partially. If this condition exists, you might need to resolve certain conflicts. Contact your support organization if necessary.

– To permit NFS clients to reclaim their locks with a new server after failover, the reclaim message from statd must appear to come from the IP address of the failing node (and not the node that took over, which is the one that will actually send the message).

On SUSE, statd runs in the kernel and does not implement the interface to support this requirement (notification only, -N option). Therefore, on SUSE, the common NFS utilities (sm-notify in the user space) are needed to implement this function.

The patches required for the util-linux package are:

• Support statd notification by name (patch-10113)
http://support.novell.com/techcenter/psdb/2c7941abcdf7a155ecb86b309245e468.html

• Specify a host name for the -v option (patch-10852)
http://support.novell.com/techcenter/psdb/e6a5a6d9614d9475759cc0cd033571e8.html

• Allow selection of IP source address on command line (patch-9617)
http://support.novell.com/techcenter/psdb/c11e14914101b2debe30f242448e1f5d.html/

– For RHEL, use of nfs-utils 1.0.7 is required for rpc.statd fixes. See:
http://www.redhat.com/
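
   For what it is worth, below is a rough sketch (mine, not from the paper) of what the failover side might look like with a sm-notify that carries the patches above. The virtual IP, interface label and state directory path are made up, and the exact options depend on the sm-notify version you have:

   # On the surviving node, after taking over the failed node's virtual IP
   # (192.0.2.10 and the eth0:1 label are examples):
   ip addr add 192.0.2.10/24 dev eth0 label eth0:1

   # Send SM_NOTIFY to the clients recorded in the failed node's statd state
   # directory (assumed here to have been copied to /var/lib/nfs/failed-node),
   # so that the notification appears to come from the failed node's address
   # and the clients reclaim their locks. -v selects the name/source address,
   # -P the state directory, -f forces the notification to be sent.
   sm-notify -f -v 192.0.2.10 -P /var/lib/nfs/failed-node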

   They also do load balancing using DNS round robin (?!); I am not sure how they can make this work.
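
   Regarding the fsid export option discussed in the quoted thread below, a minimal /etc/exports sketch of that approach would look something like this (the path, client network and fsid value are placeholders; the only point is that the fsid number must be identical on both nodes):

   # Same export line on both NFS servers, so clients see identical file
   # handles after the virtual IP fails over (example path, network, fsid).
   /u01/concurrents  10.1.1.0/24(rw,sync,no_subtree_check,fsid=745)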

Regards,
Luis

--- On Tue, 12/9/08, Sunil Mushran <sunil.mushran at oracle.com> wrote:

> From: Sunil Mushran <sunil.mushran at oracle.com>
> Subject: Re: [Ocfs2-users] NFS Failover
> To: lfreitas34 at yahoo.com
> Cc: ocfs2-users at oss.oracle.com
> Date: Tuesday, December 9, 2008, 5:09 PM
> I forgot about fsid. That's how it identifies the
> device. Yes, it needs
> to be the same.
> 
> Yes, the inode numbers are consistent. It is the block
> number of the inode on disk.
> 
> Afraid I cannot help you with failover lockd.
> 
> Sunil
> 
> Luis Freitas wrote:
> > Sunil,
> >
> >    They are not waiting; the kernel reconnects after a
> > few seconds, but it just does not like the other NFS server.
> > Any attempt to access directories or files after the virtual
> > IP failed over to the other NFS server resulted in errors.
> > Unfortunately I don't have the exact error message here
> > anymore.
> >
> >    We found a parameter on the NFS server that seems
> > to fix it, fsid. If you set it to the same number on both
> > servers, it forces both of them to use the same identifiers.
> > It seems that if you don't, you need to guarantee that the
> > mount is done on the same device on both servers, and we
> > cannot do that since we are using PowerPath.
> >
> >   I would like to confirm whether the inode numbers are
> > consistent across servers.
> >
> >   That is:
> >
> > [oracle at br001sv0440 concurrents]$ ls -il
> > total 8
> > 131545 drwxr-xr-x  2  100 users 4096 Dec  9 12:12 admin
> > 131543 drwxrwxrwx  2 root dba   4096 Dec  4 08:53 lost+found
> > [oracle at br001sv0440 concurrents]$
> >
> >    Is directory "admin" (or other
> > directories/files) always inode number 131545, no
> > matter which server we are on? It seems to be so, but I
> > would like to confirm.
> >
> >
> >    About the metadata changes, this share will be used
> > for log files (actually, for Oracle E-Business Suite
> > concurrent log and output files), so we can tolerate it if a
> > few of the latest files are lost during the failover. The
> > user can simply run his report again. Also, if some processes
> > hang or die during the failover, it can be tolerated, as the
> > internal manager can restart them. Preferably, processes
> > should die instead of hanging.
> >
> >    But I am concerned about dangling locks on the
> > server, and I am not sure how to handle those. In the NFS-HA
> > docs, some files under /var/lib/nfs are copied by scripts
> > every few seconds, but this does not seem to be a foolproof
> > approach.
> >
> >    I looked over the NFS-HA docs sent to the list.
> > They are useful, but also very "Linux HA"
> > centric, and require the heartbeat2 package. I won't install
> > another cluster stack, since I already have CRS here.
> >
> >    Does anyone have pointers on a similar setup with CRS?
> >
> > Best Regards,
> > Luis
> >
