[Ocfs-users] OCFS adding/removing nodes

David McWhinnie davidmcwhinnie at yahoo.com
Thu Jan 6 13:24:08 CST 2005


We haven't touched anything software-wise and don't
see any hardware errors. The messages only occur
during the dd command.

The messages only occur on the partition I am running
dd on, and only on the node (2 node system) that is
NOT running dd.
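
To confirm which node and which device report the membership flapping, the add/remove messages can be tallied from the kernel log. The following is only a rough sketch: the regular expression is inferred from the two log lines quoted further down in this thread, and the function name `count_membership_events` is made up for illustration, not part of any OCFS tooling.

```python
import re
from collections import defaultdict

# Assumed message shape, inferred from the two lines quoted in this thread, e.g.
#   Jan  6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) to clustered device (8,208)
EVENT_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+ocfs:\s+"
    r"(?P<action>Adding|Removing)\s+(?P<peer>\S+)\s+\(node\s+(?P<node>\d+)\)\s+"
    r"(?:to|from)\s+clustered\s+device\s+\((?P<major>\d+),(?P<minor>\d+)\)"
)


def count_membership_events(lines):
    """Count Adding/Removing messages per (reporting host, device).

    Returns a dict such as
    {("houided006", "8,208"): {"Adding": 1, "Removing": 1}}.
    Lines that do not match the assumed format are ignored.
    """
    counts = defaultdict(lambda: {"Adding": 0, "Removing": 0})
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue
        key = (m["host"], f"{m['major']},{m['minor']}")
        counts[key][m["action"]] += 1
    return {k: dict(v) for k, v in counts.items()}


if __name__ == "__main__":
    import sys

    # Usage (the log file path is whatever your syslog writes to):
    #   python count_events.py <logfile>
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for (host, dev), c in sorted(count_membership_events(fh).items()):
                print(f"{host} device ({dev}): {c['Adding']} adds, {c['Removing']} removes")
```

Running it against the log of each node would show whether the messages really are confined to one device and one reporting node.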



--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:

> Err... any gremlins in your machine? :-)
> 
> Check the hardware. What I mean is that if you have
> not touched your software (as in kernel/modules) and
> you are getting the msgs even with no load on the
> system, I would concentrate on the disks/controller.
> 
> Are the messages highlighting one volume only or all?
> As in, is only one volume getting in and out of the
> cluster?
> Also, are the msgs only on one box or on all?
> 
> David McWhinnie wrote:
> 
> >Just started happening, there is no other activity on
> >the system.  Everything is shut down.
> >
> >I was originally looking into a "no space left on
> >device" issue (lots of space left tho) while running
> >an RMAN restore....  Thought that was a fragmentation
> >issue, but reformatted the partitions and the no space
> >error came back.  So at that point I was trying a dd to
> >create a file and noticed the dd hanging along with
> >the messages about nodes being removed.
> >
> >But when the no space left error occurs, the node
> >removal messages don't appear.
> >
> >David
> >
> >--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >
> >  
> >
> >>No, dd is mostly unrelated.
> >>What those messages indicate is that the heartbeat
> >>thread is not keeping up, which is not good. Yes, all
> >>the processes requiring a cluster lock will hang
> >>as a node is being evicted from the cluster.
> >>
> >>That you are getting these errors with as few as 16
> >>mounts is puzzling.
> >>
> >>When do these error messages pop up? Any
> >>relationship with the load on the system?
> >>
> >>David McWhinnie wrote:
> >>
> >>    
> >>
> >>>We have 16 OCFS mounts.
> >>>We are running RedHat 2.1
> >>>OCFS version 1. Latest patch.
> >>>
> >>>Turns out a dd was being done without the o_direct
> >>>option.  So that could be the cause.  Interestingly,
> >>>ls, df etc. would all hang while the node was removed
> >>>from the cluster.
> >>>
> >>>David
> >>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >>>
> >>> 
> >>>
> >>>      
> >>>
> >>>>how many ocfs mounts do you have on that box?
> >>>>which kernel?
> >>>>ocfs version?
> >>>>
> >>>>David McWhinnie wrote:
> >>>>
> >>>>   
> >>>>
> >>>>        
> >>>>
> >>>>>We are getting the following messages.
> >>>>>Jan  6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) to clustered device (8,208)
> >>>>>Jan  6 10:19:52 houided006 kernel: ocfs: Removing houided005 (node 0) from clustered device (8,208)
> >>>>>
> >>>>>every few minutes.
> >>>>>
> >>>>>Any advice on how to troubleshoot this?
> >>>>>
> >>>>>David.
> >>>>>          
> >>>>>
> 



		