[Ocfs-users] OCFS adding/removing nodes

David McWhinnie davidmcwhinnie at yahoo.com
Thu Jan 6 14:47:21 CST 2005


It seems we don't have the o_direct version of dd
installed, but we do have the o_direct version of cp
installed. I'm checking with the OS guys on that.

I'm not really concerned about the dd issue; we don't
actually use it for anything. I was just using it to
troubleshoot my RMAN issue. Support is now saying my
RMAN issue is an OCFS issue, so you guys may be
hearing about that on your side.

Thanks,
David
--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:

> It appears the dd process is hindering the hb (heartbeat)
> thread's I/O. The reason you see the errors only on the
> other node is that the other node notices that the first
> node has missed enough hb updates to be evicted from the
> cluster. So, that makes sense.... on one level.
> 
> Do the msgs pop up when running dd without the o_direct
> option, or is that unrelated?
> 
> David McWhinnie wrote:
> 
> >We haven't touched anything software-wise, and don't
> >see any hardware errors, and the errors only occur
> >during the dd command.
> >
> >The messages only occur on the partition I am running
> >dd on, and only on the node (2-node system) that is
> >NOT running dd.
> >
> >--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >
> >>err.... any gremlins in your machine? :-)
> >>
> >>Check the hardware. What I mean is that if you have not
> >>touched your software (as in kernel/modules) and you are
> >>getting the msgs even with no load on the system, I would
> >>concentrate on the disks/controller.
> >>
> >>Are the messages highlighting one volume only or all?
> >>As in, is only one volume getting in and out of the
> >>cluster? Also, are the msgs only on one box or on all?
> >>
> >>David McWhinnie wrote:
> >>
> >>>Just started happening; there is no other activity on
> >>>the system. Everything is shut down.
> >>>
> >>>I was originally looking into a "no space left on
> >>>device" issue (lots of space left, though) while running
> >>>an RMAN restore.... Thought that was a fragmentation
> >>>issue, but I reformatted the partitions and the "no
> >>>space" error came back. So at that point I was trying a
> >>>dd to create a file and noticed the dd hanging, along
> >>>with the messages about nodes being removed.
> >>>
> >>>But when the "no space left" error occurs, the node
> >>>removal messages don't appear.
> >>>
> >>>David
> >>>
> >>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >>>
> >>>>No, dd is mostly unrelated.
> >>>>What those messages indicate is that the heartbeat
> >>>>thread is not keeping up, which is not good. Yes, all
> >>>>the processes requiring a cluster lock will hang while
> >>>>a node is being evicted from the cluster.
> >>>>
> >>>>That you are getting these errors with as few as 16
> >>>>mounts is puzzling.
> >>>>
> >>>>When do these error messages pop up? Any relationship
> >>>>with the load on the system?
> >>>>
> >>>>David McWhinnie wrote:
> >>>>
> >>>>>We have 16 OCFS mounts.
> >>>>>We are running Red Hat 2.1.
> >>>>>OCFS version 1, latest patch.
> >>>>>
> >>>>>Turns out a dd was being done without the o_direct
> >>>>>option, so that could be the cause. Interesting that
> >>>>>ls, df, etc. would all hang while the node was being
> >>>>>removed from the cluster.
> >>>>>
> >>>>>David
> >>>>>
> >>>>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >>>>>
> >>>>>>how many ocfs mounts do you have on that box?
> >>>>>>which kernel?
> >>>>>>ocfs version?
> >>>>>>
> >>>>>>David McWhinnie wrote:
> >>>>>>
> >>>>>>>We are getting the following messages.
> >>>>>>>Jan  6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) to clustered device (8,208)
> >>>>>>>Jan  6 10:19:52 houided006 kernel: ocfs: Removing houided005 (node 0) from clustered device
=== message truncated ===



		

