[Ocfs-users] OCFS adding/removing nodes

Sunil Mushran Sunil.Mushran at oracle.com
Thu Jan 6 13:41:54 CST 2005


It appears the dd process is hindering the heartbeat thread's I/O. The reason
you see the errors only on the other node is that the other node notices
that the first node has missed enough heartbeat updates to be evicted from
the cluster. So that makes sense... on one level.

Do the msgs pop up when running dd without the o_direct option
or is that unrelated?
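
(As an aside, for anyone hitting the same thing: the reason o_direct matters
is that a buffered dd pushes all of its data through the page cache and can
crowd out the heartbeat thread's disk I/O. Below is a rough sketch, in C, of
what a direct-I/O write looks like at the syscall level, roughly what an
o_direct-enabled dd does under the hood. The path, block size, and alignment
here are made up for illustration; they are not taken from this setup.)

/* Illustrative only: write to a file with O_DIRECT so the data bypasses
 * the page cache instead of competing with the heartbeat thread's I/O.
 * The path and sizes below are hypothetical. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t blksz = 1 << 20;   /* 1 MB per write, a multiple of 512 */
    void *buf;
    int fd, i;

    /* O_DIRECT requires an aligned buffer (typically 512 bytes or the
     * device sector size), so allocate it with posix_memalign(). */
    if (posix_memalign(&buf, 4096, blksz) != 0) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 0, blksz);

    fd = open("/u01/ocfs/testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Write 100 MB in aligned 1 MB chunks, the rough equivalent of a
     * "dd bs=1M count=100" done through direct I/O. */
    for (i = 0; i < 100; i++) {
        if (write(fd, buf, blksz) != (ssize_t)blksz) {
            perror("write");
            break;
        }
    }

    close(fd);
    free(buf);
    return 0;
}

If a write like this fails with EINVAL, the buffer alignment or transfer
size probably doesn't match what the device expects; that is the usual
O_DIRECT gotcha.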

David McWhinnie wrote:

>We haven't touched anything software-wise, we don't
>see any hardware errors, and the errors only occur
>during the dd command.
>
>The messages only occur on the partition I am running
>dd on, and only on the node (2 node system) that is
>NOT running dd.
>
>
>
>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
>
>  
>
>>err.... any gremlins in your machine. :-)
>>
>>Check the hardware. What I mean is that if you have
>>not touched your software (as in kernel/modules) and
>>you are getting the msgs even with no load on the
>>system, I would concentrate on the disks/controller.
>>
>>Are the messages highlighting one volume only or all?
>>As in, is only one volume getting in and out of the
>>cluster?
>>Also, are the msgs only on one box or on all?
>>
>>David McWhinnie wrote:
>>
>>>Just started happening, there is no other activity on
>>>the system.  Everything is shut down.
>>>
>>>I was originally looking into a "no space left on
>>>device issue" (lots of space left tho) while running
>>>an RMAN restore....  Thought that was a fragmentation
>>>issue, but reformatted the partitions and the no space
>>>error came back.  So at that point I was trying a dd to
>>>create a file and noticed the dd hanging along with
>>>the messages about nodes being removed.
>>>
>>>But when the no space left error occurs, the node
>>>removal messages don't appear.
>>>
>>>David
>>>
>>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
>>>
>>>>No, dd is mostly unrelated.
>>>>What those messages indicate is that the heartbeat
>>>>thread is not keeping up, which is not good. Yes,
>>>>all the processes requiring a cluster lock will hang
>>>>as a node is being evicted from the cluster.
>>>>
>>>>That you are getting these errors with as low as 16
>>>>mounts is puzzling.
>>>>
>>>>When do these error messages pop up? Any
>>>>relationship with the load on the system?
>>>>
>>>>David McWhinnie wrote:
>>>>
>>>>>We have 16 OCFS mounts.
>>>>>We are running RedHat 2.1.
>>>>>OCFS version 1. Latest patch.
>>>>>
>>>>>Turns out a dd was being done without the o_direct
>>>>>option.  So that could be the cause.  Interestingly,
>>>>>ls, df, etc. would all hang while the node was removed
>>>>>from the cluster.
>>>>>
>>>>>David
>>>>>
>>>>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
>>>>>
>>>>>>how many ocfs mounts do you have on that box?
>>>>>>which kernel?
>>>>>>ocfs version?
>>>>>>
>>>>>>David McWhinnie wrote:
>>>>>>
>>>>>>>We are getting the following messages every few
>>>>>>>minutes:
>>>>>>>
>>>>>>>Jan  6 10:19:25 houided006 kernel: ocfs: Adding
>>>>>>>houided005 (node 0) to clustered device (8,208)
>>>>>>>Jan  6 10:19:52 houided006 kernel: ocfs: Removing
>>>>>>>houided005 (node 0) from clustered device (8,208)
>>>>>>>
>>>>>>>Any advice on how to troubleshoot this?
>>>>>>>
>>>>>>>David.

