[Ocfs-users] OCFS adding/removing nodes

Sunil Mushran Sunil.Mushran at oracle.com
Thu Jan 6 13:20:41 CST 2005


Err... any gremlins in your machine? :-)

Check the hardware. What I mean is that if you have
not touched your software (as in kernel/modules) and you
are getting the msgs even with no load on the system,
I would concentrate on the disks/controller.

Are the messages highlighting one volume only or all?
As in, is only one volume getting in and out of the cluster?
Also, are the msgs only on one box or on all?
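
One way to answer these questions from the syslog is to tally the
messages per reporting box and per device. What follows is only a
minimal sketch in Python, assuming the messages have exactly the format
of the two samples quoted further down in this thread. The names
parse_events, summarize, sd_name and SAMPLE are made up for
illustration and are not part of OCFS. The device-name lookup assumes
the standard Linux numbering for SCSI disks (major 8, 16 minors per
disk), so it only covers sda through sdp.

```python
import re
from collections import Counter

# Pattern for the two kernel messages quoted in this thread. The format is
# assumed from those two samples only; other OCFS messages are ignored.
EVENT_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<reporter>\S+)\s+kernel:\s+ocfs:\s+"
    r"(?P<action>Adding|Removing)\s+(?P<peer>\S+)\s+\(node\s+\d+\)\s+"
    r"(?:to|from)\s+clustered\s+device\s+\((?P<major>\d+),(?P<minor>\d+)\)"
)

# The two sample lines from the original report, unwrapped to one line each.
SAMPLE = [
    "Jan  6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) "
    "to clustered device (8,208)",
    "Jan  6 10:19:52 houided006 kernel: ocfs: Removing houided005 (node 0) "
    "from clustered device (8,208)",
]


def parse_events(lines):
    """Yield (reporter, action, peer, (major, minor)) for each matching line."""
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            device = (int(m["major"]), int(m["minor"]))
            yield m["reporter"], m["action"], m["peer"], device


def summarize(lines):
    """Count events per (reporting box, peer node, device, action)."""
    counts = Counter()
    for reporter, action, peer, device in parse_events(lines):
        counts[(reporter, peer, device, action)] += 1
    return counts


def sd_name(major, minor):
    """Translate (8, minor) to an sd name; return None for other majors.

    Assumes the standard layout for major 8: 16 minors per disk,
    minor % 16 == 0 being the whole disk.
    """
    if major != 8 or not 0 <= minor <= 255:
        return None
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


if __name__ == "__main__":
    for (reporter, peer, device, action), n in sorted(summarize(SAMPLE).items()):
        print(reporter, peer, device, sd_name(*device), action, n)
```

Feeding it /var/log/messages from each box in turn (instead of SAMPLE)
would show whether one device or one box accounts for all the events.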

David McWhinnie wrote:

>Just started happening, there is no other activity on
>the system.  Everything is shut down.
>
>I was originally looking into a "no space left on
>device" issue (lots of space left, though) while running
>an RMAN restore. I thought that was a fragmentation
>issue, but I reformatted the partitions and the "no space"
>error came back. So at that point I was trying a dd to
>create a file, and noticed the dd hanging along with
>the messages about nodes being removed.
>
>But when the "no space left" error occurs, the node
>removal messages don't appear.
>
>David
>
>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
>
>>No, dd is mostly unrelated.
>>What those messages indicate is that the heartbeat thread is not
>>keeping up, which is not good. Yes, all the processes requiring a
>>cluster lock will hang as a node is being evicted from the cluster.
>>
>>That you are getting these errors with as few as 16 mounts is puzzling.
>>
>>When do these error messages pop up? Any relationship with the load on
>>the system?
>>
>>David McWhinnie wrote:
>>
>>>We have 16 OCFS mounts.
>>>We are running RedHat 2.1
>>>OCFS version 1. Latest patch.
>>>
>>>Turns out a dd was being done without the o_direct
>>>option.  So that could be the cause.  Interestingly,
>>>ls, df, etc. would all hang while the node was removed
>>>from the cluster.
>>>
>>>David
>>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
>>>
>>>>how many ocfs mounts do you have on that box?
>>>>which kernel?
>>>>ocfs version?
>>>>
>>>>David McWhinnie wrote:
>>>>
>>>>>We are getting the following messages.
>>>>>Jan  6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) to clustered device (8,208)
>>>>>Jan  6 10:19:52 houided006 kernel: ocfs: Removing houided005 (node 0) from clustered device (8,208)
>>>>>
>>>>>every few minutes.
>>>>>
>>>>>Any advice on how to troubleshoot this?
>>>>>
>>>>>David.