[Ocfs2-users] ESX 3.5 DRS and OCFS2 1.4.1-1

Sunil Mushran sunil.mushran at oracle.com
Fri Dec 5 13:43:48 PST 2008


That should not be a problem. It simply means that the mount
thread found a dirty slot and cleaned it (replayed the journal, etc.).

However, you should not see this during VM migration, as the
guests should keep the volume mounted throughout the process.
Or at least that's how it works with OVM/Xen migration.
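One point of confusion in this thread is that the ocfs2-tools userspace package version (1.4.1-1, from dpkg) is not the same thing as the in-kernel OCFS2 module version (1.3.3, as the dmesg lines show). A quick way to check both, as a minimal sketch: it assumes a Debian-style system for the dpkg part, and the ocfs2 module may simply be absent on a non-cluster machine, hence the fallback messages.

```shell
# Userspace tools version (what 'dpkg -l | grep ocfs' reported, e.g. 1.4.1-1)
dpkg -l 2>/dev/null | grep ocfs2-tools \
  || echo "ocfs2-tools package not installed"

# Kernel module version (what the dmesg lines report, e.g. "OCFS2 1.3.3");
# modinfo errors out if the module is not installed, so fall back to a note
modinfo ocfs2 2>/dev/null | grep -i '^version' \
  || echo "ocfs2 kernel module not installed"
```

The kernel-side version is the one that matters for the slot-recovery behavior discussed here.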

David Murphy wrote:
>> (4262,0):ocfs2_find_slot:268 slot 2 is already allocated to this node!
>>     
>
> Rather than ID, maybe I should have said Node Slot.
>
>
> Also:
>
> [root at web2 ~]# dpkg -l | grep ocfs 
> ii  ocfs2-tools                            1.4.1-1                     tools
> for managing OCFS2 cluster filesystems
>
>
> David
>
>
> -----Original Message-----
> From: ocfs2-users-bounces at oss.oracle.com
> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Sunil Mushran
> Sent: Friday, December 05, 2008 12:56 PM
> To: David Murphy
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] ESX 3.5 DRS and OCFS2 1.4.1-1
>
> What ID are you referring to? IP address? Hope not.
>
> BTW, this is not 1.4.1.
>
> David Murphy wrote:
>   
>> We are getting:
>>
>> Dec 4 17:19:41 web2 kernel: [9724159.177875] EXT2-fs warning: mounting 
>> unchecked fs, running e2fsck is recommended
>>
>> Dec 4 17:19:41 web2 kernel: [9724159.463691] VMware hgfs: HGFS is 
>> disabled in the host
>>
>> Dec 4 17:19:41 web2 kernel: [9724160.965637] OCFS2 Node Manager 1.3.3
>>
>> Dec 4 17:19:41 web2 kernel: [9724161.033122] OCFS2 DLM 1.3.3
>>
>> Dec 4 17:19:41 web2 kernel: [9724161.037686] OCFS2 DLMFS 1.3.3
>>
>> Dec 4 17:19:41 web2 kernel: [9724161.038842] OCFS2 User DLM kernel 
>> interface loaded
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.616652] o2net: accepted 
>> connection from node rgapp1 (num 4) at 192.168.102.11:7777
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.722162] OCFS2 1.3.3
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.782112] ocfs2_dlm: Nodes in 
>> domain ("7D876A4B2EE14D0C8E1181E8DCF4237B"): 2
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.782345] ocfs2_dlm: Node 4 joins 
>> domain 7D876A4B2EE14D0C8E1181E8DCF4237B
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.782348] ocfs2_dlm: Nodes in 
>> domain ("7D876A4B2EE14D0C8E1181E8DCF4237B"): 2 4
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.782758] 
>> (4262,0):ocfs2_find_slot:268 slot 2 is already allocated to this node!
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.841264] 
>> (4262,0):ocfs2_check_volume:1662 File system was not unmounted 
>> cleanly, recovering volume.
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.841830] kjournald starting. 
>> Commit interval 5 seconds
>>
>> Dec 4 17:19:41 web2 kernel: [9724171.880229] ocfs2: Mounting device 
>> (8,17) on (node 2, slot 2) with ordered data mode.
>>
>> Dec 4 17:19:43 web2 kernel: [9724175.991919] o2net: accepted 
>> connection from node app1 (num 6) at 192.168.102.10:7777
>>
>> Dec 4 17:19:45 web2 kernel: [9724178.086781] VMware memory control 
>> driver initialized
>>
>> Dec 4 17:19:46 web2 kernel: [9724178.235647] o2net: accepted 
>> connection from node deploy (num 5) at 192.168.102.12:7777
>>
>> Dec 4 17:19:50 web2 kernel: [9724182.319762] ocfs2_dlm: Node 6 joins 
>> domain 7D876A4B2EE14D0C8E1181E8DCF4237B
>>
>> Dec 4 17:19:50 web2 kernel: [9724182.319773] ocfs2_dlm: Nodes in 
>> domain ("7D876A4B2EE14D0C8E1181E8DCF4237B"): 2 4 6
>>
>> Dec 4 17:19:50 web2 kernel: [9724182.598848] ocfs2_dlm: Node 5 joins 
>> domain 7D876A4B2EE14D0C8E1181E8DCF4237B
>>
>> Dec 4 17:19:50 web2 kernel: [9724182.598853] ocfs2_dlm: Nodes in 
>> domain ("7D876A4B2EE14D0C8E1181E8DCF4237B"): 2 4 5 6
>>
>> Dec 4 17:21:32 web2 syslogd 1.5.0#1ubuntu1: restart.
>>
>> This completely froze the entire cluster when ESX tried to vMotion 3
>> of the 6 nodes to a new host.
>>
>> Is it recommended by Oracle not to enable DRS on virtual machines using
>> the cluster, or is there a configuration we can use to keep crashes
>> like this from happening all the time?
>>
>> I have seen several posts suggesting that disabling DRS would be a
>> way to work around this issue, but that is not really good practice, as
>> you would lose a lot of your HA abilities.
>>
>> Also, is there a way to have OCFS2 drop a node from the cluster if a
>> new node comes online with its ID?
>>
>> David Murphy
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>     
>
>
