[Ocfs2-users] Cluster lockup when one node fails

Sitansu Mohanty 2sitansu at gmail.com
Tue Jul 1 04:26:26 PDT 2014


The OVM version is 3.2 with UEK 5.7, and the hardware platform is an
HP ProLiant DL980 G7 server.
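
To answer the question about timeout values in seconds, here is a minimal sketch that converts the settings shown in the quoted `o2cb status` output below. It assumes the commonly documented O2CB relation, disk heartbeat timeout = (threshold - 1) * 2 seconds, with a 2-second heartbeat interval, and that the network values are in milliseconds. The function and key names are hypothetical, chosen only for this illustration.

```python
import re

# Excerpt copied from the quoted `o2cb status` output.
STATUS_EXCERPT = """\
  Heartbeat dead threshold: 31
  Network idle timeout: 60000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
"""

# Assumption: O2CB writes a disk heartbeat every 2 seconds.
HEARTBEAT_INTERVAL_S = 2


def parse_settings(text):
    """Collect 'Name: integer' lines into a dict keyed by setting name."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+):\s*(\d+)\s*$", line)
        if match:
            settings[match.group(1).strip()] = int(match.group(2))
    return settings


def timeouts_in_seconds(settings):
    """Convert the raw settings to seconds under the assumptions above."""
    return {
        # Assumed relation: timeout = (threshold - 1) * heartbeat interval.
        "disk_heartbeat_timeout_s":
            (settings["Heartbeat dead threshold"] - 1) * HEARTBEAT_INTERVAL_S,
        # Network values are assumed to be in milliseconds.
        "network_idle_timeout_s": settings["Network idle timeout"] / 1000,
        "network_keepalive_delay_s": settings["Network keepalive delay"] / 1000,
        "network_reconnect_delay_s": settings["Network reconnect delay"] / 1000,
    }


if __name__ == "__main__":
    for name, seconds in timeouts_in_seconds(parse_settings(STATUS_EXCERPT)).items():
        print(f"{name}: {seconds}")
```

Under these assumptions, both the disk heartbeat timeout and the network idle timeout come to about 60 seconds, which is consistent with the "idle for 60.123 secs" o2net message in the quoted log.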




On Tue, Jul 1, 2014 at 4:52 PM, Sitansu Mohanty <2sitansu at gmail.com> wrote:

> Hi Srini,
>
> Thanks for your reply. Yes, the node panics; sometimes it responds to
> ping and sometimes it does not.
>
> The log messages are included below for your reference.
> Jun 30 15:29:00 XXXXXXXXXXXX kernel:
> (kworker/u:1,23886,4):dlm_do_assert_master:1665 ERROR: Error -112 when
> sending message 502 (key 0xe1f6c89a) to node 0
> Jun 30 15:29:00 XXXXXXXXXXXX kernel:
> (python,30105,5):dlm_do_master_request:1332 ERROR: link to 0 went down!
> Jun 30 15:29:00 XXXXXXXXXXXX kernel:
> (python,30105,5):dlm_get_lock_resource:917 ERROR: status = -112
> Jun 30 15:29:00 XXXXXXXXXXXX kernel: o2net: Connected to node XXXXXXXXXXXX
> (num 0) at xxx.xx.xxx.xx:7777
> Jun 30 16:17:33 XXXXXXXXXXXX kernel: o2net: Connection to node
> XXXXXXXXXXXX (num 3) at xxx.xx.xxx.xx:7777 shutdown, state 8
> Jun 30 16:17:33 XXXXXXXXXXXX kernel: o2net: No longer connected to
> XXXXXXXXXXXX (num 3) at xxx.xx.xxx.xx:7777
> Jun 30 16:17:33 XXXXXXXXXXXX kernel: o2net: Accepted connection from
> XXXXXXXXXXXX (num 3) at xxx.xx.xxx.xx:7777
> Jun 30 16:35:22 XXXXXXXXXXXX kernel: o2net: Connection to node
> XXXXXXXXXXXX (num 0) at xxx.xx.xxx.xx:7777 has been idle for 60.123 secs,
> shutting it down.
> Jun 30 16:35:22 XXXXXXXXXXXX kernel: o2net: No longer connected to node
> XXXXXXXXXXXX (num 0) at xxx.xx.xxx.xx:7777
> Jun 30 16:35:22 XXXXXXXXXXXX kernel: o2net: Connected to node XXXXXXXXXXXX
> (num 0) at xxx.xx.xxx.xx:7777
>
>
> The o2cb status details are included below for your reference.
> /etc/init.d/o2cb status
> Driver for "configfs": Loaded
> Filesystem "configfs": Mounted
> Stack glue driver: Loaded
> Stack plugin "o2cb": Loaded
> Driver for "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster "c2b7203fc6cacd2a": Online
>   Heartbeat dead threshold: 31
>   Network idle timeout: 60000
>   Network keepalive delay: 2000
>   Network reconnect delay: 2000
>   Heartbeat mode: Global
> Checking O2CB heartbeat: Active
>   0004FB0000050000DBA4FB2E32B64D87 /dev/dm-0
> Nodes in O2CB cluster: 0 1 2 3 4
> Active userdlm domains: ovm
>
> Regards,
> Sitansu
>
> On Tue, Jul 1, 2014 at 12:10 AM, Srinivas Eeda <srinivas.eeda at oracle.com>
> wrote:
>
>>  Can you please describe what you mean by "fails"? Did the node panic?
>> What are the timeout values? What kernel version are you using?
>>
>>
>> On 06/30/2014 10:54 AM, Sitansu Mohanty wrote:
>>
>>  Hi,
>> What is the answer to the post below? I am facing the same issue.
>> https://oss.oracle.com/pipermail/ocfs2-users/2009-May/003559.html
>>
>> Please help!
>>
>> --
>> Regards
>> Sitansu Prasad Mohanty
>> Mob:-09382226266
>> E-mail:2sitansu at gmail.com
>> Chennai-94
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>
>
>
>



-- 
Regards
Sitansu Prasad Mohanty
Mob:-09382226266
E-mail:2sitansu at gmail.com
Chennai-94