[Ocfs2-users] OCFS2 DLM problems

Sunil Mushran Sunil.Mushran at oracle.com
Wed Jan 23 13:07:24 PST 2008


It depends on the network traffic, I guess. The error returned (EAGAIN)
asks the caller to retry, and the older code did not. AFAIR, we have never
encountered this in our main test cluster.
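
To illustrate the idea behind the fix (retry when a send reports EAGAIN
instead of treating it as a dead connection), here is a minimal user-space
sketch in Python. It is not the o2net kernel code: send_with_retry, its
parameters, and the retry limit are hypothetical names and values chosen
for the example. On Linux, the -11 in the log below corresponds to EAGAIN
and the -107 to ENOTCONN.

```python
import errno
import time


def send_with_retry(send, payload, max_attempts=5, delay=0.01):
    """Call send(payload), retrying while it fails with EAGAIN.

    `send` is any callable that raises OSError on failure (hypothetical
    stand-in for a socket send). Errors other than EAGAIN are treated as
    real failures and propagate to the caller immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except OSError as exc:
            # EAGAIN means "try again later"; give up only after the
            # last allowed attempt.
            if exc.errno != errno.EAGAIN or attempt == max_attempts:
                raise
            time.sleep(delay)
```

The point of the sketch is the distinction between a transient error
(EAGAIN, worth retrying) and a hard error (for example ENOTCONN, where
the connection really is gone).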

Ulf Zimmermann wrote:
> Currently running 1.2.5-1 so we should upgrade. Is there any explanation
> how this bug gets triggered? We are trying to understand why we are
> suddenly hitting this bug, as this has been running for several months
> without being triggered.
>
> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com] 
> Sent: Wednesday, January 23, 2008 9:58 AM
> To: Ulf Zimmermann
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] OCFS2 DLM problems
>
> 1.2.5-what?
>
> If you are not on 1.2.5-6, upgrade to that. It could be you are hitting
> the following issue addressed in that release.
>
> r3033 tcp - Retry sendpage() if it returns EAGAIN (bugzilla#896)
>
> No, don't upgrade to 1.2.7. We just discovered an issue in it and will
> be releasing 1.2.8 shortly.
>
> Ulf Zimmermann wrote:
>   
>> Hello everyone, once again.
>>
>> We are running into a problem, which has now shown up 2 times, possibly 3
>> (once the systems looked different).
>>
>> The environment is 6 HP DL360/380 g5 servers with eth0 being the public
>> interface, eth1 and bond0 (eth2 and eth3) used for clusterware, and bond0
>> also used for OCFS2. The bond0 interface is in active/passive mode.
>> No network error counters are showing, and even during the problem
>> we can communicate via the bond0 interface. This setup has been running
>> for more than 2 months, but last Wednesday morning and again today we
>> had 2 nodes causing locking problems. The problem starts with messages
>> like this:
>>
>> Jan 23 03:20:44 dbprd01 kernel: o2net: no longer connected to node dbprd02 (num 1) at 192.168.202.2:7777
>> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459 ERROR: status = -107
>> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR: status = -107
>> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459 ERROR: status = -107
>> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR: status = -107
>>
>> Jan 23 03:20:44 dbprd02 kernel: (5096,0):o2net_sendpage:868 ERROR: sendpage of size 24 to node dbprd01 (num 0) at 192.168.202.1:7777 failed with -11
>> Jan 23 03:20:44 dbprd02 kernel: o2net: no longer connected to node dbprd01 (num 0) at 192.168.202.1:7777
>>
>> After these there are plenty more messages, such as
>> "dlm_wait_for_node_death" and "dlm_send_remote_convert_request" on dbprd02,
>> and "dlm_send_proxy_ast_msg" and "dlm_flush_asts" on dbprd01.
>>
>> We are currently running OCFS2 1.2.5, the kernel is EL4 Update 5 x86_64
>> (2.6.9-55.ELsmp).
>>
>> I see there is one bug fixed in 1.2.6/1.2.7 related to DLM and I was
>> wondering if the above problem could be related to it or if this is
>> something different.
>>
>>
>> Ulf Zimmermann | Senior System Architect
>>
>> ATC-Onlane, Inc.
>> 4600 Bohannon Drive, Suite 100
>> Menlo Park, CA 94025
>>
>> O: 650-532-6382  M: (510) 396-1764  F: (510) 580-0929
>>
>> Email: ulf at atc-onlane.com | Web: www.atc-onlane.com
>>
>> DISCLAIMER:
>> This e-mail and any attachments are confidential and also may be
>> privileged. If you are not the named recipient, or have otherwise
>> received this communication in error, please delete it from your inbox,
>> notify the sender immediately, and do not disclose its contents to any
>> other person, use them for any purpose, or store or copy them in any
>> medium. Thank you for your cooperation.
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users



