[Ocfs2-users] OCFS2 DLM problems

Sunil Mushran Sunil.Mushran at oracle.com
Wed Jan 23 09:58:18 PST 2008


1.2.5-what?

If you are not on 1.2.5-6, upgrade to that. It could be you are hitting the
following issue addressed in that release.

r3033 tcp - Retry sendpage() if it returns EAGAIN (bugzilla#896)
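
To illustrate the idea behind that fix (this is not the r3033 patch itself, which is kernel C code), here is a minimal userspace sketch in Python of retrying a send that fails with EAGAIN instead of treating it as fatal. The helper name send_with_retry, its parameters, and the attempt limit are assumptions made for the example.

```python
import errno
import time


def send_with_retry(send, payload, max_attempts=5, delay_s=0.01):
    """Call send(payload), retrying only when it fails with EAGAIN.

    `send` is any callable that raises OSError on failure (hypothetical
    stand-in for a non-blocking socket send). Other errors propagate
    immediately; EAGAIN propagates once the attempts are used up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except OSError as exc:
            if exc.errno != errno.EAGAIN or attempt == max_attempts:
                raise
            # The resource was only temporarily unavailable: wait, then retry.
            time.sleep(delay_s)
```

Without a retry of this kind, a transient EAGAIN looks like a hard send failure, which would be consistent with the "failed with -11" line followed by the disconnect in the quoted logs.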

No, don't upgrade to 1.2.7. We just discovered an issue in it and will
be releasing 1.2.8 shortly.
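
For reading the quoted logs below: the status values are negated Linux errno numbers, so -11 is EAGAIN and -107 is ENOTCONN. A small sketch for decoding them follows. The table is hard-coded with the Linux values so it gives the same answer on any platform, and the helper name decode_status is made up for the example.

```python
# Linux errno numbers for the two codes seen in the quoted logs.
# Assumption: only these two are listed; extend the table as needed.
LINUX_ERRNO = {
    11: "EAGAIN",     # resource temporarily unavailable, try again
    107: "ENOTCONN",  # transport endpoint is not connected
}


def decode_status(status):
    """Map a negated errno such as -107 to its symbolic name."""
    return LINUX_ERRNO.get(-status, "unknown errno %d" % -status)


for status in (-11, -107):
    print(status, decode_status(status))
```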

Ulf Zimmermann wrote:
> Hello everyone, once again.
>
> We are running into a problem that has now shown up 2 times, possibly 3
> (once the systems looked different).
>
> The environment is 6 HP DL360/380 g5 servers with eth0 being the public
> interface, eth1 and bond0 (eth2 and eth3) used for clusterware and bond0
> also used for OCFS2. The bond0 interface is in active/passive mode.
> No network error counters are showing, and even during the problem
> we can communicate via the bond0 interface. This setup has been running
> for more than 2 months, but last Wednesday morning and again today we
> had 2 nodes causing locking problems. The problem starts with messages
> like this:
>
> Jan 23 03:20:44 dbprd01 kernel: o2net: no longer connected to node
> dbprd02 (num 1) at 192.168.202.2:7777
> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459
> ERROR: status = -107
> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR:
> status = -107
> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459
> ERROR: status = -107
> Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR:
> status = -107
>
> Jan 23 03:20:44 dbprd02 kernel: (5096,0):o2net_sendpage:868 ERROR:
> sendpage of size 24 to node dbprd01 (num 0) at 192.168.202.1:7777 failed
> with -11
> Jan 23 03:20:44 dbprd02 kernel: o2net: no longer connected to node
> dbprd01 (num 0) at 192.168.202.1:7777
>
> After these there are plenty more messages, such as
> "dlm_wait_for_node_death", "dlm_send_remote_convert_request" on dbprd02
> and "dlm_send_proxy_ast_msg", "dlm_flush_asts" on dbprd01.
>
> We are currently running OCFS2 1.2.5, the kernel is EL4 Update 5 x86_64
> (2.6.9-55.ELsmp).
>
> I see there is one bug fixed in 1.2.6/1.2.7 related to DLM and I was
> wondering if the above problem could be related to it or if this is
> something different.
>
>
> Ulf Zimmermann | Senior System Architect
>
> ATC-Onlane, Inc.
> 4600 Bohannon Drive, Suite 100
> Menlo Park, CA 94025
>
> O: 650-532-6382  M: (510) 396-1764  F: (510) 580-0929
>
> Email: ulf at atc-onlane.com | Web: www.atc-onlane.com
>
> DISCLAIMER:
> This e-mail and any attachments are confidential and also may be
> privileged. If you are not the named recipient, or have otherwise
> received this communication in error, please delete it from your inbox,
> notify the sender immediately, and do not disclose its contents to any
> other person, use them for any purpose, or store or copy them in any
> medium. Thank you for your cooperation.
>



