[Ocfs2-users] OCFS2 1.4 Problem on SuSE

Sunil Mushran sunil.mushran at oracle.com
Mon Sep 28 11:35:28 PDT 2009


Contact Novell for issues on SLES10. The error suggests that you are
encountering Novell bz#524683, which has been addressed in OCFS2 1.4.4.
Ask Novell for a PTF kernel that includes the fix.
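
To gauge how often this signature shows up before and after a kernel
update, something like the following could be run over the syslog. It is
only a minimal sketch: it assumes the messages are worded exactly as in
the two lines quoted in this thread, and the names `summarize`,
`LOCKRES_RE` and `STATUS_RE` are illustrative, not part of any OCFS2
tooling.

```python
import re
from collections import Counter

# Patterns modeled on the two messages quoted in this thread (assumption:
# other kernel or ocfs2 versions may word these messages differently).
LOCKRES_RE = re.compile(r"dlmunlock_common:\d+ ERROR: lockres (\w+):")
STATUS_RE = re.compile(r'dlm status = (\w+)|Dlm error "(\w+)"')


def summarize(lines):
    """Count dlmunlock errors per lock resource and per DLM status."""
    by_lockres = Counter()
    by_status = Counter()
    for line in lines:
        for m in LOCKRES_RE.finditer(line):
            by_lockres[m.group(1)] += 1
        for m in STATUS_RE.finditer(line):
            # Exactly one of the two alternatives captured the status name.
            by_status[m.group(1) or m.group(2)] += 1
    return by_lockres, by_status


# The two log entries from this thread, unwrapped to one line each.
SAMPLE = [
    "Sep 25 12:11:30 host02 kernel: (4438,4):dlmunlock_common:128 ERROR: "
    "lockres F00000000000000003b1341b545b16f: Someone is calling dlmunlock "
    "while waiting for an ast!<3>(4438,4):dlmunlock:685 ERROR: dlm status = "
    "DLM_BADPARAM",
    "Sep 25 12:11:30 host02 kernel: (4438,4):ocfs2_cancel_convert:3092 "
    'ERROR: Dlm error "DLM_BADPARAM" while calling dlmunlock on resource '
    "F00000000000000003b1341b545b16f: invalid lock mode specified",
]

if __name__ == "__main__":
    by_lockres, by_status = summarize(SAMPLE)
    for name, count in by_lockres.most_common():
        print(name, count)
    for status, count in by_status.most_common():
        print(status, count)
```

To use it on a real log, pass an open file object (for example the
system's messages file) to `summarize` instead of `SAMPLE`. A count that
keeps growing on one lock resource would be consistent with the slowdown
described below, though it does not by itself confirm the bug.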

Angelo McComis wrote:
>  Hello --
>
> We're running a handful of OCFS2 clusters on Novell SuSE SLES 10 SP2.
> The nodes are HP Blade hardware, attached to IBM SVC storage via
> QLA 2xxx HBAs.
>
> We have an application from IBM that makes use of files in this space
> in a grid style environment, and we are in the process of debugging
> some I/O issues and crashes, but while we do, I'm wondering if there
> is any good reference on what constitutes a solid starting point for
> tuning how many concurrent accesses to a directory are allowed, or if
> there are specific tunables that are outside the default we need.
>
>
> There are some strange errors that I can't decipher:
>
> Sep 25 12:11:30 host02 kernel: (4438,4):dlmunlock_common:128 ERROR:
> lockres F00000000000000003b1341b545b16f: Someone is calling dlmunlock
> while waiting for an ast!<3>(4438,4):dlmunlock:685 ERROR: dlm status =
> DLM_BADPARAM
>
> Sep 25 12:11:30 host02 kernel: (4438,4):ocfs2_cancel_convert:3092
> ERROR: Dlm error "DLM_BADPARAM" while calling dlmunlock on resource
> F00000000000000003b1341b545b16f: invalid lock mode specified
>
> The symptom of the problem is that file access to the mountpoint of
> ocfs2 space gets gradually slower and slower until the system just
> crashes / becomes unresponsive when trying to access files there, cd
> into the directory, etc.
>
> What we've done so far:
>
> - Checked our multipath configuration - seems to be showing all paths
> to our disks, none offline, none failed, etc.
> - Checked our lvm configuration - seems to be good as well.
> - Checked our HBA configuration and made some changes to the retry and
> failover settings, but these changes have not improved the behavior.
>
> Can anyone point me in the right direction, or help me figure out what
> questions to even start asking here?
>
> The problem seems related to multiple / concurrent access to
> directories within an OCFS2 filesystem, and how DLM is behaving.
>
>
> Our OS ver/kernel is 2.6.16.60-0.42.5-smp  (Novell SLES10-sp2 + patches)
>
> Thanks in advance...
>
> Angelo
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   



