[Ocfs2-users] OCFS2 1.4 Problem on SuSE

Angelo McComis angelo at mccomis.com
Mon Sep 28 11:15:43 PDT 2009


 Hello --

We're running a handful of OCFS2 clusters on Novell SuSE SLES 10 SP2.
The clusters sit in front of IBM SVC storage, on HP blade hardware,
connected through QLogic QLA2xxx HBAs.

We have an application from IBM that uses files in this space in a
grid-style environment. We are in the process of debugging some I/O
issues and crashes, and while we do, I'm wondering whether there is a
good reference on what constitutes a solid starting point for tuning
how many concurrent accesses to a directory are allowed, or whether
there are specific tunables beyond the defaults that we need.
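
As far as I know, OCFS2 1.4 has no tunable that directly limits concurrent
accesses to a directory (that serialization happens in the DLM); the knobs
people usually adjust are the o2cb cluster timeouts in /etc/sysconfig/o2cb.
A sketch of that file, with what I believe are the common defaults shown as
illustration only, not as recommendations:

```
# /etc/sysconfig/o2cb -- o2cb cluster stack settings (SLES / OCFS2 1.4).
# Values shown are believed defaults; treat them as a starting point only.
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
# Heartbeat iterations (2s each) before a node is declared dead.
O2CB_HEARTBEAT_THRESHOLD=31
# Network idle timeout (ms) before a node connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
# Keepalive packet interval (ms) on an idle connection.
O2CB_KEEPALIVE_DELAY_MS=2000
# Delay (ms) before reconnecting after a network failure.
O2CB_RECONNECT_DELAY_MS=2000
```

After editing, the o2cb stack has to be restarted on each node
(/etc/init.d/o2cb restart) for new values to take effect.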


There are some strange errors that I can't decipher:

Sep 25 12:11:30 host02 kernel: (4438,4):dlmunlock_common:128 ERROR:
lockres F00000000000000003b1341b545b16f: Someone is calling dlmunlock
while waiting for an ast!
Sep 25 12:11:30 host02 kernel: (4438,4):dlmunlock:685 ERROR: dlm status =
DLM_BADPARAM

Sep 25 12:11:30 host02 kernel: (4438,4):ocfs2_cancel_convert:3092
ERROR: Dlm error "DLM_BADPARAM" while calling dlmunlock on resource
F00000000000000003b1341b545b16f: invalid lock mode specified
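
One thing I've been trying: the lockres name in those messages can be pulled
out of the log and fed to debugfs.ocfs2 to see which node holds the lock. A
minimal sketch, assuming the ocfs2-tools package is installed; only the sed
extraction actually runs here, the debugfs.ocfs2 step needs a live node:

```shell
# Extract the lockres name from a dlmunlock error line in syslog.
line='Sep 25 12:11:30 host02 kernel: (4438,4):dlmunlock_common:128 ERROR: lockres F00000000000000003b1341b545b16f: Someone is calling dlmunlock while waiting for an ast!'

lockres=$(printf '%s\n' "$line" | sed -n 's/.*lockres \([A-Fa-f0-9]*\):.*/\1/p')
echo "$lockres"

# On a live node, one could then dump the filesystem lock state with
#   debugfs.ocfs2 -R "fs_locks" /dev/mapper/<ocfs2-device>
# and search that output for the lockres name to see its holders/converters.
```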

The symptom is that file access under the OCFS2 mountpoint gets
gradually slower and slower until the system crashes or becomes
unresponsive when trying to access files there, cd into the
directory, etc.

What we've done so far:

- Checked our multipath configuration - seems to be showing all paths
to our disks, none offline, none failed, etc.
- Checked our lvm configuration - seems to be good as well.
- Checked our HBA configuration -- made some changes to the retry and
failover settings, but the behavior is no better.
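
For what it's worth, the multipath check can be scripted so it runs
periodically instead of being eyeballed. A minimal sketch; the `multipath -ll`
output below is an assumed sample for illustration (IBM 2145 is the SVC
machine type), not a capture from our cluster:

```shell
# Count paths that multipath reports as failed; on a live node replace
# the sample with:  sample=$(multipath -ll)
sample='mpath0 (360050768018e8xxxxxxxxxxxxxxxxxxx) dm-0 IBM,2145
\_ round-robin 0 [prio=50][active]
 \_ 1:0:0:0 sda 8:0   [active][ready]
 \_ 2:0:0:0 sdc 8:32  [failed][faulty]'

failed=$(printf '%s\n' "$sample" | grep -c 'failed')
echo "failed paths: $failed"
```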

Can anyone point me in the right direction, or help me figure out what
questions to even start asking here?

The problem seems related to multiple / concurrent access to
directories within an OCFS2 filesystem, and how DLM is behaving.


Our OS/kernel version is 2.6.16.60-0.42.5-smp (Novell SLES 10 SP2 + patches).

Thanks in advance...

Angelo


