[Ocfs2-users] (no subject)

Andrew Phillips Andrew.Phillips at betfair.com
Tue Sep 2 01:50:17 PDT 2008


Sunil,

 Thanks for the response. While debugging this, I spent a lot of time
looking at this page:
  http://oss.oracle.com/osswiki/OCFS2/Debugging

 That is where Google sent me for "ocfs2 lock debug". A short note on
that page saying that the information is old, or applies only to 1.2,
would be helpful, along with a pointer to the 1.4 user guide.

  Having read the 1.4 guide, there are a few more things to try.

  The guidance seems to be to kill the process that's holding the
locks. If the process holding the lock is a zombie, that becomes
a bit hard to do. Is there any way of reaching into ocfs2 and
telling it to break the lock manually, with us accepting the consequences?

   Or alternatively, if you've rebooted the system that holds the lock,
would the other nodes reclaim the locks it held and carry on as normal?

   Andy



-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Tue 02/09/2008 05:21
To: Andrew Phillips
Cc: ocfs2-users at oss.oracle.com; atp at tradefair.com
Subject: Re: [Ocfs2-users] (no subject)
 
So in 1.4, we have a much improved debugging infrastructure for
such issues. Check out the write-up on dlm debugging in the 1.4
user's guide, in the chapter titled Notes.

In short, you have correctly identified the lock resource. But we
need to go a step further and get the info from the dlm to see
which node is holding onto the lock and why.

Read the write-up and if you have any questions, ping me.

Sunil

Andrew Phillips wrote:
> Hello,
>
>  We just experienced a hang that looks superficially very similar to 
> http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg02359.html
>
>  There are 3 nodes in the cluster, running ocfs2-1.4.1 on RHEL 5.2.
> Versions and uname output are in the attached text file, which also
> includes fs_locks dumps and various other diagnostics.
>
> The lock-up happened when we were restarting a Java application that
> was writing to the /journal directory, which was being read by another
> Java app on a second node. Restarting the machine that the JVM was
> running on did not help, which points to a locking issue.
>
> ls of the directory hangs the process on the machine that was writing.
> An ls on the machine that was reading initially worked. An rm command
> on the reader then caused that to lock up as well. 
>
> Here's an extract showing what they're waiting on.
>
>  2222 D    bash            ocfs2_wait_for_mask
>  2282 Zl   java <defunct>  exit
>  2567 Zl   java <defunct>  exit
>  2736 D    ls              ocfs2_wait_for_mask
>  2770 D    ls              ocfs2_wait_for_mask
>
> Andy
>
>  
>
>
> ________________________________________________________________________
> In order to protect our email recipients, Betfair Group use SkyScan from 
> MessageLabs to scan all Incoming and Outgoing mail for viruses.
>
> ________________________________________________________________________
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
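
For anyone triaging a similar hang, the process listing quoted above can be filtered mechanically. The following is only a rough sketch: it assumes the listing came from something like `ps -e -o pid,stat,comm,wchan` (the original post does not say which command was used), and the helper names `parse_ps`, `blocked_on` and `zombies` are made up for illustration.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    stat: str
    comm: str
    wchan: str


# The listing from the quoted message, pasted verbatim (quote markers included).
SAMPLE = """\
>  2222 D    bash            ocfs2_wait_for_mask
>  2282 Zl   java <defunct>  exit
>  2567 Zl   java <defunct>  exit
>  2736 D    ls              ocfs2_wait_for_mask
>  2770 D    ls              ocfs2_wait_for_mask
"""


def parse_ps(text: str) -> List[Proc]:
    """Parse 'pid stat comm wchan' lines; comm may contain spaces."""
    procs = []
    for line in text.splitlines():
        # Drop any leading email quote markers and spaces before splitting.
        fields = line.lstrip("> ").split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip headers, blank lines, anything unexpected
        pid, stat, wchan = int(fields[0]), fields[1], fields[-1]
        comm = " ".join(fields[2:-1])
        procs.append(Proc(pid, stat, comm, wchan))
    return procs


def blocked_on(procs: List[Proc], wchan: str = "ocfs2_wait_for_mask") -> List[Proc]:
    """Processes in uninterruptible sleep (state D) on the given wait channel."""
    return [p for p in procs if p.stat.startswith("D") and p.wchan == wchan]


def zombies(procs: List[Proc]) -> List[Proc]:
    """Processes in zombie state (state Z)."""
    return [p for p in procs if p.stat.startswith("Z")]


if __name__ == "__main__":
    procs = parse_ps(SAMPLE)
    for p in blocked_on(procs):
        print(f"blocked: {p.pid} {p.comm}")
    for p in zombies(procs):
        print(f"zombie:  {p.pid} {p.comm}")
```

This only separates the tasks stuck in ocfs2_wait_for_mask from the defunct java processes; it says nothing about which node holds the lock, which still has to come from the dlm as Sunil describes.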





