[Ocfs2-users] processes in "D" State

Sunil Mushran sunil.mushran at oracle.com
Mon Apr 19 11:28:07 PDT 2010


RO Holders: 1  EX Holders: 0

So node 18 wants to upgrade to EX. For that to happen,
node 17 has to downgrade from PR. But it cannot because
there is 1 RO (readonly) holder. If you are using NFS and
see an nfsd in a D state, then that would be it. I've just
released 1.4.7 in which this issue has been addressed.
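
If you want to check whether it is indeed an nfsd that is holding the RO,
something along these lines will list the D state processes and the kernel
function each one is waiting in (the exact ps column syntax may vary with
your procps version):

ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

A stuck nfsd will show up with ocfs2_wait_for_mask as the wait channel,
like in Mike's listing further down.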

Sunil


Brad Plant wrote:
> Hi Sunil,
>
> I managed to collect the fs_locks and dlm_locks output on both nodes this time. www1 is node 17 while www2 is node 18. I had to reboot www1 to fix the problem, but of course www1 couldn't unmount the file system, so the other nodes saw it as a crash.
>
> Both nodes are running 2.6.18-164.15.1.el5.centos.plusxen with the matching ocfs2 1.4.4-1 rpm downloaded from http://oss.oracle.com/projects/ocfs2/files/RedHat/RHEL5/x86_64/.
>
> Do you make anything of this?
>
> I read that there is going to be a new ocfs2 release soon. I'm sure there are lots of bug fixes, but are there any in there that you think might solve this problem?
>
> Cheers,
>
> Brad
>
>
> www2 ~ # ./scanlocks2 
> /dev/xvdd3 M0000000000000000095a0300000000
>
> www2 ~ # debugfs.ocfs2 -R "fs_locks  M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000  Mode: Protected Read
> Flags: Initialized Attached Busy
> RO Holders: 0  EX Holders: 0
> Pending Action: Convert  Pending Unlock Action: None
> Requested Mode: Exclusive  Blocking Mode: No Lock
> PR > Gets: 6802  Fails: 0    Waits (usec) Total: 0  Max: 0
> EX > Gets: 16340  Fails: 0    Waits (usec) Total: 12000  Max: 8000
> Disk Refreshes: 0
>
> www2 ~ # debugfs.ocfs2 -R "dlm_locks  M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000   Owner: 18   State: 0x0 
> Last Used: 0      ASTs Reserved: 0    Inflight: 0    Migration Pending: No
> Refs: 4    Locks: 2    On Lists: None
> Reference Map: 17 
>  Lock-Queue  Node  Level  Conv  Cookie           Refs  AST  BAST  Pending-Action
>  Granted     17    PR     -1    17:62487955      2     No   No    None
>  Converting  18    PR     EX    18:6599867       2     No   No    None
>
>
> www1 ~ # ./scanlocks2 
>
> www1 ~ # debugfs.ocfs2 -R "fs_locks  M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000  Mode: Protected Read
> Flags: Initialized Attached Blocked Queued
> RO Holders: 1  EX Holders: 0
> Pending Action: None  Pending Unlock Action: None
> Requested Mode: Protected Read  Blocking Mode: Exclusive
> PR > Gets: 110  Fails: 3    Waits (usec) Total: 32000  Max: 12000
> EX > Gets: 0  Fails: 0    Waits (usec) Total: 0  Max: 0
> Disk Refreshes: 0
>
> www1 ~ # debugfs.ocfs2 -R "dlm_locks  M0000000000000000095a0300000000" /dev/xvdd3 |cat
> Lockres: M0000000000000000095a0300000000   Owner: 18   State: 0x0 
> Last Used: 0      ASTs Reserved: 0    Inflight: 0    Migration Pending: No
> Refs: 3    Locks: 1    On Lists: None
> Reference Map: 
>  Lock-Queue  Node  Level  Conv  Cookie           Refs  AST  BAST  Pending-Action
>  Granted     17    PR     -1    17:62487955      2     No   No    None
>
>
>
>
>
>
>
> On Fri, 19 Mar 2010 08:48:39 -0700
> Sunil Mushran <sunil.mushran at oracle.com> wrote:
>
>   
>> In findpath <lockname>, the lockname needs to be in angle brackets.
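>>
>> For example, something along these lines (using the lockname from your
>> mail and the matching device):
>>
>> debugfs.ocfs2 -R "findpath <M00000000000000007e89e400000000>" /dev/xvdc1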
>>
>> Did you manage to trap the oops stack trace of the crash?
>>
>> So the dlm on the master says that node 250 has a PR, but the fs_locks
>> on 250 says that it has requested a PR and not yet gotten a reply back.
>> Next time also dump the dlm_locks on 250. (The message flow is: the fs on
>> 250 talks to the dlm on 250, which talks to the dlm on the master, which
>> may have to talk to other nodes but eventually replies to the dlm on 250,
>> which then pings the fs on that node. The roundtrip happens in a couple
>> hundred usecs on gige.)
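>>
>> For that dlm_locks dump on 250, it would be something along the lines of
>> (same syntax as the earlier dumps, with the lockname from your mail):
>>
>> debugfs.ocfs2 -R "dlm_locks M0000000000000000808bc800000000" /dev/xvdc1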
>>
>> Running a mix of localflock and not is not advisable. Not the end of
>> the world though. It depends on how flocks are being used.
>>
>> Is this a mix of virtual and physical boxes?
>>
>> Brad Plant wrote:
>>     
>>> Hi Sunil,
>>>
>>> I seem to have struck this issue, although I'm not using nfs. I've got other processes stuck in the D state. It's a mail server and the processes are postfix and courier-imap. As per your instructions, I've run scanlocks2 and debugfs.ocfs2:
>>>
>>> mail1 ~ # ./scanlocks2 
>>> /dev/xvdc1 M0000000000000000808bc800000000
>>>
>>> mail1 ~ # debugfs.ocfs2 -R "fs_locks -l M0000000000000000808bc800000000" /dev/xvdc1 |cat
>>> Lockres: M0000000000000000808bc800000000  Mode: Protected Read
>>> Flags: Initialized Attached Busy
>>> RO Holders: 0  EX Holders: 0
>>> Pending Action: Convert  Pending Unlock Action: None
>>> Requested Mode: Exclusive  Blocking Mode: No Lock
>>> Raw LVB:	05 00 00 00 00 00 00 01 00 00 01 99 00 00 01 99 
>>> 		12 1f c9 67 29 71 32 86 12 e8 e2 f6 d1 07 8c 15 
>>> 		12 e8 e2 f6 d1 07 8c 15 00 00 00 00 00 00 10 00 
>>> 		41 c0 00 05 00 00 00 00 4b b6 12 7d 00 00 00 00 
>>> PR > Gets: 471598  Fails: 0    Waits (usec) Total: 64002  Max: 8000
>>> EX > Gets: 8041  Fails: 0    Waits (usec) Total: 28001  Max: 4000
>>> Disk Refreshes: 0
>>>
>>> mail1 ~ # debugfs.ocfs2 -R "dlm_locks -l M0000000000000000808bc800000000" /dev/xvdc1 |cat
>>> Lockres: M0000000000000000808bc800000000   Owner: 1    State: 0x0 
>>> Last Used: 0      ASTs Reserved: 0    Inflight: 0    Migration Pending: No
>>> Refs: 4    Locks: 2    On Lists: None
>>> Reference Map: 250 
>>> Raw LVB:	05 00 00 00 00 00 00 01 00 00 01 99 00 00 01 99 
>>> 		12 1f c9 67 29 71 32 86 12 e8 e2 f6 d1 07 8c 15 
>>> 		12 e8 e2 f6 d1 07 8c 15 00 00 00 00 00 00 10 00 
>>> 		41 c0 00 05 00 00 00 00 4b b6 12 7d 00 00 00 00 
>>>  Lock-Queue  Node  Level  Conv  Cookie           Refs  AST  BAST  Pending-Action
>>>  Granted     250   PR     -1    250:10866405     2     No   No    None
>>>  Converting  1     PR     EX    1:95             2     No   No    None
>>>
>>> mail1 *is* node number 1, so this is the master node.
>>>
>>> I managed to run scanlocks2 on node 250 (backup1) and also managed to get the following:
>>>
>>> backup1 ~ # debugfs.ocfs2 -R "fs_locks -l M00000000000000007e89e400000000" /dev/xvdc1 |cat
>>> Lockres: M00000000000000007e89e400000000  Mode: Invalid
>>> Flags: Initialized Busy
>>> RO Holders: 0  EX Holders: 0
>>> Pending Action: Attach  Pending Unlock Action: None
>>> Requested Mode: Protected Read  Blocking Mode: Invalid
>>> Raw LVB:	00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>>> 		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>>> 		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>>> 		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>>> PR > Gets: 0  Fails: 0    Waits (usec) Total: 0  Max: 0
>>> EX > Gets: 0  Fails: 0    Waits (usec) Total: 0  Max: 0
>>> Disk Refreshes: 0
>>>
>>> A further run of scanlocks2 however resulted in backup1 (node 250) crashing.
>>>
>>> The FS is mounted by 3 nodes: mail1, mail2 and backup1. mail1 and mail2 are running the latest centos 5 xen kernel with NO localflocks. backup1 is running a 2.6.28.10 vanilla mainline kernel (pv-ops) WITH localflocks.
>>>
>>> I had to switch backup1 to a mainline kernel with localflocks because performing backups on backup1 using rsync seemed to take a long time (3-4 times longer) when using the centos 5 xen kernel with no localflocks. I was running all nodes on recent-ish mainline kernels, but have only recently converted most of them to centos 5 because of repeated ocfs2 stability issues with mainline kernels.
>>>
>>> When backup1 crashed, the lock held by mail1 seemed to be released and everything went back to normal.
>>>
>>> I tried to do a debugfs.ocfs2 -R "findpath M00000000000000007e89e400000000" /dev/xvdc1 |cat but it said "usage: locate <inode#>" despite the man page stating otherwise. -R "locate ..." said the same.
>>>
>>> I hope you're able to get some useful info from the above. If not, can you please provide the next steps that you would want me to run *in case* it happens again.
>>>
>>> Cheers,
>>>
>>> Brad
>>>
>>>
>>> On Thu, 18 Mar 2010 11:25:28 -0700
>>> Sunil Mushran <sunil.mushran at oracle.com> wrote:
>>>
>>>   
>>>       
>>>> I am assuming you are mounting the nfs exports with the nordirplus
>>>> mount option. If not, that is known to deadlock an nfsd thread, leading
>>>> to what you are seeing.
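>>>>
>>>> On the nfs clients, that would be a mount along the lines of the one
>>>> below (server name, export and mount point are just placeholders):
>>>>
>>>> mount -t nfs -o nordirplus dbnode:/export/ocfs2 /mnt/ocfs2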
>>>>
>>>> There are two possible reasons for this error. One is a dlm issue. The
>>>> other is a local deadlock like the one above.
>>>>
>>>> To see if the dlm is the cause for the hang, run scanlocks2.
>>>> http://oss.oracle.com/~smushran/.dlm/scripts/scanlocks2
>>>>
>>>> This will dump the busy lock resources. Run it a few times. If
>>>> a lock resource comes up regularly, then it indicates a dlm problem.
>>>>
>>>> Then dump the fs and dlm lock state on that node.
>>>> debugfs.ocfs2 -R "fs_locks LOCKNAME" /dev/sdX
>>>> debugfs.ocfs2 -R "dlm_locks LOCKNAME" /dev/sdX
>>>>
>>>> The dlm lock will tell you the master node. Repeat the two dumps
>>>> on the master node. The dlm lock on the master node will point
>>>> to the current holder. Repeat the same on that node. Email all that
>>>> to me asap.
>>>>
>>>> michael.a.jaquays at verizon.com wrote:
>>>>     
>>>>         
>>>>> All,
>>>>>
>>>>> I've seen a few posts about this issue in the past, but not a resolution.  I have a 3-node cluster sharing ocfs2 volumes to app nodes via nfs.  On occasion, one of our db nodes will have nfs go into an uninterruptible sleep state.  The nfs daemon is completely useless at this point.  The db node has to be rebooted to recover.  It seems that nfs is waiting on ocfs2_wait_for_mask.  Any suggestions on a resolution would be appreciated.
>>>>>
>>>>> root     18387  0.0  0.0      0     0 ?        S<   Mar15   0:00 [nfsd4]
>>>>> root     18389  0.0  0.0      0     0 ?        D    Mar15   0:10 [nfsd]
>>>>> root     18390  0.0  0.0      0     0 ?        D    Mar15   0:10 [nfsd]
>>>>> root     18391  0.0  0.0      0     0 ?        D    Mar15   0:10 [nfsd]
>>>>> root     18392  0.0  0.0      0     0 ?        D    Mar15   0:13 [nfsd]
>>>>> root     18393  0.0  0.0      0     0 ?        D    Mar15   0:08 [nfsd]
>>>>> root     18394  0.0  0.0      0     0 ?        D    Mar15   0:09 [nfsd]
>>>>> root     18395  0.0  0.0      0     0 ?        D    Mar15   0:12 [nfsd]
>>>>> root     18396  0.0  0.0      0     0 ?        D    Mar15   0:13 [nfsd] 
>>>>>
>>>>> 18387 nfsd4           worker_thread
>>>>> 18389 nfsd            ocfs2_wait_for_mask
>>>>> 18390 nfsd            ocfs2_wait_for_mask
>>>>> 18391 nfsd            ocfs2_wait_for_mask
>>>>> 18392 nfsd            ocfs2_wait_for_mask
>>>>> 18393 nfsd            ocfs2_wait_for_mask
>>>>> 18394 nfsd            ocfs2_wait_for_mask
>>>>> 18395 nfsd            ocfs2_wait_for_mask
>>>>> 18396 nfsd            ocfs2_wait_for_mask
>>>>>  
>>>>>
>>>>> -Mike Jaquays
>
>   



