[Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Srinivas Eeda srinivas.eeda at oracle.com
Wed Dec 21 11:43:50 PST 2011


Marek Królikowski wrote:
> Hello
> After 24 hours of testing without quota and without all features, the kernel 
> doesn't give me an oops, but when I use debugfs I still see:
> TEST-MAIL1 ~ # echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
> debugfs.ocfs2 1.6.4
>    154     764    7163
> TEST-MAIL1 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
> debugfs.ocfs2 1.6.4
>    531    2649   24882
Those numbers look good. With the fixes backed out and the additional 
fix I gave, you are no longer seeing many orphans hanging around, and 
hence no stuck-process kernel stacks. You can run the test 
longer or, if you are satisfied, please enable quotas and re-run the test 
with the modified kernel. You might see a deadlock which needs to be 
fixed (I have not been able to reproduce it yet). If the system hangs, please 
capture the following and send me the output:

1. echo t > /proc/sysrq-trigger
2. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
3. wait for 10 minutes
4. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
5. echo t > /proc/sysrq-trigger
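
If it helps, the five steps above could be wrapped in a small script along
these lines. This is only a sketch: the DRY_RUN and WAIT_SECS variables and
the function names run and capture_hang_info are made up for illustration
and are not part of ocfs2-tools. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the five capture steps; nothing runs for real unless DRY_RUN=0.
# Assumptions (not from ocfs2-tools): DRY_RUN, WAIT_SECS, run, capture_hang_info.

DRY_RUN="${DRY_RUN:-1}"         # 1 = print the commands only
WAIT_SECS="${WAIT_SECS:-600}"   # step 3: wait for 10 minutes
TRACE_BITS="ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP"

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

capture_hang_info() {
    run "echo t > /proc/sysrq-trigger"          # step 1: dump task stacks
    run "debugfs.ocfs2 -l $TRACE_BITS allow"    # step 2: enable tracing
    run "sleep $WAIT_SECS"                      # step 3: let the traces accumulate
    run "debugfs.ocfs2 -l $TRACE_BITS off"      # step 4: disable tracing
    run "echo t > /proc/sysrq-trigger"          # step 5: dump task stacks again
}
```

Source the file and call capture_hang_info to see what would be run; as root
on the hung node, set DRY_RUN=0 before calling it to execute the steps. The
task dumps and traces go to the kernel log, so the messages file still needs
to be collected afterwards.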

Thanks,
--Srini
>
> Thanks
> -----Original Message----- From: Srinivas Eeda
> Sent: Tuesday, December 20, 2011 8:50 PM
> To: Marek Królikowski
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read 
> from both
>
> The link prompts for username/passwd.
>
> On top of the changes you made, please add the following patch
>
> diff -uNrp linux-2.6.32.x86_64.orig/fs/ocfs2/dlmglue.c linux-2.6.32.x86_64/fs/ocfs2/dlmglue.c
> --- linux-2.6.32.x86_64.orig/fs/ocfs2/dlmglue.c 2011-11-28 21:51:21.000000000 -0800
> +++ linux-2.6.32.x86_64/fs/ocfs2/dlmglue.c 2011-11-28 22:04:55.000000000 -0800
> @@ -3808,6 +3808,8 @@ static int ocfs2_dentry_convert_worker(s
>  * for a downconvert.
>  */
>  d_delete(dentry);
> + if (dentry)
> + d_drop(dentry);
>  dput(dentry);
>
>  spin_lock(&dentry_attach_lock);
>
>
> The patches that I mentioned earlier were made to address a deadlock when
> quotas are enabled, but I am not sure what the deadlock was. If you
> are willing to help, I would suggest the following plan.
>
> 1. Disable quotas, revert the patches that I pointed out earlier, add
> the above patch, and run your test case. You shouldn't see any more
> orphans. To verify, please run the echo commands I mentioned.
>
> 2. If you are not seeing any more orphans, problem 1 is solved. Now
> enable quotas and run the tests. If you see any deadlock, run the
> following on all nodes and provide us the messages files.
>  a) echo t > /proc/sysrq-trigger (on every node)
>
> Thanks,
> --Srini
>
> Marek Królikowski wrote:
>> Hello
>> Thank you for the answer.
>> The main problem is that I need quota, because this will be the /home directory 
>> for my maildir users.
>> A few days ago, as I said, I contacted Sunil Mushran and he told 
>> me to remove these patches. I did this but it didn't help; take a look:
>> https://wizja2.tktelekom.pl/ocfs2/
>> Thanks
>>
>> -----Original Message----- From: Srinivas Eeda
>> Sent: Tuesday, December 20, 2011 7:58 PM
>> To: Marek Królikowski
>> Cc: ocfs2-users at oss.oracle.com
>> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read 
>> from both
>>
>> Marek Królikowski wrote:
>>> Sorry, I didn't copy everything:
>>> TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>> ^^^^^ those numbers (5239722, 6074335) are the problem. They tell us
>> that the orphan directory is flooded with files. This is
>> because of the change in unlink behavior introduced by patch
>> "ea455f8ab68338ba69f5d3362b342c115bea8e13".
>>
>> If you are interested in the details: in the normal unlink case, an entry for
>> the file being deleted is created in the orphan directory as an intermediate
>> step, and the entry is cleared towards the end of the unlink process. But
>> because of that patch, the entry doesn't get cleared and sticks around.
>>
>> OCFS2 has a function called orphan scan, which runs as part of a
>> thread that takes an EX lock on the orphan scan lock and then scans to
>> clear all entries, but it can't because the open lock is still around.
>> Since this scan takes longer because of the huge number of entries
>> being created, *new deletes will get delayed* as they need the EX 
>> lock.
>>
>> So what can be done? For now, if you are not using the quotas feature, you
>> should get a new kernel by backing out the following patches
>>
>> 5fd131893793567c361ae64cbeb28a2a753bbe35
>> f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a
>> ea455f8ab68338ba69f5d3362b342c115bea8e13
>>
>> or periodically umount the file system on all nodes and remount whenever
>> the problem becomes severe.
>>
>> Thanks,
>> --Srini
>>
>>> TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 6074335 30371669 285493670
>>>  TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>>> TEST-MAIL2 ~ # echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 6074335 30371669 285493670
>>>  Thanks for Your help.
>>>  *From:* Marek Królikowski <mailto:admin at wset.edu.pl>
>>> *Sent:* Tuesday, December 20, 2011 6:39 PM
>>> *To:* ocfs2-users at oss.oracle.com <mailto:ocfs2-users at oss.oracle.com>
>>> *Subject:* Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read 
>>> from both
>>>
>>> > I think you are running into a known issue. Are there a lot of orphan
>>> > files in the orphan directory? I am not sure if the problem is still
>>> > there; if not, please run the same test and once you see the same symptoms,
>>> > please run the following and provide me the output
>>> >
>>> > echo "ls //orphan_dir:0000"|debugfs.ocfs2 <device>|wc
>>> > echo "ls //orphan_dir:0001"|debugfs.ocfs2 <device>|wc
>>> Hello
>>> Thank you for the answer - strange, I didn't get the email with your answer.
>>> This is what you wanted:
>>> TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>>>  TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>>>  This is my testing cluster, so if you need more tests please tell 
>>> me and I will run them for you.
>>> Thanks
>>
>



