[Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Srinivas Eeda srinivas.eeda at oracle.com
Tue Dec 20 11:50:47 PST 2011


The link prompts for username/passwd.

On top of the changes you made, please add the following patch:

diff -uNrp linux-2.6.32.x86_64.orig/fs/ocfs2/dlmglue.c linux-2.6.32.x86_64/fs/ocfs2/dlmglue.c
--- linux-2.6.32.x86_64.orig/fs/ocfs2/dlmglue.c	2011-11-28 21:51:21.000000000 -0800
+++ linux-2.6.32.x86_64/fs/ocfs2/dlmglue.c	2011-11-28 22:04:55.000000000 -0800
@@ -3808,6 +3808,8 @@ static int ocfs2_dentry_convert_worker(s
 		 * for a downconvert.
 		 */
 		d_delete(dentry);
+		if (dentry)
+			d_drop(dentry);
 		dput(dentry);
 
 		spin_lock(&dentry_attach_lock);


The patches I mentioned earlier were made to address a deadlock when
quotas are enabled, but I am not sure what that deadlock was. If you
are willing to help, I would suggest the following plan:

1. Disable quotas, revert the patches I pointed out earlier, add the
above patch, and run your test case. You shouldn't see any more
orphans. To verify, please run the echo command I mentioned.

2. If you no longer see orphans, problem 1 is solved. Now enable
quotas and run the tests. If you see a deadlock, run the following on
all nodes and send us the messages files:
  a) echo t > /proc/sysrq-trigger

Thanks,
--Srini

Marek Królikowski wrote:
> Hello
> Thank you for your answer.
> The main problem is that I need quotas, because this will be the /home
> directory for my maildir users.
> A few days ago, as I said, I contacted Sunil Mushran and he told me to
> remove these patches. I did that, but it didn't help. Take a look:
> https://wizja2.tktelekom.pl/ocfs2/
> Thanks
>
> -----Original Message----- From: Srinivas Eeda
> Sent: Tuesday, December 20, 2011 7:58 PM
> To: Marek Królikowski
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read 
> from both
>
> Marek Królikowski wrote:
>> Sorry, I didn't copy everything:
>> TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>> debugfs.ocfs2 1.6.4
>> 5239722 26198604 246266859
> ^^^^^ Those numbers (5239722, 6074335) are the problem. They show
> that the orphan directory is flooded with files. This is caused by
> the change in unlink behavior introduced by patch
> "ea455f8ab68338ba69f5d3362b342c115bea8e13".
>
> If you are interested in the details: in the normal unlink case, an
> entry for the file being deleted is created in the orphan directory as
> an intermediate step, and the entry is cleared towards the end of the
> unlink process. But because of that patch, the entry doesn't get
> cleared and sticks around.
>
> OCFS2 has a function called orphan scan, executed as part of a thread
> that takes an EX lock on the orphan scan lock and then scans to clear
> all entries, but it can't clear them because the open lock is still
> held. Since the scan can take longer because of the huge number of
> entries being created, *new deletes will get delayed*, as they need
> the EX lock.
>
> So what can be done? For now, if you are not using the quotas feature,
> you should build a new kernel with the following patches backed out:
>
> 5fd131893793567c361ae64cbeb28a2a753bbe35
> f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a
> ea455f8ab68338ba69f5d3362b342c115bea8e13
>
> or periodically unmount the file system on all nodes and remount it
> whenever the problem becomes severe.
>
> Thanks,
> --Srini
>
>> TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
>> debugfs.ocfs2 1.6.4
>> 6074335 30371669 285493670
>>  TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>> debugfs.ocfs2 1.6.4
>> 5239722 26198604 246266859
>> TEST-MAIL2 ~ # echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
>> debugfs.ocfs2 1.6.4
>> 6074335 30371669 285493670
>> Thanks for your help.
>>  *From:* Marek Królikowski <mailto:admin at wset.edu.pl>
>> *Sent:* Tuesday, December 20, 2011 6:39 PM
>> *To:* ocfs2-users at oss.oracle.com <mailto:ocfs2-users at oss.oracle.com>
>> *Subject:* Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read 
>> from both
>>
>> > I think you are running into a known issue. Are there a lot of
>> > orphan files in the orphan directory? I am not sure if the problem
>> > is still there; if not, please run the same test, and once you see
>> > the same symptoms, please run the following and provide me the
>> > output:
>> >
>> > echo "ls //orphan_dir:0000"|debugfs.ocfs2 <device>|wc
>> > echo "ls //orphan_dir:0001"|debugfs.ocfs2 <device>|wc
>> Hello
>> Thank you for your answer. Strangely, I didn't get an email with your
>> answer. Here is what you asked for:
>> TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>> debugfs.ocfs2 1.6.4
>> 5239722 26198604 246266859
>>  TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>> debugfs.ocfs2 1.6.4
>> 5239722 26198604 246266859
>> This is my test cluster, so if you need me to run more tests, please
>> tell me and I will do them for you.
>> Thanks
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users 
>
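The unlink and orphan-scan behavior described in the quoted message can be illustrated with a toy model in C. This is not OCFS2 code: the names `orphan_dir`, `toy_unlink`, `orphan_scan`, the fixed capacity, and the `patched` flag are all hypothetical, and a single boolean stands in for the cluster-wide open lock. It only shows why entries accumulate when they are not cleared at the end of unlink, and why a scan holding the EX lock must then walk every one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in OCFS2. */
#define MAX_ORPHANS 1024

struct orphan_entry {
    int inode;            /* hypothetical inode number */
    bool open_lock_held;  /* stands in for the cluster-wide open lock */
};

struct orphan_dir {
    struct orphan_entry entries[MAX_ORPHANS];
    size_t count;
};

/* Intermediate unlink step: record the file in the orphan directory. */
static bool orphan_add(struct orphan_dir *od, int inode, bool open_lock_held)
{
    assert(od->count <= MAX_ORPHANS);
    if (od->count == MAX_ORPHANS)
        return false;
    od->entries[od->count].inode = inode;
    od->entries[od->count].open_lock_held = open_lock_held;
    od->count++;
    return true;
}

/* Final unlink step: clear the entry for inode (order is not preserved). */
static void orphan_remove(struct orphan_dir *od, int inode)
{
    for (size_t i = 0; i < od->count; i++) {
        if (od->entries[i].inode == inode) {
            od->entries[i] = od->entries[--od->count];
            return;
        }
    }
}

/*
 * Unlink. patched == false models the normal path (entry added, then
 * cleared). patched == true models the described effect of the patch:
 * the entry is left behind while the open lock is still held.
 */
static bool toy_unlink(struct orphan_dir *od, int inode, bool patched)
{
    if (!orphan_add(od, inode, patched))
        return false;
    if (!patched)
        orphan_remove(od, inode);
    return true;
}

/*
 * Orphan scan. The caller is assumed to hold the EX orphan-scan lock for
 * the whole walk, so the work grows with od->count, and deletes waiting
 * for that lock are delayed meanwhile. Entries whose open lock is still
 * held cannot be cleared and are kept. Returns the number of entries visited.
 */
static size_t orphan_scan(struct orphan_dir *od)
{
    size_t visited = 0, kept = 0;

    for (size_t i = 0; i < od->count; i++) {
        visited++;
        if (od->entries[i].open_lock_held)
            od->entries[kept++] = od->entries[i];
    }
    od->count = kept;
    return visited;
}
```

With `patched` false, every delete leaves the model's orphan directory empty; with it true, each delete leaves an entry that no scan can clear, so every subsequent scan revisits all of them. On the real cluster those directories held the millions of entries counted above.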
