[Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Marek Królikowski admin at wset.edu.pl
Wed Dec 21 12:30:47 PST 2011


Hello,
Thank you for the answer. Here is what I am doing now:
MAIL-TEST1 ~ # mkfs.ocfs2 -N2 -L MAIL-TEST --fs-feature-level=max-features /dev/dm-0
mkfs.ocfs2 1.6.4
Cluster stack: classic o2cb
Overwriting existing ocfs2 partition.
Proceed (y/N): Y
Label: MAIL-TEST
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super metaecc xattr indexed-dirs usrquota grpquota refcount discontig-bg
Block size: 4096 (12 bits)
Cluster size: 4096 (12 bits)
Volume size: 1729073381376 (422137056 clusters) (422137056 blocks)
Cluster groups: 13088 (tail covers 2784 clusters, rest cover 32256 clusters)
Extent allocator size: 868220928 (207 groups)
Journal size: 268435456
Node slots: 2
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 6 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful

MAIL-TEST1 ~ # mount /dev/dm-0 -o usrquota /mnt/EMC
MAIL-TEST1 ~ # cat /proc/mounts
[cut]
/dev/dm-0 /mnt/EMC ocfs2 rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,usrquota,coherency=full,user_xattr,acl 0 0

MAIL-TEST2 ~ # mount /dev/dm-0 -o usrquota /mnt/EMC
MAIL-TEST2 ~ # cat /proc/mounts
[cut]
/dev/dm-0 /mnt/EMC ocfs2 rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,usrquota,coherency=full,user_xattr,acl 0 0

On both nodes I then run my script to test OCFS2 (a rough sketch of the kind of load it generates is below).
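(For illustration only - a minimal sketch of a write/read/delete loop of this kind; the actual script differs, and the directory name and file counts below are just placeholders:)

#!/bin/bash
# Sketch of the kind of concurrent load run on both nodes against the shared mount.
DIR=/mnt/EMC/$(hostname)          # per-node directory; placeholder layout
mkdir -p "$DIR"
while true; do
    for i in $(seq 1 10000); do
        dd if=/dev/zero of="$DIR/file.$i" bs=4k count=256 2>/dev/null   # write
    done
    for i in $(seq 1 10000); do
        cat "$DIR/file.$i" > /dev/null    # read back
        rm -f "$DIR/file.$i"              # delete - this exercises the unlink/orphan path
    done
done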


-----Original Message----- 
From: Srinivas Eeda
Sent: Wednesday, December 21, 2011 8:43 PM
To: Marek Królikowski
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Marek Królikowski wrote:
> Hello
> After 24 hours of testing without quota and without all the features, the kernel
> doesn't give me an oops, but when I use debugfs I still see:
> TEST-MAIL1 ~ # echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
> debugfs.ocfs2 1.6.4
>    154     764    7163
> TEST-MAIL1 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
> debugfs.ocfs2 1.6.4
>    531    2649   24882
Those numbers look good. Basically, with the fixes backed out and another
fix I gave, you are not seeing that many orphans hanging around and
hence not seeing the stuck-process kernel stacks. You can run the test
longer, or if you are satisfied, please enable quotas and re-run the test
with the modified kernel. You might see a deadlock which needs to be
fixed (I was not able to reproduce it yet). If the system hangs, please
capture the following and provide me the output (a scripted sketch of these
steps follows the list):

1. echo t > /proc/sysrq-trigger
2. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
3. wait for 10 minutes
4. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
5. echo t > /proc/sysrq-trigger
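(For convenience, the five steps above can be scripted roughly as below - a sketch only; the sleep matches the 10-minute wait, and collecting dmesg at the end is an extra step so the output can be attached:)

#!/bin/bash
# Sketch of the capture steps above, to be run on the hung node.
echo t > /proc/sysrq-trigger                                                    # step 1: dump task states to the kernel log
debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow   # step 2: enable ocfs2 tracing
sleep 600                                                                       # step 3: wait 10 minutes
debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off     # step 4: disable tracing
echo t > /proc/sysrq-trigger                                                    # step 5: dump task states again
dmesg > /tmp/ocfs2-hang-$(hostname).log                                         # save the kernel log for the report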

Thanks,
--Srini
>
> Thanks
> -----Original Message----- From: Srinivas Eeda
> Sent: Tuesday, December 20, 2011 8:50 PM
> To: Marek Królikowski
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from 
> both
>
> The link prompts for username/passwd.
>
> On top of the changes you made, please add the following patch
>
> diff -uNrp linux-2.6.32.x86_64.orig/fs/ocfs2/dlmglue.c linux-2.6.32.x86_64/fs/ocfs2/dlmglue.c
> --- linux-2.6.32.x86_64.orig/fs/ocfs2/dlmglue.c 2011-11-28 21:51:21.000000000 -0800
> +++ linux-2.6.32.x86_64/fs/ocfs2/dlmglue.c 2011-11-28 22:04:55.000000000 -0800
> @@ -3808,6 +3808,8 @@ static int ocfs2_dentry_convert_worker(s
>  * for a downconvert.
>  */
>  d_delete(dentry);
> + if (dentry)
> + d_drop(dentry);
>  dput(dentry);
>
>  spin_lock(&dentry_attach_lock);
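(For illustration, applying the change could look roughly like this - the source tree path and patch file name are placeholders:)

cd /usr/src/linux-2.6.32.x86_64              # placeholder path to the kernel source tree
patch -p1 < ~/ocfs2-dlmglue-d_drop.patch     # the diff above, saved to a file
make modules && make modules_install         # rebuild and install the updated ocfs2 modules
# then reboot (or reload the ocfs2 modules) on both nodes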
>
>
> The patches that I mentioned earlier were made to address a deadlock when
> quotas are enabled, but I am not sure what the deadlock was. If you
> are willing to help, I would suggest the following plan.
>
> 1. Disable quotas, revert the patches that I pointed to earlier, also
> add the above patch, and run your test case. You shouldn't see any more
> orphans. To verify, please run the echo command I mentioned.
>
> 2. If you are not seeing any more orphans, problem 1 is solved; now
> enable quotas and run the tests. If you see any deadlock, run the
> following on all nodes and provide us the messages files.
>  a) echo t > /proc/sysrq-trigger from all nodes
>
> Thanks,
> --Srini
>
> Marek Królikowski wrote:
>> Hello
>> Thank you for the answer.
>> The main problem is that I need quota, because this will be the /home directory for
>> my maildir users.
>> And a few days ago, as I said, I contacted Sunil Mushran and he told me
>> to remove these patches. I did that, but it didn't help - take a look:
>> https://wizja2.tktelekom.pl/ocfs2/
>> Thanks
>>
>> -----Original Message----- From: Srinivas Eeda
>> Sent: Tuesday, December 20, 2011 7:58 PM
>> To: Marek Królikowski
>> Cc: ocfs2-users at oss.oracle.com
>> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from 
>> both
>>
>> Marek Królikowski wrote:
>>> Sorry, I didn't copy everything:
>>> TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>> ^^^^^ Those numbers (5239722, 6074335) are the problem. What they are
>> telling us is that the orphan directory is filled with a flood of files. This is
>> because of the change in unlink behavior introduced by patch
>> "ea455f8ab68338ba69f5d3362b342c115bea8e13".
>>
>> If you are interested in the details: in the normal unlink case, an entry for
>> the file being deleted is created in the orphan directory as an intermediate step,
>> and the entry is cleared towards the end of the unlink process. But
>> because of that patch, the entry doesn't get cleared and sticks around.
>>
>> OCFS2 has a function called orphan scan which is executed as part of a
>> thread; it takes an EX lock on the orphan scan lock and then scans to
>> clear all the entries, but it can't because the open lock is still around.
>> Since this can take longer because of the huge number of entries
>> getting created, *new deletes will get delayed* as they need the EX lock.
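(As a side note, the growth can be watched with a loop around the same debugfs.ocfs2 command used earlier - a rough sketch; the 60-second interval is arbitrary:)

while true; do
    date
    echo "ls //orphan_dir:0000" | debugfs.ocfs2 /dev/dm-0 | wc -l   # entries in slot 0's orphan directory
    echo "ls //orphan_dir:0001" | debugfs.ocfs2 /dev/dm-0 | wc -l   # entries in slot 1's orphan directory
    sleep 60
done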
>>
>> So what can be done? For now, if you are not using the quota feature, you
>> should build a new kernel by backing out the following patches (a rough
>> revert sketch is included below)
>>
>> 5fd131893793567c361ae64cbeb28a2a753bbe35
>> f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a
>> ea455f8ab68338ba69f5d3362b342c115bea8e13
>>
>> or periodically umount the file system on all nodes and remount whenever
>> the problem becomes severe.
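(For illustration, in a git-managed kernel tree the back-out could look roughly like this - the tree path is a placeholder and the reverts may need manual conflict resolution:)

cd /usr/src/linux-2.6.32.x86_64                                # placeholder path to the kernel tree
git revert --no-edit 5fd131893793567c361ae64cbeb28a2a753bbe35  # back out the three commits listed above
git revert --no-edit f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a
git revert --no-edit ea455f8ab68338ba69f5d3362b342c115bea8e13
make modules && make modules_install                           # rebuild and install the ocfs2 module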
>>
>> Thanks,
>> --Srini
>>
>>> TEST-MAIL1# echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 6074335 30371669 285493670
>>>  TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>>> TEST-MAIL2 ~ # echo "ls //orphan_dir:0001"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 6074335 30371669 285493670
>>> Thanks for your help.
>>> From: Marek Królikowski <admin at wset.edu.pl>
>>> Sent: Tuesday, December 20, 2011 6:39 PM
>>> To: ocfs2-users at oss.oracle.com
>>> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both
>>>
>>> > I think you are running into a known issue. Are there a lot of orphan
>>> > files in the orphan directory? I am not sure if the problem is still there;
>>> > if not, please run the same test, and once you see the same symptoms,
>>> > please run the following and provide me the output
>>> >
>>> > echo "ls //orphan_dir:0000"|debugfs.ocfs2 <device>|wc
>>> > echo "ls //orphan_dir:0001"|debugfs.ocfs2 <device>|wc
>>> Hello
>>> Thank you for the answer - strange, I didn't get an email with your answer.
>>> This is what you wanted:
>>> TEST-MAIL1# echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>>>  TEST-MAIL2 ~ # echo "ls //orphan_dir:0000"|debugfs.ocfs2 /dev/dm-0|wc
>>> debugfs.ocfs2 1.6.4
>>> 5239722 26198604 246266859
>>> This is my testing cluster, so if you need more tests done, please tell me
>>> and I will do them for you.
>>> Thanks
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
> 



