[Ocfs2-users] A Billion Files on OCFS2 -- Best Practices?

Mark Hampton mark at cleverdba.com
Wed Feb 1 10:24:05 PST 2012


Here's what I got from debugfs.ocfs2 -R "stats".  I have to type it out
manually, so I'm only including the "features" lines:

Feature Compat: 3 backup-super strict-journal-super
Feature Incompat: 16208 sparse extended-slotmap inline-data metaecc xattr
indexed-dirs refcount discontig-bg
Feature RO compat: 7 unwritten usrquota grpquota
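
Incidentally, rather than retyping, the feature lines can be filtered
straight out of the stats output with a grep:

debugfs.ocfs2 -R "stats" /dev/mapper/<my device> | grep '^Feature'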


Some other info that may be interesting:

Links: 0   Clusters: 52428544
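
(For scale: 52428544 clusters x 4KB cluster size works out to just under
200GB for this filesystem.)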



On Wed, Feb 1, 2012 at 1:04 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:

> debugfs.ocfs2 -R "stats" /dev/mapper/...
> I want to see the features enabled.
>
> The main issue with large metadata is the fsck timing. The recently tagged
> 1.8 release of the tools has much better fsck performance.
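>
> The slow case is a full forced check, e.g.:
>
> fsck.ocfs2 -f /dev/mapper/...
>
> which has to walk all of that metadata.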
>
>
> On 02/01/2012 05:25 AM, Mark Hampton wrote:
>
>> We have an application that has many processing threads writing more
>> than a billion files ranging from 2KB to 50KB, with 50% under
>> 8KB (currently there are 700 million files).  The files are never
>> deleted or modified – they are written once, and read infrequently.  The
>> files are hashed so that they are evenly distributed across ~1,000,000
>> subdirectories up to 3 levels deep, with up to 1000 files per
>> directory.  The directories are structured like this (a sketch of the
>> hashing scheme follows the listing):
>>
>> 0/00/00
>>
>> 0/00/01
>>
>> ...
>> F/FF/FE
>>
>> F/FF/FF
>>
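>> For illustration, here is a shell sketch of that mapping; md5 is a
>> stand-in for the application's actual hash.  Five hex digits split
>> 1/2/2 give 16*256*256 = 1048576 directories, i.e. the ~1,000,000 above:
>>
>> key="example-object-id"    # hypothetical file key
>> h=$(printf '%s' "$key" | md5sum | awk '{print toupper($1)}')
>> path="${h:0:1}/${h:1:2}/${h:3:2}"    # e.g. A/3F/9C
>>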
>> The files need to be readable and writable across a number of
>> servers. The NetApp filer we purchased for this project has both NFS and
>> iSCSI capabilities.
>>
>> We first tried doing this via NFS.  After writing 700 million files (12
>> TB) into a single NetApp volume, file-write performance became abysmally
>> slow.  We can't create more than 200 files per second on the NetApp
>> volume, which is about 20% of our required performance target of 1000
>> files per second.  It appears that most of the file-write time is going
>> towards stat and inode-create operations.
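>> (One way to see that split, for example, is the per-operation client
>> counters from nfsstat -c.)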
>>
>> So now I’m trying the same thing with OCFS2 over iSCSI.  I created 16
>> luns on the NetApp.  The 16 luns became 16 OCFS2 filesystems with 16
>> different mount points on our servers.
>>
>> With this configuration I was initially able to write ~1800 files per
>> second.  Now that I have completed 100 million files, performance has
>> dropped to ~1500 files per second.
>>
>> I’m using OEL 6.1 (2.6.32-100 kernel) with OCFS2 version 1.6.  The
>> application servers have 128GB of memory.  I created my OCFS2
>> filesystems as follows:
>>
>> mkfs.ocfs2 -T mail -b 4k -C 4k -L <my label> --fs-features=indexed-dirs
>> --fs-feature-level=max-features /dev/mapper/<my device>
>>
>> And I mount them with these options:
>>
>> _netdev,commit=30,noatime,localflocks,localalloc=32
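>>
>> In /etc/fstab terms that is one line per LUN, with hypothetical device
>> and mount-point names:
>>
>> /dev/mapper/lun01 /data/01 ocfs2 _netdev,commit=30,noatime,localflocks,localalloc=32 0 0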
>>
>> So my questions are these:
>>
>>
>> 1) Given a billion files sized 2KB – 50KB, with 50% under 8KB, do I have
>> the optimal OCFS2 filesystem and mount-point configurations?
>>
>>
>> 2) Should I split the files across even more filesystems?  Currently I
>> have them split across 16 OCFS2 filesystems.
>>
>> Thanks a billion!
>>
>

