[Ocfs2-devel] 40TB RAID and OCFS2 woes (inode64, JBD2, huge partition support, Volume might try to write to blocks beyond what jbd can address in 32 bits)

Robert Smith spamfree at wansecurity.com
Thu Dec 31 22:12:18 PST 2009


I started a new kernel compile before I went to bed, and installed it this morning.

To my surprise, when the system booted up, it mounted the partition.


root@s2-replay02:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/s2--replay02-root
                      1.9T   11G  1.8T   1% /
udev                  1.9G  204K  1.9G   1% /dev
none                  1.9G     0  1.9G   0% /dev/shm
none                  1.9G   36K  1.9G   1% /var/run
none                  1.9G     0  1.9G   0% /var/lock
none                  1.9G     0  1.9G   0% /lib/init/rw
/dev/sda5             228M  165M   51M  77% /boot
/dev/mapper/replays-ReplayDataVolume001
                       37T  1.3G   37T   1% /data/storage/ReplayDataVolume001
root@s2-replay02:~# mounted.ocfs2
usage: mounted.ocfs2 [-d] [-f] [device]
        -d quick detect
        -f full detect
root@s2-replay02:~# mounted.ocfs2 -f /dev/replays/ReplayDataVolume001
Device                FS     Nodes
/dev/replays/ReplayDataVolume001  ocfs2  s2-replay02
root@s2-replay02:~#


1.3G is used by filesystem overhead, partly due to the way I formatted it, I'm sure.

i.e. mkfs.ocfs2 -L "ReplayDataVolume001" -C 1M -N 2 -J block64 -F -v -T datafiles -M cluster --fs-feature-level=max-features /dev/replays/ReplayDataVolume001


And for some quick performance checks:

root@s2-replay02:~# time dd bs=100M count=10 if=/dev/zero of=/data/storage/ReplayDataVolume001/big_file
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 1.00014 s, 1.0 GB/s

real    0m1.008s
user    0m0.000s
sys     0m0.920s
root@s2-replay02:~# time dd bs=100M count=100 if=/dev/zero of=/data/storage/ReplayDataVolume001/bigger_file
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 17.1197 s, 612 MB/s

real    0m17.796s
user    0m0.000s
sys     0m10.350s
root@s2-replay02:~# time dd bs=100M count=1000 if=/dev/zero of=/data/storage/ReplayDataVolume001/biggest_file
1000+0 records in
1000+0 records out
104857600000 bytes (105 GB) copied, 176.464 s, 594 MB/s

real    2m57.357s
user    0m0.000s
sys     1m50.520s
root@s2-replay02:~# time dd bs=1000M count=1000 if=/dev/zero of=/data/storage/ReplayDataVolume001/biggest_yet_file
1000+0 records in
1000+0 records out
1048576000000 bytes (1.0 TB) copied, 1825.52 s, 574 MB/s

real    30m26.573s
user    0m0.000s
sys     18m59.110s
root@s2-replay02:~#


Every 1.0s: ls -aFl /data/storage/ReplayDataVolume001/*                           Fri Jan  1 00:03:50 2010

-rw-r--r-- 1 root root    1048576000 2009-12-31 23:23 /data/storage/ReplayDataVolume001/big_file
-rw-r--r-- 1 root root   10485760000 2009-12-31 23:24 /data/storage/ReplayDataVolume001/bigger_file
-rw-r--r-- 1 root root  104857600000 2009-12-31 23:28 /data/storage/ReplayDataVolume001/biggest_file
-rw-r--r-- 1 root root 1048576000000 2010-01-01 00:01 /data/storage/ReplayDataVolume001/biggest_yet_file

/data/storage/ReplayDataVolume001/lost+found:
total 0
drwxr-xr-x 2 root root 3896 2009-12-31 11:53 ./
drwxr-xr-x 3 root root 3896 2009-12-31 23:31 ../


I guess I'll try a 20 terabyte file next, seeking past 16TB?

Any other tests I can run to see how she's gonna hold up? What can I do to try to break it?

-Robert



On Jan 1, 2010, at 5:08 AM, Joel Becker wrote:

> On Fri, Jan 01, 2010 at 04:36:02AM +0900, Robert Smith wrote:
>> Oh, I found it at line #2163 of fs/ocfs2/super.c.
>> 
>> I imagine that something as simple as the following would work, but perhaps I'll wait for your feedback.
>> 
>> 
>> /*
>>        if (ocfs2_clusters_to_blocks(osb->sb, le32_to_cpu(di->i_clusters) - 1)
>>            > (u32)~0UL) {
>>                mlog(ML_ERROR, "Volume might try to write to blocks beyond "
>>                     "what jbd can address in 32 bits.\n");
>>                status = -EINVAL;
>>                goto bail;
>>        }
>> */
> 
> 	That should work.  The real solution will check based on the
> journal flags.  Be warned, there be tygers in here.
> 
> Joel
> 
> -- 
> 
> "But all my words come back to me
> In shades of mediocrity.
> Like emptiness in harmony
> I need someone to comfort me."
> 
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: joel.becker at oracle.com
> Phone: (650) 506-8127



