[Ocfs2-devel] 40TB RAID and OCFS2 woes (inode64, JBD2, huge partition support, Volume might try to write to blocks beyond what jbd can address in 32 bits)
Robert Smith
spamfree at wansecurity.com
Thu Dec 31 22:12:18 PST 2009
I started a new kernel compile before I went to bed, and installed it this morning.
To my surprise, when the system booted up, it mounted the partition.
root at s2-replay02:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/s2--replay02-root
1.9T 11G 1.8T 1% /
udev 1.9G 204K 1.9G 1% /dev
none 1.9G 0 1.9G 0% /dev/shm
none 1.9G 36K 1.9G 1% /var/run
none 1.9G 0 1.9G 0% /var/lock
none 1.9G 0 1.9G 0% /lib/init/rw
/dev/sda5 228M 165M 51M 77% /boot
/dev/mapper/replays-ReplayDataVolume001
37T 1.3G 37T 1% /data/storage/ReplayDataVolume001
root at s2-replay02:~# mounted.ocfs2
usage: mounted.ocfs2 [-d] [-f] [device]
-d quick detect
-f full detect
root at s2-replay02:~# mounted.ocfs2 -f /dev/replays/ReplayDataVolume001
Device FS Nodes
/dev/replays/ReplayDataVolume001 ocfs2 s2-replay02
root at s2-replay02:~#
1.3G of filesystem overhead. Partially due to the way I formatted it, I'm sure.
i.e. mkfs.ocfs2 -L "ReplayDataVolume001" -C 1M -N 2 -J block64 -F -v -T datafiles -M cluster --fs-feature-level=max-features /dev/replays/ReplayDataVolume001
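For scale: the inode cluster count is a 32-bit field on disk (the patch quoted below reads it with le32_to_cpu(di->i_clusters)), so with the 1 MiB clusters from the mkfs line the cluster addressing alone would reach 4 PiB. The mount-time limit was the journal's 32-bit block addressing, not the cluster count. Rough arithmetic (only the cluster size is taken from the mkfs invocation; the rest is back-of-the-envelope):

```python
CLUSTER_SIZE = 1 << 20     # -C 1M from the mkfs.ocfs2 invocation above
MAX_CLUSTERS = 2 ** 32     # i_clusters is a 32-bit on-disk field

# Largest volume the 32-bit cluster count itself could describe.
max_bytes = CLUSTER_SIZE * MAX_CLUSTERS
print(max_bytes // 2 ** 50, "PiB")   # -> 4 PiB
```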
And for some quick performance checks:
root at s2-replay02:~# time dd bs=100M count=10 if=/dev/zero of=/data/storage/ReplayDataVolume001/big_file
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 1.00014 s, 1.0 GB/s
real 0m1.008s
user 0m0.000s
sys 0m0.920s
root at s2-replay02:~# time dd bs=100M count=100 if=/dev/zero of=/data/storage/ReplayDataVolume001/bigger_file
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 17.1197 s, 612 MB/s
real 0m17.796s
user 0m0.000s
sys 0m10.350s
root at s2-replay02:~# time dd bs=100M count=1000 if=/dev/zero of=/data/storage/ReplayDataVolume001/biggest_file
1000+0 records in
1000+0 records out
104857600000 bytes (105 GB) copied, 176.464 s, 594 MB/s
real 2m57.357s
user 0m0.000s
sys 1m50.520s
root at s2-replay02:~# time dd bs=1000M count=1000 if=/dev/zero of=/data/storage/ReplayDataVolume001/biggest_yet_file
1000+0 records in
1000+0 records out
1048576000000 bytes (1.0 TB) copied, 1825.52 s, 574 MB/s
real 30m26.573s
user 0m0.000s
sys 18m59.110s
root at s2-replay02:~#
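Sanity-checking dd's reported rates against the raw byte counts and elapsed times (dd prints decimal units, so MB/s here means 10^6 bytes per second):

```python
# (bytes copied, seconds elapsed) pairs taken from the dd runs above
runs = [
    (1048576000, 1.00014),       # 1.0 GB test
    (10485760000, 17.1197),      # 10 GB test
    (104857600000, 176.464),     # 105 GB test
    (1048576000000, 1825.52),    # 1 TB test
]
for nbytes, secs in runs:
    print(round(nbytes / secs / 1e6), "MB/s")
# The last three come out to 612, 594, and 574 MB/s, matching dd's output.
```

The sustained rate settles around 574-612 MB/s once the transfer outlives the page cache; only the 1 GB run fits entirely in RAM.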
Every 1.0s: ls -aFl /data/storage/ReplayDataVolume001/* Fri Jan 1 00:03:50 2010
-rw-r--r-- 1 root root 1048576000 2009-12-31 23:23 /data/storage/ReplayDataVolume001/big_file
-rw-r--r-- 1 root root 10485760000 2009-12-31 23:24 /data/storage/ReplayDataVolume001/bigger_file
-rw-r--r-- 1 root root 104857600000 2009-12-31 23:28 /data/storage/ReplayDataVolume001/biggest_file
-rw-r--r-- 1 root root 1048576000000 2010-01-01 00:01 /data/storage/ReplayDataVolume001/biggest_yet_file
/data/storage/ReplayDataVolume001/lost+found:
total 0
drwxr-xr-x 2 root root 3896 2009-12-31 11:53 ./
drwxr-xr-x 3 root root 3896 2009-12-31 23:31 ../
I guess I'll try a 20 terabyte file now, and seek past 16TB?
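The 16TB figure falls out of JBD's 32-bit block addressing: with 4 KiB blocks, 2^32 blocks tops out at exactly 16 TiB. A minimal sketch of that arithmetic plus a sparse seek-and-write test of the kind described; the helper and paths are placeholders, and the demo uses a small offset so it runs anywhere (on the OCFS2 volume you would pass an offset past the limit instead):

```python
import os
import tempfile

BLOCK_SIZE = 4096                          # 4 KiB filesystem blocks assumed
JBD_32BIT_LIMIT = (2 ** 32) * BLOCK_SIZE   # highest byte offset 32-bit JBD reaches
print(JBD_32BIT_LIMIT // 2 ** 40, "TiB")   # -> 16 TiB

def write_at(path, offset, payload=b"x"):
    """Seek to `offset` in a (sparse) file and write a single byte there."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, payload)
    finally:
        os.close(fd)

# Portable demo with a 1 MiB offset; substitute JBD_32BIT_LIMIT + 1 (or more)
# when running against the OCFS2 mount to exercise the >16 TiB range.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "sparse_test")
    write_at(p, 1 << 20)
    print(os.path.getsize(p))              # offset + 1 byte written
```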
Any other tests I can run to see how she's gonna hold up? What can I do to try to break it?
-Robert
On Jan 1, 2010, at 5:08 AM, Joel Becker wrote:
> On Fri, Jan 01, 2010 at 04:36:02AM +0900, Robert Smith wrote:
>> Oh, I found it at line #2163 of fs/ocfs2/super.c.
>>
>> I imagine that something as simple as the following would work, but perhaps I'll wait for your feedback.
>>
>>
>> /*
>> if (ocfs2_clusters_to_blocks(osb->sb, le32_to_cpu(di->i_clusters) - 1)
>>     > (u32)~0UL) {
>>         mlog(ML_ERROR, "Volume might try to write to blocks beyond "
>>              "what jbd can address in 32 bits.\n");
>>         status = -EINVAL;
>>         goto bail;
>> }
>> */
>
> That should work. The real solution will check based on the
> journal flags. Be warned, there be tygers in here.
>
> Joel
>
> --
>
> "But all my words come back to me
> In shades of mediocrity.
> Like emptiness in harmony
> I need someone to comfort me."
>
> Joel Becker
> Principal Software Developer
> Oracle
> E-mail: joel.becker at oracle.com
> Phone: (650) 506-8127