[Ocfs2-devel] Read IOPS storm in case of reflinking running VM disk

Eugene Istomin E.Istomin at edss.ee
Wed May 20 15:33:43 PDT 2015


Goldwyn,

thanks for the answer!

I read 
https://oss.oracle.com/osswiki/OCFS2(2f)DesignDocs(2f)RefcountTrees.html  
carefully to understand the problem.

As I understand it:
- There are B-tree structures backing reflink: ocfs2_refcount_tree;
  ocfs2_refcount_block -> ocfs2_refcount_list -> ocfs2_refcount_rec
- "The refcount tree root is a refcount block pointed to by i_refcount_loc"
- Some operations need extra uncached lookups
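To make sure I have the picture right, here is a toy in-memory model of that lookup chain (my sketch based on the design doc's field names, not ocfs2 code): each refcount record says which physical cluster range is shared and how many times, and CoW has to look the covering record up before touching an extent.

```python
from bisect import bisect_right
from collections import namedtuple

# Toy analogue of ocfs2_refcount_rec: "physical clusters
# [r_cpos, r_cpos + r_clusters) are referenced r_refcount times".
# Field names follow the design doc; the real records live on disk
# inside refcount blocks and may need uncached reads to reach.
RefcountRec = namedtuple("RefcountRec", "r_cpos r_clusters r_refcount")

def lookup_refcount(recs, cpos):
    """Find the record covering cluster offset cpos (recs sorted by r_cpos)."""
    starts = [r.r_cpos for r in recs]
    i = bisect_right(starts, cpos) - 1
    if i >= 0 and cpos < recs[i].r_cpos + recs[i].r_clusters:
        return recs[i]
    return None  # cluster not in the tree => not shared

recs = [RefcountRec(0, 100, 2), RefcountRec(100, 50, 3)]
print(lookup_refcount(recs, 120).r_refcount)  # -> 3
```

If those per-record lookups miss the cache during CoW, each one is an extra disk read.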
I also dumped frag/stat/refcount data from a production hypervisor node
using debugfs.ocfs2; the files are attached (alternate URL:
http://public.edss.ee/tmp/debugfs.tar.gz ).

Hypervisor OCFS2 mount options:
rw,nosuid,noexec,noatime,heartbeat=none,nointr,data=ordered,errors=remount-ro,localalloc=2048,coherency=full,user_xattr,acl

Mkfs string:
mkfs.ocfs2 -b 4KB -C 1MB -N 2 -T vmstore -L "storage" --fs-features=local,backup-super,sparse,unwritten,inline-data,metaecc,refcount,xattr,indexed-dirs,discontig-bg


Can you please explain why there are so many extent blocks (204)? Is it really
impossible to store many clusters in a single extent (like #25, block
3874095 -> 20847 clusters)?
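On the 20847-cluster point: if I read the on-disk format right (an assumption on my part, based on e_leaf_clusters being a 16-bit field in struct ocfs2_extent_rec in ocfs2_fs.h), a single leaf record can cover up to 65535 clusters, so record capacity is not the limit here. Quick arithmetic:

```python
# Back-of-envelope: how much data can one leaf extent record cover?
# Assumption: e_leaf_clusters in struct ocfs2_extent_rec is a __le16,
# capping a single contiguous extent at 65535 clusters.
CLUSTER_SIZE = 1 << 20             # -C 1MB from the mkfs line above
MAX_LEAF_CLUSTERS = (1 << 16) - 1  # 16-bit cluster count per record

max_extent_gib = CLUSTER_SIZE * MAX_LEAF_CLUSTERS / (1 << 30)
print(round(max_extent_gib))  # ~64 GiB per record at 1MB clusters
```

So if that assumption holds, the 204 extent blocks would come from allocation/CoW fragmentation rather than any per-record limit.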

-- 
Best regards,
Eugene Istomin
IT Architect

On Monday, May 18, 2015 12:45:40 PM Goldwyn Rodrigues wrote:
> Hi Eugene,
> 
> Sorry, had been busy with other work and this slipped on the list.
> 
> >  > Do you know something about such behavior?
> >  >
> >  > The question is why a reflink operation on a VM disk leads to plenty
> >  > of read ops? Is this related to CoW-specific structures?
> 
> This is in fact related to CoW. An ocfs2 file is an extent tree, with
> the extent records marking whether an extent is reflinked, along with
> the number of reflinks.
> 
> If you perform a reflink on a file which is being changed constantly,
> you not only recreate the extent tree, but also decrease the refcount
> of the extents already present. Add to that the extents which need to
> be read for replication.
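To check my understanding of the above, a toy model (my sketch, not actual ocfs2 code) of what a write costs while the extents are still shared right after a reflink:

```python
# Toy model: why overwriting a freshly reflinked file causes reads.
# Each extent is either shared (refcount > 1) or private to the file.
def write_extent(extent, io):
    if extent["refcount"] > 1:
        io["reads"] += 1          # read refcount metadata (often uncached)
        io["reads"] += 1          # read old data to CoW it into a new extent
        extent["refcount"] -= 1   # old extent referenced one less time
        io["writes"] += 2         # write new data + updated metadata
    else:
        io["writes"] += 1         # plain in-place overwrite

io = {"reads": 0, "writes": 0}
extents = [{"refcount": 2} for _ in range(6000)]  # ~6GB file, 1MB clusters
for e in extents:
    write_extent(e, io)
print(io)  # every overwrite now carries extra reads
```

If this matches reality, that would explain why the first post-reflink dd pass is read-heavy while later passes (extents private again) are not.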
> 
> 
> HTH,
> 
> >  > We can provide other details & ssh access to the testbed.
> >  >
> >  > > Hello,
> >  > >
> >  > > after deploying reflink-based VM snapshots to production servers we
> >  > > discovered a performance degradation:
> >  > >
> >  > > OS: openSUSE 13.1, 13.2
> >  > > Hypervisors: Xen 4.4, 4.5
> >  > > Dom0 kernels: 3.12, 3.16, 3.18
> >  > > DomU kernels: 3.12, 3.16, 3.18
> >  > > Tested DomU disk backends: tapdisk2, qdisk
> >  > >
> >  > > 1) on DomU (VM):
> >  > > #dd if=/dev/zero of=test2 bs=1M count=6000
> >  > >
> >  > > 2) atop on Dom0:
> >  > > sdb - busy:92% - read:375 - write:130902
> >  > > Reads are from other VMs, seems OK
> >  > >
> >  > > 3) DomU dd finished:
> >  > > 6291456000 bytes (6.3 GB) copied, 16.6265 s, 378 MB/s
> >  > >
> >  > > 4) Let's start dd again & do a snapshot:
> >  > > #dd if=/dev/zero of=test2 bs=1M count=6000
> >  > > #reflink test.raw ref/
> >  > >
> >  > > 5) atop on Dom0:
> >  > > sdb - busy:97% - read:112740 - write:28037
> >  > > So, Read IOPS = 112740, why?
> >  > >
> >  > > 6) DomU dd finished:
> >  > > 6291456000 bytes (6.3 GB) copied, 175.45 s, 35.9 MB/s
> >  > >
> >  > > 7) Second & further reflinks do not change the atop stats & dd time:
> >  > > #dd if=/dev/zero of=test2 bs=1M count=6000
> >  > > #reflink --backup=t test.raw ref/ \\ * n times
> >  > > ~ 6291456000 bytes (6.3 GB) copied, 162.959 s, 38.6 MB/s
> >  > >
> >  > > The question is why reflinking a running VM disk leads to a read
> >  > > IOPS storm? Thanks!
> >  >
> >  > _______________________________________________
> >  > Ocfs2-devel mailing list
> >  > Ocfs2-devel at oss.oracle.com
> >  > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugfs.tar.gz
Type: application/x-compressed-tar
Size: 729820 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20150521/23cd43e2/attachment-0001.bin 

