[Ocfs2-devel] Read IOPS storm in case of reflinking running VM disk
Eugene Istomin
E.Istomin at edss.ee
Wed May 20 15:33:43 PDT 2015
Goldwyn,
thanks for the answer!
I read
https://oss.oracle.com/osswiki/OCFS2(2f)DesignDocs(2f)RefcountTrees.html
carefully to understand the problem.
As I understand it:
There are B-tree structures for reflink: ocfs2_refcount_tree;
ocfs2_refcount_block -> ocfs2_refcount_list -> ocfs2_refcount_rec
"The refcount tree root is a refcount block pointed to by i_refcount_loc"
Some operations need extra uncached lookups.
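
To make the record chain above concrete, here is a toy model of refcount records and of what a copy-on-write of one shared cluster does to them. It is only a sketch under assumptions: the class and method names (RefcountRec, ToyRefcountTree, find, decrement) are invented for illustration, the records live in a flat in-memory list rather than in on-disk refcount blocks, and the real OCFS2 code is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class RefcountRec:
    """Hypothetical stand-in for one refcount record."""
    cpos: int      # first physical cluster covered by the record
    clusters: int  # number of clusters covered
    refcount: int  # how many files share these clusters


class ToyRefcountTree:
    """Flat, in-memory approximation of a refcount tree (illustration only)."""

    def __init__(self):
        self.recs = []    # non-overlapping records, kept sorted by cpos
        self.lookups = 0  # each lookup stands for a possible metadata read

    def add(self, cpos, clusters, refcount):
        self.recs.append(RefcountRec(cpos, clusters, refcount))
        self.recs.sort(key=lambda r: r.cpos)

    def find(self, cpos):
        """Return the record covering cluster cpos, or None."""
        self.lookups += 1
        for rec in self.recs:
            if rec.cpos <= cpos < rec.cpos + rec.clusters:
                return rec
        return None

    def decrement(self, cpos):
        """Drop one reference on a single cluster, splitting its record."""
        rec = self.find(cpos)
        if rec is None:
            raise KeyError(cpos)
        self.recs.remove(rec)
        end = rec.cpos + rec.clusters
        pieces = [
            RefcountRec(rec.cpos, cpos - rec.cpos, rec.refcount),
            RefcountRec(cpos, 1, rec.refcount - 1),
            RefcountRec(cpos + 1, end - cpos - 1, rec.refcount),
        ]
        # Keep only non-empty pieces that are still referenced.
        self.recs.extend(p for p in pieces if p.clusters > 0 and p.refcount > 0)
        self.recs.sort(key=lambda r: r.cpos)


tree = ToyRefcountTree()
tree.add(cpos=100, clusters=10, refcount=2)  # 10 clusters shared after a reflink
tree.decrement(104)                          # CoW of one cluster in the middle
for rec in tree.recs:
    print(rec)
```

In this model a single overwritten cluster turns one record into three and costs one lookup. A sequential overwrite of a freshly reflinked file repeats that for every shared range it touches, which is one plausible reading of where the extra reads come from.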
I also dumped frag/stat/refcount from a production hypervisor node using
debugfs.ocfs2; the files are attached (alternative download:
http://public.edss.ee/tmp/debugfs.tar.gz ).
Hypervisor OCFS2 mount options:
rw,nosuid,noexec,noatime,heartbeat=none,nointr,data=ordered,errors=remount-ro,localalloc=2048,coherency=full,user_xattr,acl
Mkfs string:
mkfs.ocfs2 -b 4KB -C 1MB -N 2 -T vmstore -L "storage" --fs-features=local,backup-super,sparse,unwritten,inline-data,metaecc,refcount,xattr,indexed-dirs,discontig-bg
Can you please explain why there are so many extent blocks (204)? Is it really
impossible to store many clusters in a single extent (like #25, block
3874095 -> 20847 clusters)?
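
For scale, the sizes involved can be worked out from the mkfs parameters above. This is a small arithmetic sketch that assumes "-C 1MB" means 1 MiB clusters and "-b 4KB" means 4 KiB blocks; the variable and function names are made up for the example.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

CLUSTER_SIZE = 1 * MIB  # assumed from "-C 1MB"
BLOCK_SIZE = 4 * 1024   # assumed from "-b 4KB"


def clusters_to_gib(clusters):
    """Size in GiB of an extent covering the given number of clusters."""
    return clusters * CLUSTER_SIZE / GIB


blocks_per_cluster = CLUSTER_SIZE // BLOCK_SIZE
extent_25_gib = clusters_to_gib(20847)      # the extent cited as #25
dd_clusters = 6291456000 // CLUSTER_SIZE    # bytes written by the dd test

print(f"blocks per cluster:  {blocks_per_cluster}")
print(f"extent #25 size:     {extent_25_gib:.2f} GiB")
print(f"clusters per dd run: {dd_clusters}")
```

So a single extent record can already cover roughly 20 GiB here, while one dd run from the test rewrites 6000 clusters.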
--
Best regards,
Eugene Istomin
IT Architect
On Monday, May 18, 2015 12:45:40 PM Goldwyn Rodrigues wrote:
> Hi Eugene,
>
> Sorry, had been busy with other work and this slipped on the list.
>
> > > Do you know something about such behavior?
> > >
> > > The question is why a reflink operation on VM disk leads to plenty of
> > > read ops? Is this related to CoW specific structures?
>
> This is in fact related to the CoW. An ocfs2 file is an extent tree,
> with the extent headers marking whether the extent is reflinked or not,
> along with the number of reflinks.
>
> If you perform a reflink on a file which is being changed constantly,
> you not only recreate the extent tree, but also decrease the refcount of
> the ones already present. Add to that the extents which need to be read
> for replication.
>
>
> HTH,
>
> > > We can provide other details & ssh access to the testbed.
> > >
> > > > Hello,
> > > >
> > > >
> > > >
> > > > after deploying reflink-based VM snapshots to production servers we
> > > >
> > > > discovered a performance degradation:
> > > >
> > > >
> > > >
> > > > OS: Opensuse 13.1, 13.2
> > > >
> > > > Hypervisors: Xen 4.4, 4.5
> > > >
> > > > Dom0 kernels: 3.12, 3.16, 3.18
> > > >
> > > > DomU kernels: 3.12, 3.16, 3.18
> > > >
> > > > Tested DomU disk backends: tapdisk2, qdisk
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 1) on DomU (VM)
> > > >
> > > > #dd if=/dev/zero of=test2 bs=1M count=6000
> > > >
> > > >
> > > >
> > > > 2) atop on Dom0:
> > > >
> > > > sdb - busy:92% - read:375 - write:130902
> > > >
> > > > Reads are from other VMs, seems OK
> > > >
> > > >
> > > >
> > > > 3) DomU dd finished:
> > > >
> > > > 6291456000 bytes (6.3 GB) copied, 16.6265 s, 378 MB/s
> > > >
> > > >
> > > >
> > > > 4) Let's start dd again & do a snapshot:
> > > >
> > > > #dd if=/dev/zero of=test2 bs=1M count=6000
> > > >
> > > > #reflink test.raw ref/
> > > >
> > > >
> > > >
> > > > 5) atop on Dom0:
> > > >
> > > > sdb - busy:97% - read:112740 - write:28037
> > > >
> > > > So, Read IOPS = 112740, why?
> > > >
> > > >
> > > >
> > > > 6) DomU dd finished:
> > > >
> > > > 6291456000 bytes (6.3 GB) copied, 175.45 s, 35.9 MB/s
> > > >
> > > >
> > > >
> > > > 7) Second & further reflinks do not change the atop stat & dd time
> > > >
> > > > #dd if=/dev/zero of=test2 bs=1M count=6000
> > > >
> > > > #reflink --backup=t test.raw ref/ \\ * n times
> > > >
> > > > ~ 6291456000 bytes (6.3 GB) copied, 162.959 s, 38.6 MB/s
> > > >
> > > >
> > > >
> > > > The question is why reflinking a running VM disk leads to read IOPS
> > > > storm?
> >
> > > > Thanks!
> > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debugfs.tar.gz
Type: application/x-compressed-tar
Size: 729820 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20150521/23cd43e2/attachment-0001.bin