[Ocfs2-users] ocfs2 is still eating memory

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Mon Mar 12 10:04:34 PDT 2007


I can try to replicate it too, if you want. I am testing SLES10 in the lab
right now, so I have some hardware available.

What you should do is run _the same_ test on SLES9 SP3, then on SLES10
release + updates, and then on SLES10 SP1 Beta, and compare the results (the
other choice is to build OCFS2 manually from the latest sources).
I recall seeing such a problem long ago, but it was fixed.
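
To make the runs on different releases comparable, something like the
following could log memory use while the test script quoted below is
running. This is only a sketch: the names mem_snapshot and mem_log are made
up for illustration, and it assumes /proc/meminfo reports the MemFree,
Cached and Slab fields on every release being tested.

```shell
#!/bin/bash
# Print selected fields (in kB) from a meminfo-format file on one line.
# Usage: mem_snapshot [FILE]   (FILE defaults to /proc/meminfo)
mem_snapshot() {
    awk '
        $1 == "MemFree:" { free = $2 }
        $1 == "Cached:"  { cached = $2 }
        $1 == "Slab:"    { slab = $2 }
        END { printf "MemFree=%d Cached=%d Slab=%d\n", free, cached, slab }
    ' "${1:-/proc/meminfo}"
}

# Hypothetical helper: print a timestamped snapshot once per second
# until the process is killed.
mem_log() {
    while true; do
        echo "$(date +%s) $(mem_snapshot)"
        sleep 1
    done
}
```

For example, start "mem_log > /tmp/mem-$(uname -r).log &" before the test,
kill it afterwards, and compare the logs from each release side by side.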

----- Original Message ----- 
From: "John Lange" <john.lange at open-it.ca>
To: "Sunil Mushran" <Sunil.Mushran at oracle.com>
Cc: <wim.coekaerts at oracle.com>; "Alexei_Roudnev"
<Alexei_Roudnev at exigengroup.com>; "ocfs2-users"
<ocfs2-users at oss.oracle.com>; "Lars Marowsky-Bree" <lmb at suse.de>
Sent: Saturday, March 10, 2007 4:49 PM
Subject: Re: [Ocfs2-users] ocfs2 is still eating memory


> One thing that has been of concern is that duplicating our environment
> is a challenge because of the size of the installation (4 nodes, 12
> Terabytes, Fiber SAN etc.)
>
> I decided to take some time this weekend to dust off a machine from my
> collection and see if I could replicate the issue in the lab on a much
> smaller and easier to duplicate platform.
>
> I am happy to report that I have been successful.
>
> Replicating the issue takes nothing more than a base SLES10 install on
> standard hardware with a small ocfs2 partition.
>
> Here are the steps:
>
> 1) build a SLES10 server (I used the SLES10 Evaluation DVD).
> - Create 3 hard drive partitions:
>   /dev/hda1 100M  ext3 /boot
>   /dev/hda2 15G   ext3 /
>   /dev/hda3 97G   (primary partition, type linux, leave unmounted)
>
> - apply all online updates
>
> 2) Use ocfs2console to create a single node (local machine only).
>
> 3) invoke these commands on the command line to get ocfs2 running:
>
> # /etc/init.d/o2cb config
>
> Accept all the defaults. (this is probably not a required step)
>
> # /etc/init.d/o2cb force-reload
>
> Ensure there are no errors and then:
>
> # mkfs.ocfs2 /dev/hda3
> # mkdir /data
> # mount -t ocfs2 /dev/hda3 /data
>
> 4) create the following script and make it executable
>
> # joe /root/bin/populateocfs.sh
>
> --- cut ---
>
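> #!/bin/bash
> # Create a 1 MB file of random data to use as the copy source.
> dd if=/dev/urandom of=/tmp/1Mfile count=1024 bs=1024
> COUNT1=0
> # Copy it onto the ocfs2 mount 1000 times under different names.
> while [ "$COUNT1" -lt 1000 ]
> do
>     cp -v /tmp/1Mfile "/data/$COUNT1.1Mfile"
>     (( COUNT1++ ))
> done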
>
> --- cut ---
>
> # chmod 700 /root/bin/populateocfs.sh
>
> 5) In a separate shell I recommend invoking:
>
> # vmstat 1
>
> so you can watch the memory plummet.
>
> 6) Now invoke the script:
>
> # /root/bin/populateocfs.sh
>
> ---------------
>
> That's it. Free memory will drop fast. On my test system I never got
> past about 200 files before I ran out of RAM.
>
> There are 2 methods to recover the trapped memory. Either tell the
> kernel to flush the caches, or delete all the files in the /data/
> directory:
>
> 1) # echo 3 > /proc/sys/vm/drop_caches
>
> or
>
> 2) # rm -R /data/[0-9]*
>
> -------------
>
> Let me know if you have any questions.
>
> John
>
>
> On Fri, 2007-03-09 at 17:24 -0800, Sunil Mushran wrote:
> > Not to add fuel to this thread, just wanted to mention that we are in
> > the process of reimaging some of our test boxes and should be able
> > to test John's issue sometime next week.
> >
> > wim.coekaerts at oracle.com wrote:
> > > I think Sunil just wants to make sure that if something is urgent and
> > > in production, it is good to get formal support. Bugs filed in
> > > bugzilla are looked at for sure, but might not get the same priority.
> > > Nothing more.
> > >
> > > Note also that the guys work hard on making this product good, and
> > > they do their very best to do it the right way. I must say that
> > > sometimes the language is a bit harsh and disrespectful, to say the
> > > least, which is never really appreciated. A little bit of respect and
> > > more constructive feedback usually goes a very long way. Everyone is
> > > trying their best.
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: "Alexei_Roudnev" <Alexei_Roudnev at exigengroup.com>
> > > To: "Sunil Mushran" <Sunil.Mushran at oracle.com>; "John Lange"
> > > <john.lange at open-it.ca>
> > > Cc: "ocfs2-users" <ocfs2-users at oss.oracle.com>; "Lars Marowsky-Bree"
> > > <lmb at suse.de>
> > > Sent: 3/8/07 5:36 PM
> > > Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> > >
> > > Sunil, you DON'T UNDERSTAND.
> > >
> > > They DON'T ASK for SUPPORT. They ask, "HOW CAN WE REPORT A BUG?"
> > >
> > > I had the same problem many times - there IS NOT a simple way to
> > > report a bug to Novell. No surprise that the systems are so buggy.
> > > For the beta versions, if you are not signed up as a beta tester,
> > > there is no easy way to report a bug either.
> > >
> > > (Signing up as a beta tester means many obligations, and what if I am
> > > going to test only a few components?)
> > >
> > > For now, I am testing SP1 Beta4 (I have access as a partner so that we
> > > can test new software before it is released). I never asked for
> > > support, but I saw bugs many, many times, and each time I tried to
> > > report one (possible or definite bug), it was a headache. Take
> > > open-iscsi: it requires a few small improvements for 100% sure. I can
> > > test them in our case, and many other users can test them too, but we
> > > CAN'T REPORT them.
> > >
> > > (Just to have a list:
> > > - lvm is not called after iSCSI, so it doesn't see open-iscsi. On the
> > > other hand, with multiport support dropped in the new iSCSI, you
> > > can't use human-readable names from disk/by-path but must use
> > > multipath disk IDs, so the only way to do it is lvm - but lvm is
> > > not called after iSCSI;
> > > - the documentation has numerous bugs and doesn't explain how to mount
> > > iSCSI disks (SuSe dropped netfs and did not add anything instead);
> > > - a few actions require timeouts; for example, you should wait 5 - 10
> > > seconds after discovery and before connection.
> > > )
> > >
> > > They have the same problem: they tested a version, something doesn't
> > > work, they call support with _We see a problem_, and get the response
> > > _you don't have premium support so we will not talk_ (next time I see
> > > your home burning, I call you and you say _don't telemarket me, hang
> > > up_ instead of _what's the matter? Oh, it's serious, let's look
> > > together_).
> > >
> > > The sad story is that many of these bugs are easy to fix, and the
> > > system itself is excellent... but the quality is far from production
> > > grade, and the further it goes, the worse it gets.
> > >
> > >
> > > ----- Original Message ----- 
> > > From: "Sunil Mushran" <Sunil.Mushran at oracle.com>
> > > To: "John Lange" <john.lange at open-it.ca>
> > > Cc: "ocfs2-users" <ocfs2-users at oss.oracle.com>; "Lars Marowsky-Bree"
> > > <lmb at suse.de>
> > > Sent: Thursday, March 08, 2007 4:37 PM
> > > Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> > >
> > >
> > >
> > >> If you are running a prod shop, you should look into buying support.
> > >>
> > >> John Lange wrote:
> > >>
> > >>> On Mon, 2007-03-05 at 13:46 -0800, Sunil Mushran wrote:
> > >>>
> > >>>
> > >>>> Well, kswapd is supposed to flush the caches. As in, the vm
> > >>>> controls the lifetime of the inodes in the inode_cache, not ocfs2.
> > >>>>
> > >>>> All ocfs2 can do is free the memory associated with the inode when
> > >>>> asked to. And it does that when you manually flush the cache. The
> > >>>> question is why the vm is not doing it on its own. (fwiw, you are
> > >>>> on a beta kernel.)
> > >
> > >>> We are using beta kernels in an attempt to solve this problem. As
> > >>> everyone knows, the most recent official SUSE kernel (2.6.16.21-0.25,
> > >>> I believe?) completely broke ocfs2. Downgrading to 2.6.16.21-0.15
> > >>> solves that problem but the memory issue remains.
> > >>>
> > >>> So as far as I am aware, there is no SUSE kernel that works with
> > >>> ocfs2, which is where we find ourselves today.
> > >>>
> > >>> I just upgraded to the latest KOTD:
> > >>>
> > >>> 2.6.16.42-SLES10_SP1_BRANCH_20070307114604-smp
> > >>>
> > >>> And still, when running ocfs2, all ram gets consumed.
> > >>>
> > >>> Right now Novell is playing the "you don't have premium support"
> > >>> game, so where should I report this bug?
> > >>>
> > >>> Regards,
> > >>>
> > >>> John Lange
> > >>>
> > >>>
> > >>>
> > >>>
> > >> _______________________________________________
> > >> Ocfs2-users mailing list
> > >> Ocfs2-users at oss.oracle.com
> > >> http://oss.oracle.com/mailman/listinfo/ocfs2-users



