[Ocfs2-users] ocfs2 is still eating memory

John Lange john.lange at open-it.ca
Mon Mar 12 10:42:12 PDT 2007


Alexei, thank you for your offer but I don't think it will be necessary.

After the initial problems we had getting the attention of support,
Novell has been going above and beyond on this issue and we have
recently had some calls with Novell Engineers who are diligently working
on the problem.

So far they confirm that they have been able to replicate this issue in
the lab and are working on isolating the root cause so a fix can be
done.

We could not ask for anything more and I have only good things to say
about their response.

Regards,

John

On Mon, 2007-03-12 at 10:04 -0700, Alexei_Roudnev wrote:
> I can try to replicate it too, if you want. I am testing SLES10 in the lab
> right now, so I have some hardware.
>
> What you should do is run _the same_ test on SLES9 SP3, then on SLES10
> (release + updates), and then on SLES10 SP1 Beta, and compare the results
> (the other choice is to build OCFSv2 from the latest sources manually).
> I recall seeing a problem like this long ago, but it was fixed.
> 
> ----- Original Message ----- 
> From: "John Lange" <john.lange at open-it.ca>
> To: "Sunil Mushran" <Sunil.Mushran at oracle.com>
> Cc: <wim.coekaerts at oracle.com>; "Alexei_Roudnev"
> <Alexei_Roudnev at exigengroup.com>; "ocfs2-users"
> <ocfs2-users at oss.oracle.com>; "Lars Marowsky-Bree" <lmb at suse.de>
> Sent: Saturday, March 10, 2007 4:49 PM
> Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> 
> 
> > One thing that has been of concern is that duplicating our environment
> > is a challenge because of the size of the installation (4 nodes, 12
> > Terabytes, Fiber SAN etc.)
> >
> > I decided to take some time this weekend to dust off a machine from my
> > collection and see if I could replicate the issue in the lab on a much
> > smaller and easier to duplicate platform.
> >
> > I am happy to report that I have been successful.
> >
> > Replicating the issue takes nothing more than a base SLES10 install on
> > standard hardware with a small ocfs2 partition.
> >
> > Here are the steps:
> >
> > 1) build a SLES10 server (I used the SLES10 Evaluation DVD).
> > - Create 3 hard drive partitions:
> >   /dev/hda1 100M  ext3 /boot
> >   /dev/hda2 15G   ext3 /
> >   /dev/hda3 97G   (primary partition, type linux, leave unmounted)
> >
> > - apply all online updates
> >
> > 2) Use ocfs2console to create a single node (local machine only).
> >
> > 3) invoke these commands on the command line to get ocfs2 running:
> >
> > # /etc/init.d/o2cb config
> >
> > Accept all the defaults (this step is probably not required).
> >
> > # /etc/init.d/o2cb force-reload
> >
> > Ensure there are no errors and then:
> >
> > # mkfs.ocfs2 /dev/hda3
> > # mkdir /data
> > # mount -t ocfs2 /dev/hda3 /data
> >
> > 4) create the following script and make it executable
> >
> > # joe /root/bin/populateocfs.sh
> >
> > --- cut ---
> >
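> > #!/bin/bash
> > # Create a 1 MiB file of random data to use as the copy source.
> > dd if=/dev/urandom of=/tmp/1Mfile bs=1024 count=1024
> > # Copy it onto the ocfs2 mount 1000 times under distinct names.
> > COUNT1=0
> > while [ "$COUNT1" -lt 1000 ]
> > do
> >     cp -v /tmp/1Mfile "/data/$COUNT1.1Mfile"
> >     (( COUNT1++ ))
> > done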
> >
> > --- cut ---
> >
> > # chmod 700 /root/bin/populateocfs.sh
> >
> > 5) In a separate shell I recommend invoking:
> >
> > # vmstat 1
> >
> > so you can watch the memory plummet.
> >
> > 6) Now invoke the script:
> >
> > # /root/bin/populateocfs.sh
> >
> > ---------------
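A note for anyone repeating the steps above: vmstat output is hard to compare
after the fact, so here is a rough sketch (not part of the original test) that
logs free memory with a timestamp while the script runs. It assumes a Linux
/proc/meminfo containing a "MemFree:" line; the function names mem_free_kb and
log_mem_free are made up for illustration.

```shell
#!/bin/bash
# mem_free_kb [FILE]
# Print the MemFree value (in kB) from a meminfo-style file.
# FILE defaults to /proc/meminfo; the argument exists only to ease testing.
mem_free_kb() {
    local file="${1:-/proc/meminfo}"
    awk '/^MemFree:/ { print $2 }' "$file"
}

# log_mem_free [INTERVAL] [SAMPLES]
# Print "<epoch seconds> <MemFree kB>" once every INTERVAL seconds,
# SAMPLES times in total (defaults: 1 second, 10 samples).
log_mem_free() {
    local interval="${1:-1}" samples="${2:-10}" i
    for (( i = 0; i < samples; i++ )); do
        printf '%s %s\n' "$(date +%s)" "$(mem_free_kb)"
        sleep "$interval"
    done
}
```

For example, after sourcing the file, start "log_mem_free 1 300 > /tmp/memfree.log &"
in one shell and then invoke populateocfs.sh in another (the log path is
arbitrary).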
> >
> > That's it. Free memory will drop fast. On my test system I never got
> > past about 200 files before I ran out of RAM.
> >
> > There are 2 methods to recover the trapped memory. Either tell the
> > kernel to flush the caches, or delete all the files in the /data/
> > directory:
> >
> > 1) # echo 3 > /proc/sys/vm/drop_caches
> >
> > or
> >
> > 2) # rm -R /data/[0-9]*
> >
> > -------------
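To check whether the trapped memory really sits in the inode cache (Sunil's
point that the vm controls the lifetime of inodes in the inode_cache), one
option is to total up the ocfs2 slab caches before and after the drop_caches
flush. A rough sketch, assuming the slabinfo 2.x column layout (name,
active_objs, num_objs, objsize, ...) and that the relevant cache names contain
"ocfs2"; slab_usage_kb is a hypothetical helper name, and reading
/proc/slabinfo normally needs root.

```shell
#!/bin/bash
# slab_usage_kb PATTERN [FILE]
# Print "<cache name> <approx. kB>" for every slab cache whose name matches
# PATTERN. The size is estimated as num_objs * objsize (columns 3 and 4).
# FILE defaults to /proc/slabinfo; the argument exists only to ease testing.
slab_usage_kb() {
    local pattern="$1" file="${2:-/proc/slabinfo}"
    awk -v pat="$pattern" '$1 ~ pat { printf "%s %d\n", $1, ($3 * $4) / 1024 }' "$file"
}
```

Running "slab_usage_kb ocfs2" once before and once after
"echo 3 > /proc/sys/vm/drop_caches" should show how much of the recovered
memory was held in those caches.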
> >
> > Let me know if you have any questions.
> >
> > John
> >
> >
> > On Fri, 2007-03-09 at 17:24 -0800, Sunil Mushran wrote:
> > > Not to add fuel to this thread, just wanted to mention that we are in
> > > the process of reimaging some of our test boxes and should be able
> > > to test John's issue sometime next week.
> > >
> > > wim.coekaerts at oracle.com wrote:
> > > > I think Sunil just wants to make sure that if something is urgent and
> > > > in production, it's good to get formal support. Bugs filed in
> > > > bugzilla are looked at for sure, but might not get the same
> > > > priority. Nothing more.
> > > >
> > > > Note also that the guys work hard on making this product good, and
> > > > they do their very best to do it the right way. I must say that
> > > > sometimes the language is a bit harsh and disrespectful, to say the
> > > > least, which is never really appreciated. A little bit of respect and
> > > > more constructive feedback usually goes a very long way. Everyone is
> > > > trying their best.
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: "Alexei_Roudnev" <Alexei_Roudnev at exigengroup.com>
> > > > To: "Sunil Mushran" <Sunil.Mushran at oracle.com>; "John Lange"
> > > > <john.lange at open-it.ca>
> > > > Cc: "ocfs2-users" <ocfs2-users at oss.oracle.com>; "Lars Marowsky-Bree"
> > > > <lmb at suse.de>
> > > > Sent: 3/8/07 5:36 PM
> > > > Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> > > >
> > > > Sunil, you DON'T UNDERSTAND.
> > > >
> > > > They are NOT ASKING for SUPPORT. They are asking, "HOW CAN WE REPORT
> > > > A BUG?"
> > > >
> > > > I have had the same problem many times: there IS NO simple way to
> > > > report a bug to Novell. No surprise that the systems are so buggy.
> > > > For the beta versions, if you are not signed up as a beta tester,
> > > > there is no easy way to report a bug either.
> > > >
> > > > (Signing up as a beta tester means many obligations, and what if I am
> > > > only going to test a few components?)
> > > >
> > > > For now, I am testing SP1 Beta4 (I have access as a partner so that we
> > > > can test new software before it is released). I never asked for
> > > > support, but I have seen bugs many, many times, and each time I tried
> > > > to report one (a possible or a definite bug), it was a headache. Take
> > > > open-iscsi: it needs a few small improvements, 100% for sure. I can
> > > > test them in our case, and many other users can test them too, but we
> > > > CAN'T REPORT them.
> > > >
> > > > (Just to have a list:
> > > > - lvm is not called after iSCSI, so it doesn't see open-iscsi. On the
> > > > other hand, with multiport support dropped in the new iSCSI, you
> > > > can't use the human-readable names from disk/by-path but must use
> > > > multipath disk IDs, so the only way to do it is lvm - but lvm is
> > > > not called after iSCSI;
> > > > - the documentation has numerous bugs and doesn't explain how to mount
> > > > iSCSI disks (SUSE dropped netfs and did not add anything in its place);
> > > > - a few actions require timeouts; for example, you should wait 5 - 10
> > > > seconds after discovery and before connection.
> > > >
> > > > )
> > > >
> > > > They have the same problem. They tested a version, something doesn't
> > > > work, they call support with _We see a problem_, and get the response
> > > > _you do not have premium support so we will not talk_ (next time I see
> > > > your home burning, I call you and you say _don't telemarket me, hang
> > > > up_ instead of _what's the matter? Oh, it's serious. Let's look
> > > > together_).
> > > >
> > > > The sad story is that many of these bugs are easy to fix, and that the
> > > > system itself is excellent... but the quality is far from production
> > > > grade, and the further it goes, the worse it gets.
> > > >
> > > >
> > > > ----- Original Message ----- 
> > > > From: "Sunil Mushran" <Sunil.Mushran at oracle.com>
> > > > To: "John Lange" <john.lange at open-it.ca>
> > > > Cc: "ocfs2-users" <ocfs2-users at oss.oracle.com>; "Lars Marowsky-Bree"
> > > > <lmb at suse.de>
> > > > Sent: Thursday, March 08, 2007 4:37 PM
> > > > Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> > > >
> > > >
> > > >
> > > > >> If you are running a prod shop, you should look into buying support.
> > > >>
> > > >> John Lange wrote:
> > > >>
> > > >>> On Mon, 2007-03-05 at 13:46 -0800, Sunil Mushran wrote:
> > > >>>
> > > >>>
> > > >>>> Well, kswapd is supposed to flush the caches. As in, the vm
> > > >>>> controls the lifetime of the inodes in the inode_cache not ocfs2.
> > > >>>>
> > > > >>>> All ocfs2 can do is free the memory associated with the inode when
> > > > >>>> asked to. And it does that when you manually flush the cache. The
> > > > >>>> question is why the vm is not doing it on its own. (fwiw, you are
> > > > >>>> on a beta kernel.)
> > > >
> > > > >>> We are using beta kernels in an attempt to solve this problem. As
> > > > >>> everyone knows, the most recent official SUSE kernel (2.6.16.21-0.25,
> > > > >>> I believe?) completely broke ocfs2. Downgrading to 2.6.16.21-0.15
> > > > >>> solves that problem but the memory issue remains.
> > > > >>>
> > > > >>> So as far as I am aware, there is no SUSE kernel that works with
> > > > >>> ocfs2, which is where we find ourselves today.
> > > >>>
> > > >>> I just upgraded to the latest KOTD:
> > > >>>
> > > >>> 2.6.16.42-SLES10_SP1_BRANCH_20070307114604-smp
> > > >>>
> > > >>> And still, when running ocfs2, all ram gets consumed.
> > > >>>
> > > > >>> Right now Novell is playing the "you don't have premium support"
> > > > >>> game, so where should I report this bug?
> > > >>>
> > > >>> Regards,
> > > >>>
> > > >>> John Lange
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >> _______________________________________________
> > > >> Ocfs2-users mailing list
> > > >> Ocfs2-users at oss.oracle.com
> > > >> http://oss.oracle.com/mailman/listinfo/ocfs2-users
> > > >>
> > > >>
> > > >
> > > >
> > > > _______________________________________________
> > > > Ocfs2-users mailing list
> > > > Ocfs2-users at oss.oracle.com
> > > > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> 




More information about the Ocfs2-users mailing list