[Ocfs2-users] The ongoing mystery of the ocfs2 memory leak

John Lange john.lange at open-it.ca
Fri Mar 23 12:23:12 PDT 2007


If you have been watching this list you may have seen my postings about
some kind of memory leak when using ocfs2.

This is a problem that is still not solved, and I'm hoping someone on
the list can help us isolate the issue.

The circumstances are very strange. After much analysis and testing, what
we have been able to figure out is that roughly 400 MB of memory
disappears every day between 6:45am and 7:45am. This memory is never
recovered, and after about 3-4 days the oom-killer starts killing
processes on the node until it self-destructs.

Now you are probably thinking (as we were) that this is some kind of
cron job that kicks in at that time and causes the problem, but that is
not the case. For one thing, the daily cron does not run at that time.
Secondly, we logged all processes to a file every 15 minutes and then
compared what was running before the memory loss to what was running
during and after it, and there is nothing new running!
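For anyone who wants to repeat that before/after comparison mechanically,
here is a minimal sketch. It assumes each snapshot was saved from
`ps -eo pid,rss,comm` (that column choice is an assumption for the
example); it lists command names that appear only in the later snapshot
and totals the resident memory (RSS) of each snapshot, so a big jump in
process memory would also stand out.

```python
import sys

# Assumption: snapshots were saved with `ps -eo pid,rss,comm` (header line first).


def parse_ps(text):
    """Return a list of (pid, rss_kb, command) tuples, skipping the header line."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore blank or truncated lines
        pid, rss, comm = fields
        rows.append((int(pid), int(rss), comm.strip()))
    return rows


def compare_snapshots(before_text, after_text):
    """Return (commands only in `after`, total RSS before in kB, total RSS after in kB).

    Summing RSS double-counts shared pages, so the totals are only a rough signal.
    """
    before = parse_ps(before_text)
    after = parse_ps(after_text)
    new_commands = sorted({c for _, _, c in after} - {c for _, _, c in before})
    total_before = sum(rss for _, rss, _ in before)
    total_after = sum(rss for _, rss, _ in after)
    return new_commands, total_before, total_after


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        new, rss_before, rss_after = compare_snapshots(f_before.read(), f_after.read())
    print("new commands:", ", ".join(new) or "(none)")
    print(f"total RSS: {rss_before} kB -> {rss_after} kB")
```

If the RSS totals stay flat while free memory drops by 400 MB, that points
away from user-space processes and toward memory held by the kernel.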

And when we analyze the slabinfo for the same period, no slab cache
shows a corresponding (400 MB) jump in size.
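Comparing slabinfo snapshots by eye is error-prone, so here is a minimal
sketch that ranks caches by growth between two saved copies of
/proc/slabinfo. It assumes the "slabinfo - version: 2.x" text layout,
where the first columns are name, active_objs, num_objs and objsize, and
it estimates each cache's footprint as num_objs * objsize. That ignores
per-slab overhead, so treat the numbers as approximate.

```python
import sys

# Assumption: input is the text of /proc/slabinfo, format version 2.x.


def parse_slabinfo(text):
    """Map cache name -> approximate bytes (num_objs * objsize)."""
    sizes = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the "# name ..." column header.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        sizes[name] = num_objs * objsize
    return sizes


def slab_growth(before_text, after_text, top=10):
    """Return the `top` caches with the largest byte growth, largest first."""
    before = parse_slabinfo(before_text)
    after = parse_slabinfo(after_text)
    deltas = [(name, size - before.get(name, 0)) for name, size in after.items()]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for name, delta in slab_growth(f_before.read(), f_after.read()):
            print(f"{name:30s} {delta / 1024 / 1024:+10.1f} MB")
```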

So where the heck is our memory going?!?

Does anyone have a clue how we can diagnose this?

Currently we are capturing vmstat, slabinfo, and a full process list at
15-minute intervals. Is there anything else we could be logging?
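For reference, a capture along those lines can be sketched as a small
loop. This is a minimal illustration, not our actual script: it assumes a
Linux /proc filesystem, reads /proc/vmstat rather than running the vmstat
command, and the output directory and file naming are hypothetical.
/proc/slabinfo is usually readable only by root, so the sketch records
the error instead of crashing when it cannot read a source.

```python
import os
import subprocess
import sys
import time

INTERVAL_SECONDS = 15 * 60  # capture every 15 minutes


def read_source(path):
    """Return the contents of a /proc file, or an error note if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        return f"unavailable: {exc}\n"


def run_command(args):
    """Return a command's stdout, or an error note if it cannot be run."""
    try:
        return subprocess.run(args, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        return f"unavailable: {exc}\n"


def snapshot(out_dir, now=None):
    """Write one timestamped file per source into out_dir and return their paths.

    The "<timestamp>.<source>" naming scheme is hypothetical.
    """
    stamp = time.strftime("%Y%m%d-%H%M", time.localtime(now))
    sources = {
        "vmstat": read_source("/proc/vmstat"),
        "slabinfo": read_source("/proc/slabinfo"),
        "ps": run_command(["ps", "-eo", "pid,rss,vsz,comm"]),
    }
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, content in sources.items():
        path = os.path.join(out_dir, f"{stamp}.{name}")
        with open(path, "w") as f:
            f.write(content)
        paths.append(path)
    return paths


if __name__ == "__main__" and len(sys.argv) == 2:
    while True:
        snapshot(sys.argv[1])
        time.sleep(INTERVAL_SECONDS)
```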

Thanks,

John Lange




