[Ocfs2-users] Tracking down hangs

Sunil Mushran sunil.mushran at oracle.com
Thu Jun 3 14:18:53 PDT 2010


If scanlocks is clean, it means this is not a dlm issue.

Have you tried mounting with data=writeback? With drbd,
a 1G write becomes a 2G write: one copy to the local disk and one
to the peer. With ordered mode, a journal checkpoint, which is done
when relinquishing a write lock, will wait on the data flush. That
could be the cause of the slowdown. Does drbd have any way to show
how active it is at that time? If so, monitor that.
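
In case it helps, here is a minimal sketch of both suggestions. The
device /dev/drbd0 and the mount point /srv are assumptions, not taken
from your setup, and drbd_counters is a hypothetical helper name. The
counter names assume the DRBD 8.x /proc/drbd layout (ns = KiB sent to
the peer, dw = KiB written to the local disk, pe/ua/ap = pending,
unacknowledged and application-pending requests).

```shell
# Assumed device and mount point; adjust to your setup.
#   umount /srv
#   mount -t ocfs2 -o data=writeback /dev/drbd0 /srv

# Print selected counters from a /proc/drbd-style file, one line per resource.
# Usage: drbd_counters [file]   (defaults to /proc/drbd)
drbd_counters() {
    awk '
        $1 ~ /^ns:/ {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(ns|nr|dw|dr|pe|ua|ap):/) {
                    out = out sep $i
                    sep = " "
                }
            print out
        }
    ' "${1:-/proc/drbd}"
}

# Sample once a second while the dd and sync run:
#   while sleep 1; do echo "$(date +%T) $(drbd_counters)"; done
```

If pe or ua stay high for the duration of the stall, that would
suggest the writes are waiting on the peer rather than on the fs.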

BTW, readonly does not mean no-cache-coherency. It only means
that userspace cannot write. The fs is fully cache-coherent at
all times, so there is no performance advantage.

On 06/03/2010 03:12 AM, Andrew Robert Nicols wrote:
> We're using a storage solution involving two SunFire X4500 servers using
> DRBD to replicate a 15TB partition across the network with ocfs2 on top.
> We're sharing the partition from one server over NFS and the other is
> mounted read-only at present.  The DRBD backing store is software RAID 60
> on 40 disks.
>
> We've been seeing periodic issues whereby our NFS clients (Debian Lenny)
> are very slow to perform simple operations such as cat a 4 character file,
> or perform an ls.
>
> This affects all of the NFS clients at the same time and typically lasts
> from between a few seconds to maybe 2 minutes. Operation then continues as
> normal and service resumes. We've also seen this affecting the read-only
> server which has the ocfs2 partition mounted.
>
> We've been having trouble trying to find out the cause of the issues, but
> can reliably reproduce such failures as follows:
>
> On each host, check for cats taking longer than 1 second:
> while true; do time cat /srv/healthchecks/smallfile > /dev/null; done 2>&1 | awk '/m[1-9]/ {print strftime(), $0}'
>
> To actually reproduce the failure, we then run a dd on the filestore:
> dd if=/dev/zero of=/srv/test/dd-test-`date +%s` bs=1M count=1000 && echo "Syncing" && time sync
>
> At the time that the sync finishes, all of the NFS clients and the
> read-only server show that it took some time to return the cat of an
> unrelated file - usually the same amount of time it took to run the sync.
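
One gap in the awk filter above: /m[1-9]/ looks only at the digit
right after the "m", so a stall of a full minute or more (for example
1m0.500s) would not be printed. A minimal sketch of a threshold check
that converts the whole duration to seconds first; is_slow is a
hypothetical helper name, and the usage loop assumes bash, whose time
keyword prints a "real" line in this format.

```shell
# Succeed if a bash `time`-style duration such as "0m1.234s" is at least
# LIMIT seconds (default 1).
# Usage: is_slow DURATION [LIMIT]
is_slow() {
    awk -v t="$1" -v limit="${2:-1}" 'BEGIN {
        split(t, p, "m")              # p[1] = minutes, p[2] = seconds plus "s"
        sub(/s$/, "", p[2])
        exit !(p[1] * 60 + p[2] >= limit)
    }'
}

# Usage (bash), printing a timestamp for each slow cat:
#   while true; do
#       real=$( { time cat /srv/healthchecks/smallfile > /dev/null; } 2>&1 | awk '$1 == "real" {print $2}')
#       is_slow "$real" && echo "$(date) $real"
#   done
```
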
>
> What's the best place to start looking for the cause of these hangs? I've
> attached the dmesg output which includes some call traces for hung threads.
> I have stat_sysdir output though I suspect it's not so relevant. A
> scanlocks output doesn't reveal any busy locks that I can see (unless I'm
> not hitting it at the right time or misreading the output).
>
> For the DRBD replication there's a pair of bonded GBit NICs dedicated to
> the job. The other two GBit bonded NICs in the boxes are being used for NFS
> and o2net/ocfs communication. We don't believe that the network is at
> fault.
>
> We're using Debian Lenny with the stock AMD64 kernel out of the Lenny
> repository - 2.6.26-2-amd64.
> We're using ocfs2 tools version 1.4.4 which we have packaged for Debian ourselves.
>
> The ocfs2 version reported in /sys/module/ocfs2/version is 1.5.0.
>
> Here's the current o2cb configuration:
>
> root@thumper5:#/srv/test# /etc/init.d/o2cb status
> Driver for "configfs": Loaded
> Filesystem "configfs": Mounted
> Stack glue driver: Loaded
> Stack plugin "o2cb": Loaded
> Driver for "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster thumperpool: Online
>    Heartbeat dead threshold = 61
>    Network idle timeout: 60000
>    Network keepalive delay: 2000
>    Network reconnect delay: 2000
> Checking O2CB heartbeat: Active
>
> We also tried a Heartbeat dead threshold of 31 with a Network idle timeout
> of 30000 to the same effect.
>
> Any assistance would be very much appreciated,
>
> Andrew
>
