[Ocfs2-users] OCFS2 hanging on writes

Herbert van den Bergh herbert.van.den.bergh at oracle.com
Thu Oct 25 18:39:31 PDT 2012


Hello Jeff,

You might want to check what the writer process is waiting on while it is 
frozen.  The wchan column of ps might be enough; if not, get a kernel 
stack trace of the process from /proc/<pid>/stack or from 
echo t > /proc/sysrq-trigger .  The latter will show other blocked 
processes as well, which may be helpful in determining the cause of the 
freeze.
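
As a rough sketch of the checks above (not an OCFS2 tool): the helper 
name show_wait_state is made up for illustration, and it reads the state 
and wait channel straight from /proc, which should correspond to what ps 
reports in its stat and wchan columns.  It assumes a Linux /proc and will 
most likely need root for the stack trace.

```shell
#!/bin/sh
# Hypothetical helper: show what a (possibly frozen) process is waiting on.
show_wait_state() {
    pid="$1"
    if [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Process state; "D" (uninterruptible sleep) is typical of a task
    # blocked on I/O.
    grep '^State:' "/proc/$pid/status"
    # Wait channel, i.e. the kernel function the task is sleeping in.
    printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # Kernel stack trace of the task; usually readable by root only.
    cat "/proc/$pid/stack" 2>/dev/null \
        || echo "stack: not readable (try as root)"
    return 0
}

# System-wide alternatives (as root; output goes to the kernel log, see dmesg):
#   echo t > /proc/sysrq-trigger    # dump all tasks
#   echo w > /proc/sysrq-trigger    # dump only blocked (uninterruptible) tasks
```

Run it as "show_wait_state <pid>" with the PID of the writer while the 
write is stalled; repeating it over a few freezes shows whether the task 
is always stuck in the same place.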

Thanks,
Herbert.


On 10/25/2012 06:32 PM, Jeff Paterson wrote:
> Hello,
>
> I need help with our OCFS2 (1.8.0) filesystem.  We have been having 
> problems with it for a couple of days.  When we write to it, it hangs.
>
> The "hanging pattern" is easily reproducible.  If I write a 1 GB file 
> to the filesystem, it does the following:
>         - write ~200 MB of data on the disk in 1 second
>         - freeze for about 10 seconds
>         - write ~200 MB of data on the disk in 1 second
>         - freeze for about 10 seconds
>         - write ~200 MB of data on the disk in 1 second
>         - freeze for about 10 seconds
>         (and so on)
>
> When the freezes occur:
>         - other write operations (from other processes) on the same 
> node also freeze
>         - write operations on other nodes are not affected
> Read operations (on any cluster node, even the one with frozen writes) 
> don't seem to be affected by the freezes.  One thing is certain: read 
> operations alone don't cause the filesystem to freeze.
>
> For info, before the problem began to appear we could sustain 640 MB/s 
> writes without any freeze.
>
> I tried mounting the filesystem on a single node only, to rule out 
> issues with inter-node communication, and the problem was still 
> there.
>
>
> Filesystem details
>
>   * The filesystem is 18 TB and is currently 72% full.
>   * Mount options are the following:
>     rw,nodev,_netdev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,heartbeat=local
>   * All Features: backup-super strict-journal-super sparse
>     extended-slotmap inline-data metaecc indexed-dirs refcount
>     discontig-bg unwritten
>
>
>
> There is nothing special in the system logs besides application errors 
> caused by the freezes.
>
>
> Would running fsck.ocfs2 help?  How long would it take for 18 TB?
>
> Is there a flag I can enable in debugfs.ocfs2 to get a better idea of 
> what is happening and why it is freezing like that?
>
>
> Any help would be greatly appreciated.
>
> Thanks in advance,
>
> Jeff