[Ocfs2-users] Slow performance

Sérgio Surkamp sergio at gruposinternet.com.br
Mon Sep 5 14:21:42 PDT 2011


Hello again,

We have hit a performance problem today in one of our clusters.
Throughput suddenly drops from its normal level (about 30 MB/s,
read/write) to a few hundred KB/s (about 200 KB/s, read only) for a
while, and then, as suddenly as it started, it returns to the normal
read/write performance, cycling randomly. While one node is in this
"read only" state, the other shows only heartbeat activity (about 2
I/Os every 2 seconds) until the first node returns to normal, and vice
versa.

The servers run an e-mail application (IMAP/POP/SMTP -- Maildir
format) with more than 20,000 users, so they are constantly creating,
removing and moving files.

Dumping the processes in D state while a server is in this "constant
few KB/s, read only" state, they look like:

node#0:
10739 D    imapd           ocfs2_lookup_lock_orphan_dir
11658 D    imapd           ocfs2_reserve_suballoc_bits
12326 D    imapd           ocfs2_lookup_lock_orphan_dir
12330 D    pop3d           lock_rename
12351 D    imapd           ocfs2_lookup_lock_orphan_dir
12357 D    imapd           ocfs2_lookup_lock_orphan_dir
12359 D    imapd           unlinkat
12381 D    imapd           ocfs2_lookup_lock_orphan_dir
12498 D    deliverquota    ocfs2_wait_for_mask
12710 D    pop3d           ocfs2_reserve_suballoc_bits
12712 D    imapd           unlinkat
12726 D    imapd           ocfs2_reserve_suballoc_bits
12730 D    imapd           unlinkat
12736 D    imapd           ocfs2_reserve_suballoc_bits
12738 D    imapd           unlinkat
12749 D    pop3d           lock_rename
12891 D    pop3d           ocfs2_reserve_suballoc_bits
12971 D    pop3d           mutex_fastpath_lock_retval
12985 D    pop3d           lock_rename
13006 D    deliverquota    ocfs2_reserve_suballoc_bits
13061 D    pop3d           lock_rename
13117 D    pop3d           lock_rename
[-- suppressed --]
100+ processes in D state

node#1:
24428 D    deliverquota    ocfs2_wait_for_mask
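
The listings above were gathered roughly like this (a sketch; the
exact ps options and column widths may need adjusting on other
systems):

  # show processes in uninterruptible sleep (D) together with the
  # kernel function they are blocked in (wchan)
  ps -eo pid,stat,comm,wchan:40 | awk '$2 ~ /^D/'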

Some stacktraces from the processes:

Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffffa065ba1f>] ocfs2_lookup_lock_orphan_dir+0xb8/0x18a [ocfs2]
 [<ffffffffa065c7d5>] ocfs2_prepare_orphan_dir+0x3f/0x229 [ocfs2]
 [<ffffffffa0660bab>] ocfs2_unlink+0x523/0xa81 [ocfs2]
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff8116324d>] ? dquot_initialize+0x126/0x13d
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff81122c0c>] vfs_unlink+0x82/0xd1
 [<ffffffff81124bcc>] do_unlinkat+0xc6/0x178
 [<ffffffff8112186b>] ? path_put+0x22/0x27
 [<ffffffff810a7d03>] ? audit_syscall_entry+0x103/0x12f
 [<ffffffff81124c94>] sys_unlink+0x16/0x18
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffffa0633682>] ? ocfs2_match+0x2c/0x3a [ocfs2]
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffffa0676a82>] ocfs2_reserve_suballoc_bits+0x11a/0x499 [ocfs2]
 [<ffffffffa0678b4c>] ocfs2_reserve_new_inode+0x134/0x37a [ocfs2]
 [<ffffffffa065d409>] ocfs2_mknod+0x2d4/0xf26 [ocfs2]
 [<ffffffffa063d02c>] ? ocfs2_should_refresh_lock_res+0x8f/0x1ad [ocfs2]
 [<ffffffffa0653cf6>] ? ocfs2_wait_for_recovery+0x1a/0x8f [ocfs2]
 [<ffffffff81437f4e>] ? mutex_lock+0x16/0x3a
 [<ffffffffa065e0fd>] ocfs2_create+0xa2/0x10a [ocfs2]
 [<ffffffff8112268f>] vfs_create+0x7e/0x9d
 [<ffffffff81125794>] do_filp_open+0x302/0x92d
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff81437731>] ? _cond_resched+0xe/0x22
 [<ffffffff81238109>] ? might_fault+0xe/0x10
 [<ffffffff812381f3>] ? __strncpy_from_user+0x20/0x4a
 [<ffffffff81114bc8>] do_sys_open+0x62/0x109
 [<ffffffff81114ca2>] sys_open+0x20/0x22
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
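
Traces like these can be reproduced by asking the kernel to dump all
blocked tasks; a sketch, assuming the magic sysrq interface is
enabled:

  # dump the stacks of all tasks in D state to the kernel log,
  # then read them back with dmesg
  echo w > /proc/sysrq-trigger
  dmesg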

Checking the bugzilla, these two bugs seem to show similar behavior:

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1281
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1300

On the mailing list archive, this thread also shows similar behavior:

http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg02509.html

The cluster consists of two Dell PE 1950 servers with 8 GB of RAM
each, attached via 2 Gbit FC to a Dell EMC AX/100 storage array. The
network between the nodes runs at 1 Gbit.

We are using CentOS 5.5, OCFS2 1.6.4 and UEK 2.6.32-100.0.19.el5.

Tests so far:

* We have changed the data mount option from ordered to writeback -- no
  success (an example mount line with the options we tried is shown
  after this list);
* We have added the mount option localalloc=16 -- no success;
* We have turned off group and user quota support -- no success;
* Rebooted the servers (to test with everything fresh) -- no success;
* Mounted the filesystem on only one node -- success.
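
For reference, this is roughly the mount line we experimented with;
the device and mount point below are placeholders, not our real paths:

  # data=writeback and localalloc=16 are the options mentioned above;
  # /dev/sdX and /mail are placeholders
  mount -t ocfs2 -o data=writeback,localalloc=16 /dev/sdX /mail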

The problem does not show up when the filesystem is mounted on only
one node, so we are currently working around it by mounting on a
single node and exporting the filesystem to the other via NFS. This
leads me to conclude that the contention is inside the cluster stack
(DLM or something related).
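
The workaround itself is just a plain NFS export from the node that
keeps the OCFS2 volume mounted; a sketch, with placeholder path and
host name:

  # /etc/exports on the node that still mounts the OCFS2 volume
  # (path and peer host name are placeholders)
  /mail  node1.example.com(rw,sync,no_subtree_check)

followed by "exportfs -ra" and a regular NFS mount on the other node.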

We have checked logs, debug output and traces trying to pinpoint the
problem, but with no success. Any clue on how to debug this further,
or on whether it is the same problem as the ones in the cited bug
reports?
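
In case it helps, one thing we can still run on the busy node is the
lock state dump from debugfs.ocfs2; a sketch, assuming the fs_locks
command from ocfs2-tools 1.6 and a placeholder device name:

  # dump the lock resources this node currently holds or waits on
  # (device is a placeholder)
  debugfs.ocfs2 -R "fs_locks" /dev/sdX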

node#0 has a heavier I/O load than node#1 -- could that be triggering
something?

The filesystem is about 94% full (751G of 803G).

Thanks!

Regards,
-- 
  .:''''':.
.:'        `     Sérgio Surkamp | Gerente de Rede
::    ........   sergio at gruposinternet.com.br
`:.        .:'
  `:,   ,.:'     *Grupos Internet S.A.*
    `: :'        R. Lauro Linhares, 2123 Torre B - Sala 201
     : :         Trindade - Florianópolis - SC
     :.'
     ::          +55 48 3234-4109
     :
     '           http://www.gruposinternet.com.br


