[Ocfs2-devel] Deadlock in DLM code still there
Jan Kara
jack at suse.cz
Thu May 13 12:43:21 PDT 2010
Hi,
in http://www.mail-archive.com/ocfs2-devel@oss.oracle.com/msg03188.html
(more than a year ago) I reported a lock inversion between dlm->ast_lock
and res->spinlock. The deadlock still seems to be there in 2.6.34-rc7:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.34-rc7-xen #4
-------------------------------------------------------
dlm_thread/2001 is trying to acquire lock:
(&(&dlm->ast_lock)->rlock){+.+...}, at: [<ffffffffa0119785>] dlm_queue_bast+0x55/0x1e0 [ocfs2_dlm]
but task is already holding lock:
(&(&res->spinlock)->rlock){+.+...}, at: [<ffffffffa010452d>] dlm_thread+0x7cd/0x17f0 [ocfs2_dlm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&res->spinlock)->rlock){+.+...}:
[<ffffffff810746bf>] __lock_acquire+0x109f/0x1720
[<ffffffff81074da9>] lock_acquire+0x69/0x90
[<ffffffff81328c6c>] _raw_spin_lock+0x2c/0x40
[<ffffffff8117e158>] _atomic_dec_and_lock+0x78/0xa0
[<ffffffffa010ebb9>] dlm_lockres_release_ast+0x29/0xb0 [ocfs2_dlm]
[<ffffffffa0104e41>] dlm_thread+0x10e1/0x17f0 [ocfs2_dlm]
[<ffffffff81060e1e>] kthread+0x8e/0xa0
[<ffffffff8100bda4>] kernel_thread_helper+0x4/0x10
-> #0 (&(&dlm->ast_lock)->rlock){+.+...}:
[<ffffffff81074b18>] __lock_acquire+0x14f8/0x1720
[<ffffffff81074da9>] lock_acquire+0x69/0x90
[<ffffffff81328c6c>] _raw_spin_lock+0x2c/0x40
[<ffffffffa0119785>] dlm_queue_bast+0x55/0x1e0 [ocfs2_dlm]
[<ffffffffa010494f>] dlm_thread+0xbef/0x17f0 [ocfs2_dlm]
[<ffffffff81060e1e>] kthread+0x8e/0xa0
[<ffffffff8100bda4>] kernel_thread_helper+0x4/0x10
other info that might help us debug this:
1 lock held by dlm_thread/2001:
#0: (&(&res->spinlock)->rlock){+.+...}, at: [<ffffffffa010452d>] dlm_thread+0x7cd/0x17f0 [ocfs2_dlm]
stack backtrace:
Pid: 2001, comm: dlm_thread Not tainted 2.6.34-rc7-xen #4
Call Trace:
[<ffffffff810723d0>] print_circular_bug+0xf0/0x100
[<ffffffff81074b18>] __lock_acquire+0x14f8/0x1720
[<ffffffff8100701d>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffff81074da9>] lock_acquire+0x69/0x90
[<ffffffffa0119785>] ? dlm_queue_bast+0x55/0x1e0 [ocfs2_dlm]
[<ffffffff81328c6c>] _raw_spin_lock+0x2c/0x40
[<ffffffffa0119785>] ? dlm_queue_bast+0x55/0x1e0 [ocfs2_dlm]
[<ffffffffa0119785>] dlm_queue_bast+0x55/0x1e0 [ocfs2_dlm]
[<ffffffffa010494f>] dlm_thread+0xbef/0x17f0 [ocfs2_dlm]
[<ffffffff81070cdd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8107335d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff813293b2>] ? _raw_spin_unlock_irq+0x32/0x40
[<ffffffff81061330>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa0103d60>] ? dlm_thread+0x0/0x17f0 [ocfs2_dlm]
[<ffffffff81060e1e>] kthread+0x8e/0xa0
[<ffffffff8100bda4>] kernel_thread_helper+0x4/0x10
[<ffffffff81329790>] ? restore_args+0x0/0x30
[<ffffffff8100bda0>] ? kernel_thread_helper+0x0/0x10
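To spell out the inversion lockdep is complaining about: chain #1 shows
dlm_lockres_release_ast() taking res->spinlock (inside _atomic_dec_and_lock())
while dlm->ast_lock is apparently already held, whereas chain #0 shows
dlm_thread() calling dlm_queue_bast(), which takes dlm->ast_lock while
res->spinlock is held. A minimal userspace sketch of that AB-BA pattern
(pthread mutexes standing in for the two spinlocks; the function bodies are
illustrative only, not the actual ocfs2_dlm code):

#include <pthread.h>

static pthread_mutex_t ast_lock = PTHREAD_MUTEX_INITIALIZER;      /* plays dlm->ast_lock */
static pthread_mutex_t res_spinlock = PTHREAD_MUTEX_INITIALIZER;  /* plays res->spinlock */

/* Like the dlm_lockres_release_ast() path: ast_lock is held, then
 * res->spinlock is taken (there, inside _atomic_dec_and_lock()). */
static void release_ast_path(void)
{
	pthread_mutex_lock(&ast_lock);
	pthread_mutex_lock(&res_spinlock);	/* order: ast_lock -> res->spinlock */
	/* ... drop the last AST reservation, clean up the resource ... */
	pthread_mutex_unlock(&res_spinlock);
	pthread_mutex_unlock(&ast_lock);
}

/* Like the dlm_thread() -> dlm_queue_bast() path: res->spinlock is
 * already held when ast_lock is taken, i.e. the opposite order. */
static void queue_bast_path(void)
{
	pthread_mutex_lock(&res_spinlock);
	pthread_mutex_lock(&ast_lock);		/* order: res->spinlock -> ast_lock */
	/* ... queue the blocking AST for delivery ... */
	pthread_mutex_unlock(&ast_lock);
	pthread_mutex_unlock(&res_spinlock);
}

If two threads run these two paths concurrently, each can acquire its first
lock and then block forever waiting for the other's, which is exactly the
circular dependency in the report above.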
I'm now hitting this problem regularly, which stops me from verifying
whether there are other possible deadlocks in the ocfs2 quota code...
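One possible direction for a fix (an untested sketch of the idea, not a
patch): pick a single order, ast_lock before res->spinlock, matching the
release path, and have dlm_thread take dlm->ast_lock before it grabs
res->spinlock, so the AST/BAST queueing runs with ast_lock already held
instead of taking it under res->spinlock. In terms of the userspace sketch
above, the second path would become:

/* queue_bast_path(), reordered to match release_ast_path() */
static void queue_bast_path_fixed(void)
{
	pthread_mutex_lock(&ast_lock);		/* take ast_lock first ... */
	pthread_mutex_lock(&res_spinlock);	/* ... then res->spinlock */
	/* ... queue the blocking AST for delivery ... */
	pthread_mutex_unlock(&res_spinlock);
	pthread_mutex_unlock(&ast_lock);
}

With both paths using the same order, the cycle cannot form.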
Honza
--
Jan Kara <jack at suse.cz>
SUSE Labs, CR