[Ocfs2-devel] [PATCH] ocfs2: initialize ip_next_orphan
Wengang Wang
wen.gang.wang at oracle.com
Thu Oct 29 14:04:55 PDT 2020
Though the problem was found on an older 4.1.12 kernel, I think upstream
has the same issue.
On one node in the cluster, there is the following call trace:
# cat /proc/21473/stack
[<ffffffffc09a2f06>] __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
[<ffffffffc09a4481>] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
[<ffffffffc09b2ce2>] ocfs2_evict_inode+0x152/0x820 [ocfs2]
[<ffffffff8122b36e>] evict+0xae/0x1a0
[<ffffffff8122bd26>] iput+0x1c6/0x230
[<ffffffffc09b60ed>] ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
[<ffffffffc0992ae0>] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
[<ffffffffc099a1e9>] ocfs2_dir_foreach+0x29/0x30 [ocfs2]
[<ffffffffc09b7716>] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
[<ffffffffc09b9b4e>] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
[<ffffffff810a1399>] process_one_work+0x169/0x4a0
[<ffffffff810a1bcb>] worker_thread+0x5b/0x560
[<ffffffff810a7a2b>] kthread+0xcb/0xf0
[<ffffffff816f5d21>] ret_from_fork+0x61/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
The above stack is not reasonable; the final iput should not happen inside
the ocfs2_orphan_filldir() function. Looking at the code:
2067 /* Skip inodes which are already added to recover list, since dio may
2068 * happen concurrently with unlink/rename */
2069 if (OCFS2_I(iter)->ip_next_orphan) {
2070 iput(iter);
2071 return 0;
2072 }
2073
The logic assumes the inode is already on the recovery list when it sees a
non-NULL ip_next_orphan, so it skips this inode after dropping the
reference taken in ocfs2_iget().
However, if the inode really were on the recovery list, it would be holding
another reference, so the iput() at line 2070 should not be the final iput
(dropping the last reference). So I don't think the inode is actually on
the recovery list (no vmcore to confirm).
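For reference, the skip check above sits just before the place where
ocfs2_orphan_filldir() threads the inode onto the recovery list through
ip_next_orphan (paraphrased below from the filldir callback, arguments and
error handling trimmed; p is the filldir private context), so a stale
non-NULL value makes a freshly looked-up inode look like it is already
queued:

	iter = ocfs2_iget(p->osb, ino, OCFS2_FI_FLAG_ORPHAN_RECOVERY, 0);
	...
	if (OCFS2_I(iter)->ip_next_orphan) {
		/* with a stale non-NULL value this drops the only reference */
		iput(iter);
		return 0;
	}
	...
	/* normal path: keep the reference and link the inode onto the
	 * recovery list headed at p->head */
	OCFS2_I(iter)->ip_next_orphan = p->head;
	p->head = iter;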
Note that ocfs2_queue_orphans(), though it does not show up in the call
trace, is holding the cluster lock on the orphan directory while looking up
unlinked inodes. Evicting an on-disk inode can involve a lot of IO and take
a long time to finish. That means this node could hold the cluster lock for
a very long time, which can leave lock requests from other nodes against
the orphan directory hanging for a long time.
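To make the lock-hold window concrete, ocfs2_queue_orphans() roughly does
the following (paraphrased, error handling trimmed), so every slow eviction
triggered from ocfs2_orphan_filldir() runs while the orphan directory's
cluster lock is held:

	status = ocfs2_inode_lock(orphan_dir_inode, NULL, 0);
	/* cluster lock on the orphan directory is now held */
	status = ocfs2_dir_foreach(orphan_dir_inode, &priv.ctx);
	/* ocfs2_orphan_filldir() -> iput() -> ocfs2_evict_inode() may have
	 * done heavy disk IO for each entry while the lock was held */
	ocfs2_inode_unlock(orphan_dir_inode, 0);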
Looking more closely at ip_next_orphan, I found it is not initialized when
a new ocfs2_inode_info structure is allocated.
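For context, ocfs2_inode_init_once() is the constructor passed to
kmem_cache_create() for the inode cache in ocfs2_initialize_mem_caches()
(paraphrased below, SLAB_* flags omitted). A slab constructor runs only
when an object is first set up by the allocator, so any field it leaves
untouched is never guaranteed to be NULL:

	ocfs2_inode_cachep = kmem_cache_create("ocfs2_inode_cache",
					       sizeof(struct ocfs2_inode_info),
					       0,	/* align */
					       0,	/* SLAB_* flags omitted here */
					       ocfs2_inode_init_once);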
Fix:
initialize ip_next_orphan to NULL.
Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
---
fs/ocfs2/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 1d91dd1e8711..6f0e07584a15 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1724,6 +1724,8 @@ static void ocfs2_inode_init_once(void *data)
 				  &ocfs2_inode_caching_ops);
 
 	inode_init_once(&oi->vfs_inode);
+
+	oi->ip_next_orphan = NULL;
 }
 
 static int ocfs2_initialize_mem_caches(void)
--
2.21.0