[Ocfs2-devel] [PATCH v2 2/2] ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
Joseph Qi
joseph.qi at linux.alibaba.com
Thu May 19 01:59:54 UTC 2022
On 5/19/22 7:52 AM, Junxiao Bi wrote:
> When user_dlm_destroy_lock() fails, it does not clean up the flags it set
> before exiting. For USER_LOCK_IN_TEARDOWN, if the function fails because
> the lock is still in use, then the next time unlink invokes it, it
> returns success, and unlink removes the inode and dentry if the lock is
> no longer in use (file closed). The dlm lock, however, is still linked in
> the dlm lock resource, so when a bast comes in it triggers a panic due to
> use-after-free. See the following panic call trace.
> To fix this, USER_LOCK_IN_TEARDOWN should be reverted on failure. An
> error should also be returned if USER_LOCK_IN_TEARDOWN is already set, to
> let the user know that unlink failed.
>
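
To make the sequence above concrete, here is a minimal userspace sketch of
the IN_TEARDOWN handling before and after the change. It is only a model:
the model_* names, the single holders counter and the flag value are
assumptions for illustration, and the locking, the busy wait and the dlm
call are left out.

```c
#include <errno.h>

/* Hypothetical userspace model, not the kernel code. */
#define MODEL_IN_TEARDOWN 0x1u

struct model_lockres {
	unsigned int flags;
	int holders;	/* stands in for l_ro_holders + l_ex_holders */
};

/*
 * Behaviour described for the old code: the flag stays set when the
 * call fails, so a second call returns 0 without tearing anything down.
 */
static int model_destroy_old(struct model_lockres *res)
{
	if (res->flags & MODEL_IN_TEARDOWN)
		return 0;

	res->flags |= MODEL_IN_TEARDOWN;
	if (res->holders)
		return -EBUSY;	/* flag is left behind */

	return 0;
}

/*
 * Behaviour after the patch: the flag is reverted on failure, and a
 * call that finds it already set reports an error.
 */
static int model_destroy_fixed(struct model_lockres *res)
{
	if (res->flags & MODEL_IN_TEARDOWN)
		return -EBUSY;

	res->flags |= MODEL_IN_TEARDOWN;
	if (res->holders) {
		res->flags &= ~MODEL_IN_TEARDOWN;
		return -EBUSY;
	}

	return 0;
}
```

With one holder, two calls to model_destroy_old() return -EBUSY and then 0,
which matches the case where unlink goes on to remove the inode. The fixed
variant keeps returning -EBUSY until the holder is gone.
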
> For the case of ocfs2_dlm_unlock() failure, besides USER_LOCK_IN_TEARDOWN,
> USER_LOCK_BUSY also has to be cleared.
> Even though the spin lock is released in between, USER_LOCK_IN_TEARDOWN
> is still set. For USER_LOCK_BUSY, if every place that waits on this flag
> first checks USER_LOCK_IN_TEARDOWN and bails out, then no flow waits on
> the busy flag set by user_dlm_destroy_lock(), and we can simply revert
> USER_LOCK_BUSY when ocfs2_dlm_unlock() fails.
> Fix user_dlm_cluster_lock(), which is the only function not following this.
>
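
The USER_LOCK_BUSY argument above can be sketched the same way. This is a
single-threaded model under stated assumptions: the model_* names, the flag
values and MODEL_WOULD_WAIT (a stand-in for "the caller would sleep on the
busy flag") are made up for illustration, and unlock_status plays the role
of the ocfs2_dlm_unlock() return value.

```c
#include <errno.h>

/* Hypothetical userspace model, not the kernel code. */
#define MODEL_BUSY        0x1u
#define MODEL_IN_TEARDOWN 0x2u
#define MODEL_WOULD_WAIT  1	/* caller would wait on the busy flag */

struct model_lockres {
	unsigned int flags;
};

/* Locker side: check IN_TEARDOWN before ever waiting on BUSY. */
static int model_cluster_lock(const struct model_lockres *res)
{
	if (res->flags & MODEL_IN_TEARDOWN)
		return -EAGAIN;
	if (res->flags & MODEL_BUSY)
		return MODEL_WOULD_WAIT;
	return 0;
}

/*
 * Destroy side, unlock step only. In the real function the spin lock
 * is dropped between setting the flags and the unlock call.
 */
static int model_destroy_unlock_step(struct model_lockres *res,
				     int unlock_status)
{
	res->flags |= MODEL_IN_TEARDOWN | MODEL_BUSY;

	if (unlock_status) {
		/* Nobody can be waiting on BUSY, so both flags can be reverted. */
		res->flags &= ~(MODEL_IN_TEARDOWN | MODEL_BUSY);
		return unlock_status;
	}

	return 0;
}
```

While both flags are set, model_cluster_lock() returns -EAGAIN instead of
waiting, which is why clearing MODEL_BUSY on the failure path cannot leave
a waiter behind in this model.
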
> [ 941.336392] (python,26174,16):dlmfs_unlink:562 ERROR: unlink
> 004fb0000060000b5a90b8c847b72e1, error -16 from destroy
> [ 989.757536] ------------[ cut here ]------------
> [ 989.757709] kernel BUG at fs/ocfs2/dlmfs/userdlm.c:173!
> [ 989.757876] invalid opcode: 0000 [#1] SMP
> [ 989.758027] Modules linked in: ksplice_2zhuk2jr_ib_ipoib_new(O)
> ksplice_2zhuk2jr(O) mptctl mptbase xen_netback xen_blkback xen_gntalloc
> xen_gntdev xen_evtchn cdc_ether usbnet mii ocfs2 jbd2 rpcsec_gss_krb5
> auth_rpcgss nfsv4 nfsv3 nfs_acl nfs fscache lockd grace ocfs2_dlmfs
> ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc
> fcoe libfcoe libfc scsi_transport_fc sunrpc ipmi_devintf bridge stp llc
> rds_rdma rds bonding ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
> rdma_cm ib_cm iw_cm falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE)
> mlx4_vnic falcon_kal(E) falcon_lsm_pinned_13402(E) mlx4_ib ib_sa ib_mad
> ib_core ib_addr xenfs xen_privcmd dm_multipath iTCO_wdt iTCO_vendor_support
> pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core ipmi_ssif i2c_core ipmi_si
> ipmi_msghandler
> [ 989.760686] ioatdma sg ext3 jbd mbcache sd_mod ahci libahci ixgbe dca ptp
> pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas mlx4_core crc32c_intel
> be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio
> libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi
> dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> ksplice_2zhuk2jr_ib_ipoib_old]
> [ 989.761987] CPU: 10 PID: 19102 Comm: dlm_thread Tainted: P OE
> 4.1.12-124.57.1.el6uek.x86_64 #2
> [ 989.762290] Hardware name: Oracle Corporation ORACLE SERVER
> X5-2/ASM,MOTHERBOARD,1U, BIOS 30350100 06/17/2021
> [ 989.762599] task: ffff880178af6200 ti: ffff88017f7c8000 task.ti:
> ffff88017f7c8000
> [ 989.762848] RIP: e030:[<ffffffffc07d4316>] [<ffffffffc07d4316>]
> __user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
> [ 989.763185] RSP: e02b:ffff88017f7cbcb8 EFLAGS: 00010246
> [ 989.763353] RAX: 0000000000000000 RBX: ffff880174d48008 RCX:
> 0000000000000003
> [ 989.763565] RDX: 0000000000120012 RSI: 0000000000000003 RDI:
> ffff880174d48170
> [ 989.763778] RBP: ffff88017f7cbcc8 R08: ffff88021f4293b0 R09:
> 0000000000000000
> [ 989.763991] R10: ffff880179c8c000 R11: 0000000000000003 R12:
> ffff880174d48008
> [ 989.764204] R13: 0000000000000003 R14: ffff880179c8c000 R15:
> ffff88021db7a000
> [ 989.764422] FS: 0000000000000000(0000) GS:ffff880247480000(0000)
> knlGS:ffff880247480000
> [ 989.764685] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 989.764865] CR2: ffff8000007f6800 CR3: 0000000001ae0000 CR4:
> 0000000000042660
> [ 989.765081] Stack:
> [ 989.765167] 0000000000000003 ffff880174d48040 ffff88017f7cbd18
> ffffffffc07d455f
> [ 989.765442] ffff88017f7cbd88 ffffffff816fb639 ffff88017f7cbd38
> ffff8800361b5600
> [ 989.765717] ffff88021db7a000 ffff88021f429380 0000000000000003
> ffffffffc0453020
> [ 989.765991] Call Trace:
> [ 989.766093] [<ffffffffc07d455f>] user_bast+0x5f/0xf0 [ocfs2_dlmfs]
> [ 989.766287] [<ffffffff816fb639>] ? schedule_timeout+0x169/0x2d0
> [ 989.766475] [<ffffffffc0453020>] ? o2dlm_lock_ast_wrapper+0x20/0x20
> [ocfs2_stack_o2cb]
> [ 989.766738] [<ffffffffc045303a>] o2dlm_blocking_ast_wrapper+0x1a/0x20
> [ocfs2_stack_o2cb]
> [ 989.767010] [<ffffffffc0864ec6>] dlm_do_local_bast+0x46/0xe0 [ocfs2_dlm]
> [ 989.767217] [<ffffffffc084f5cc>] ? dlm_lockres_calc_usage+0x4c/0x60
> [ocfs2_dlm]
> [ 989.767466] [<ffffffffc08501f1>] dlm_thread+0xa31/0x1140 [ocfs2_dlm]
> [ 989.767662] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.767834] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
> [ 989.768006] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.768178] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
> [ 989.768349] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.768521] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
> [ 989.768693] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.768893] [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
> [ 989.769067] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.769241] [<ffffffff810ce4d0>] ? wait_woken+0x90/0x90
> [ 989.769411] [<ffffffffc084f7c0>] ? dlm_kick_thread+0x80/0x80 [ocfs2_dlm]
> [ 989.769617] [<ffffffff810a8bbb>] kthread+0xcb/0xf0
> [ 989.769774] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.769945] [<ffffffff816f78da>] ? __schedule+0x24a/0x810
> [ 989.770117] [<ffffffff810a8af0>] ? kthread_create_on_node+0x180/0x180
> [ 989.770321] [<ffffffff816fdaa1>] ret_from_fork+0x61/0x90
> [ 989.770492] [<ffffffff810a8af0>] ? kthread_create_on_node+0x180/0x180
> [ 989.770689] Code: d0 00 00 00 f0 45 7d c0 bf 00 20 00 00 48 89 83 c0 00 00
> 00 48 89 83 c8 00 00 00 e8 55 c1 8c c0 83 4b 04 10 48 83 c4 08 5b 5d c3 <0f>
> 0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 83
> [ 989.771892] RIP [<ffffffffc07d4316>]
> __user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
> [ 989.772174] RSP <ffff88017f7cbcb8>
> [ 989.772704] ---[ end trace ebd1e38cebcc93a8 ]---
> [ 989.772907] Kernel panic - not syncing: Fatal exception
> [ 989.773173] Kernel Offset: disabled
>
> Cc: <stable at vger.kernel.org>
> Signed-off-by: Junxiao Bi <junxiao.bi at oracle.com>
Reviewed-by: Joseph Qi <joseph.qi at linux.alibaba.com>
> ---
> fs/ocfs2/dlmfs/userdlm.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ocfs2/dlmfs/userdlm.c b/fs/ocfs2/dlmfs/userdlm.c
> index af0be612589c..617c92e7b925 100644
> --- a/fs/ocfs2/dlmfs/userdlm.c
> +++ b/fs/ocfs2/dlmfs/userdlm.c
> @@ -433,6 +433,11 @@ int user_dlm_cluster_lock(struct user_lock_res *lockres,
>  	}
>
>  	spin_lock(&lockres->l_lock);
> +	if (lockres->l_flags & USER_LOCK_IN_TEARDOWN) {
> +		spin_unlock(&lockres->l_lock);
> +		status = -EAGAIN;
> +		goto bail;
> +	}
>
>  	/* We only compare against the currently granted level
>  	 * here. If the lock is blocked waiting on a downconvert,
> @@ -595,7 +600,7 @@ int user_dlm_destroy_lock(struct user_lock_res *lockres)
>  	spin_lock(&lockres->l_lock);
>  	if (lockres->l_flags & USER_LOCK_IN_TEARDOWN) {
>  		spin_unlock(&lockres->l_lock);
> -		return 0;
> +		goto bail;
>  	}
>
>  	lockres->l_flags |= USER_LOCK_IN_TEARDOWN;
> @@ -609,12 +614,17 @@ int user_dlm_destroy_lock(struct user_lock_res *lockres)
>  	}
>
>  	if (lockres->l_ro_holders || lockres->l_ex_holders) {
> +		lockres->l_flags &= ~USER_LOCK_IN_TEARDOWN;
>  		spin_unlock(&lockres->l_lock);
>  		goto bail;
>  	}
>
>  	status = 0;
>  	if (!(lockres->l_flags & USER_LOCK_ATTACHED)) {
> +		/*
> +		 * lock is never requested, leave USER_LOCK_IN_TEARDOWN set
> +		 * to avoid new lock request coming in.
> +		 */
>  		spin_unlock(&lockres->l_lock);
>  		goto bail;
>  	}
> @@ -624,6 +634,10 @@ int user_dlm_destroy_lock(struct user_lock_res *lockres)
>
>  	status = ocfs2_dlm_unlock(conn, &lockres->l_lksb, DLM_LKF_VALBLK);
>  	if (status) {
> +		spin_lock(&lockres->l_lock);
> +		lockres->l_flags &= ~USER_LOCK_IN_TEARDOWN;
> +		lockres->l_flags &= ~USER_LOCK_BUSY;
> +		spin_unlock(&lockres->l_lock);
>  		user_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
>  		goto bail;
>  	}