[Ocfs2-devel] [PATCH] ocfs2/cluster: unlock the o2hb_live_lock before the o2nm_depend_item()
Changwei Ge
ge.changwei at h3c.com
Wed Nov 1 01:28:53 PDT 2017
Hi Alex,
On 2017/11/1 15:05, alex chen wrote:
> Hi Joseph and Changwei,
>
> It's our basic principle that the function in which may sleep can't be called
> within spinlock hold.
I suppose this principle is a suggestion not a restriction.
>
> On 2017/11/1 9:03, Joseph Qi wrote:
>> Hi Alex,
>>
>> On 17/10/31 20:41, alex chen wrote:
>>> In the following situation, the down_write() will be called under
>>> the spin_lock(), which may lead a soft lockup:
>>> o2hb_region_inc_user
>>> spin_lock(&o2hb_live_lock)
>>> o2hb_region_pin
>>> o2nm_depend_item
>>> configfs_depend_item
>>> inode_lock
>>> down_write
>>> -->here may sleep and reschedule
>>>
>>> So we should unlock the o2hb_live_lock before the o2nm_depend_item(), and
>>> get item reference in advance to prevent the region to be released.
>>>
>>> Signed-off-by: Alex Chen <alex.chen at huawei.com>
>>> Reviewed-by: Yiwen Jiang <jiangyiwen at huawei.com>
>>> Reviewed-by: Jun Piao <piaojun at huawei.com>
>>> ---
>>> fs/ocfs2/cluster/heartbeat.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>> index d020604..f1142a9 100644
>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>> @@ -2399,6 +2399,9 @@ static int o2hb_region_pin(const char *region_uuid)
>>> if (reg->hr_item_pinned || reg->hr_item_dropped)
>>> goto skip_pin;
>>>
>>> + config_item_get(®->hr_item);
>>> + spin_unlock(&o2hb_live_lock);
>>> +
>> If unlock here, the iteration of o2hb_all_regions is no longer safe.
>>
>> Thanks,
>> Joseph
>>
>
> In local heartbeat mode, here we already found the region and will break the loop after
> depending item, we get the item reference before spin_unlock(), that means the region will
> never be released by the o2hb_region_release() until we put the item reference after
> spin_lock(&o2hb_live_lock), so we can safely iterate over the list.
> In global heartbeat mode, it doesn't matter that some regions may be deleted after
> spin_unlock(), because we just pin all the active regions.
But we are still iterating elements on global list *o2hb_all_regions*,
right? How can we guarantee that no more elements will be attached or
detached while the lock is unlocked.
>
> Thanks,
> Alex
>
>>> /* Ignore ENOENT only for local hb (userdlm domain) */
>>> ret = o2nm_depend_item(®->hr_item);
>>> if (!ret) {
>>> @@ -2410,9 +2413,14 @@ static int o2hb_region_pin(const char *region_uuid)
>>> else {
>>> mlog(ML_ERROR, "Pin region %s fails with %d\n",
>>> uuid, ret);
>>> + config_item_put(®->hr_item);
>>> + spin_lock(&o2hb_live_lock);
>>> break;
>>> }
>>> }
>>> +
>>> + config_item_put(®->hr_item);
>>> + spin_lock(&o2hb_live_lock);
>>> skip_pin:
>>> if (found)
>>> break;
>>> -- 1.9.5.msysgit.1
>>>
>>>
>>
>> .
>>
>
>
More information about the Ocfs2-devel
mailing list