[Ocfs2-devel] [PATCH] ocfs2: fix a panic problem caused by o2cb_ctl

Jia Guo guojia12 at huawei.com
Fri Jan 25 17:49:31 PST 2019



On 2019/1/25 20:30, Joseph Qi wrote:
> 
> 
> On 19/1/25 14:05, Jia Guo wrote:
>> In the process of creating a node, it will cause NULL pointer reference
> 
> NULL pointer dereference
Fix it in patch v2.
> 
>> in kernel if o2cb_ctl failed in the interval
>> (mkdir, o2cb_set_node_attribute(node_num)] in function o2cb_add_node.
>>
>> The node num is initialized to 0 in function o2nm_node_group_make_item,
>> o2nm_node_group_drop_item will mistake the node number 0 for a
>> valid node number when we delete the node before the node number is set
>> correctly. If the local node number of the current host happens to be 0,
>> cluster->cl_local_node will be set to O2NM_INVALID_NODE_NUM while
>> o2hb_thread still running. The panic stack is generated as follows:
>>
>> o2hb_thread
>>     \-o2hb_do_disk_heartbeat
>>         \-o2hb_check_own_slot
>>             |-slot = &reg->hr_slots[o2nm_this_node()];
>>             //o2nm_this_node() return O2NM_INVALID_NODE_NUM
>>
>> We need to check whether the node number is set when we delete the node.
>>
>> Signed-off-by: Jia Guo <guojia12 at huawei.com>
>> ---
>>  fs/ocfs2/cluster/nodemanager.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
>> index 0e4166c..992ccc0 100644
>> --- a/fs/ocfs2/cluster/nodemanager.c
>> +++ b/fs/ocfs2/cluster/nodemanager.c
>> @@ -620,10 +620,16 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>>  {
>>  	struct o2nm_node *node = to_o2nm_node(item);
>>  	struct o2nm_cluster *cluster = to_o2nm_cluster(group->cg_item.ci_parent);
>> +	int nd_num_set = 0;
>>
>> -	o2net_disconnect_node(node);
>> +	/* nd_num might be 0 if the node number hasn't been set.. */
>> +	if (cluster->cl_nodes[node->nd_num] == node) {
>> +		nd_num_set = 1;
>> +
>> +	if (nd_num_set)
>> +		o2net_disconnect_node(node);
>>
>> -	if (cluster->cl_has_local &&
>> +	if (nd_num_set && cluster->cl_has_local &&
>>  	    (cluster->cl_local_node == node->nd_num)) {
>>  		cluster->cl_has_local = 0;
>>  		cluster->cl_local_node = O2NM_INVALID_NODE_NUM;
>> @@ -638,8 +644,7 @@ static void o2nm_node_group_drop_item(struct config_group *group,
>>  	if (node->nd_ipv4_address)
>>  		rb_erase(&node->nd_ip_node, &cluster->cl_node_ip_tree);
> 
> Why we have to this if node number hasn't been set?
> If this is also under the condition, we may rewrite this patch like:
> if (nd_num_set) {
> 	do_something();
> }
o2cb_ctl set attribute in the following order:
1、mkdir, create configfs item
2、set IP port attribute
3、set IP address attribute
4、set node number attribute
IP address is set before the node number, we need to do erase whether the node number
is set successfully or not.

This patch can still rewrite as you suggest, fix it in patch v2.

Thanks,
Jia
> 
> Thanks,
> Joseph
> 
>>
>> -	/* nd_num might be 0 if the node number hasn't been set.. */
>> -	if (cluster->cl_nodes[node->nd_num] == node) {
>> +	if (nd_num_set) {
>>  		cluster->cl_nodes[node->nd_num] = NULL;
>>  		clear_bit(node->nd_num, cluster->cl_nodes_bitmap);
>>  	}
>>
> .
> 



More information about the Ocfs2-devel mailing list