[Ocfs2-devel] [PATCH 1/6] ocfs2: o2hb: add negotiate timer

Junxiao Bi junxiao.bi at oracle.com
Thu Jan 21 19:23:33 PST 2016


Hi Andrew,

On 01/22/2016 07:42 AM, Andrew Morton wrote:
> On Wed, 20 Jan 2016 11:13:34 +0800 Junxiao Bi <junxiao.bi at oracle.com> wrote:
> 
>> When the storage goes down, all nodes fence themselves because of the
>> write timeout. The negotiate timer is designed to avoid this: with it,
>> a node waits until the storage comes back up.
>>
>> The negotiate timer works as follows:
>>
>> 1. The negotiate timer expires before the write timeout timer; its
>> timeout is currently half of the write timeout, and it is re-queued
>> together with the write timeout timer. When it expires, the node sends a
>> NEGO_TIMEOUT message to the master node (the node with the lowest node
>> number). This message only sets a bit in a bitmap on the master node
>> that records which nodes are negotiating the timeout.
>>
>> 2. If the storage is down, every node sends this message to the master
>> node. When the master node finds that its bitmap includes all online
>> nodes, it sends a NEGO_APPROVL message to each node in turn; this message
>> re-queues the write timeout timer and the negotiate timer.
>> Any node that does not receive this message, or hits an error while
>> handling it, will be fenced.
>> If the storage comes back up at any time, o2hb_thread runs and re-queues
>> all the timers, so these two steps have no effect.
>>
>> ...
>>
>> +static void o2hb_nego_timeout(struct work_struct *work)
>> +{
>> +	struct o2hb_region *reg =
>> +		container_of(work, struct o2hb_region,
>> +			     hr_nego_timeout_work.work);
> 
> It's better to just do
> 
> 	struct o2hb_region *reg;
> 
> 	reg = container_of(work, struct o2hb_region, hr_nego_timeout_work.work);
> 
> and avoid the weird 80-column tricks.
OK. Will update this in V2.
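For reference, a userspace sketch of the two-statement form Andrew suggests. The kernel's `container_of()` and `struct delayed_work` are not available outside the kernel, so the macro and the cut-down structs below are simplified stand-ins (assumptions), keeping only the nesting needed to show the pointer arithmetic. `o2hb_nego_work_to_region()` and `hr_other_state` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(): step back
 * from a pointer to a member to the start of the enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Cut-down stand-ins for the kernel types; only the nesting matters here. */
struct work_struct {
	int pending;
};

struct delayed_work {
	struct work_struct work;
	unsigned long delay;
};

struct o2hb_region {
	unsigned long hr_other_state;	/* hypothetical: fields before the work */
	struct delayed_work hr_nego_timeout_work;
};

/* Hypothetical helper: declare first, assign on its own line, no
 * continuation-line indentation tricks. */
static struct o2hb_region *o2hb_nego_work_to_region(struct work_struct *work)
{
	struct o2hb_region *reg;

	assert(work != NULL);
	reg = container_of(work, struct o2hb_region, hr_nego_timeout_work.work);
	return reg;
}
```

The nested member designator `hr_nego_timeout_work.work` is accepted by `offsetof`, so the whole expression still fits on one line.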

> 
>> +	unsigned long live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];
> 
> the bitmap.h interfaces might be nicer here.  Perhaps.  A little bit.
Will consider this in v2.
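A sketch of how the master-side decision from steps 1 and 2 might read with bitmap-style helpers. It is userspace code, so `DECLARE_BITMAP`, `set_bit`, `test_bit`, `find_first_bit` and `bitmap_equal` are minimal stand-ins that mimic the kernel helpers of the same names; `O2NM_MAX_NODES` is assumed to be 255, and `enum nego_action` and `o2hb_nego_decide()` are hypothetical names, not the patch's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define O2NM_MAX_NODES 255	/* assumed to match ocfs2's nodemanager.h */

/* Minimal userspace stand-ins for the kernel bitmap/bitops helpers. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

static void set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static bool test_bit(unsigned int nr, const unsigned long *addr)
{
	return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Like the kernel helper, returns size when no bit is set. */
static unsigned int find_first_bit(const unsigned long *addr, unsigned int size)
{
	for (unsigned int i = 0; i < size; i++)
		if (test_bit(i, addr))
			return i;
	return size;
}

/* Compares only the first nbits bits; padding in the last word is ignored. */
static bool bitmap_equal(const unsigned long *a, const unsigned long *b,
			 unsigned int nbits)
{
	for (unsigned int i = 0; i < nbits; i++)
		if (test_bit(i, a) != test_bit(i, b))
			return false;
	return true;
}

/* Hypothetical outcomes of one run of the negotiate work. */
enum nego_action {
	NEGO_SEND_TIMEOUT_TO_MASTER,	/* not master: send NEGO_TIMEOUT */
	NEGO_RECHECK_LATER,		/* master: a live node is not negotiating yet */
	NEGO_APPROVE_ALL,		/* master: every live node is negotiating */
};

static enum nego_action o2hb_nego_decide(const unsigned long *live_map,
					 unsigned long *nego_map,
					 unsigned int this_node)
{
	unsigned int master;

	assert(this_node < O2NM_MAX_NODES);
	/* The lowest live node number acts as master. */
	master = find_first_bit(live_map, O2NM_MAX_NODES);
	if (master != this_node)
		return NEGO_SEND_TIMEOUT_TO_MASTER;

	/* The master counts itself as negotiating. */
	set_bit(master, nego_map);
	if (!bitmap_equal(nego_map, live_map, O2NM_MAX_NODES))
		return NEGO_RECHECK_LATER;
	return NEGO_APPROVE_ALL;
}
```

`find_first_bit(live_map, O2NM_MAX_NODES)` is equivalent to the patch's `find_next_bit(..., 0)`. One possible advantage over `memcmp()` of the whole array is that `bitmap_equal()` compares only the first `O2NM_MAX_NODES` bits, so any padding bits in the last word cannot affect the result.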

> 
>> +	int master_node;
>> +
>> +	o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
>> +	/* lowest node as master node to make negotiate decision. */
>> +	master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0);
>> +
>> +	if (master_node == o2nm_this_node()) {
>> +		set_bit(master_node, reg->hr_nego_node_bitmap);
>> +		if (memcmp(reg->hr_nego_node_bitmap, live_node_bitmap,
>> +				sizeof(reg->hr_nego_node_bitmap))) {
>> +			/* check negotiate bitmap every second to do timeout
>> +			 * approve decision.
>> +			 */
>> +			schedule_delayed_work(&reg->hr_nego_timeout_work,
>> +				msecs_to_jiffies(1000));
> 
> One second is long enough to unmount the fs (and to run `rmmod
> ocfs2'!).  Is there anything preventing the work from triggering in
> these situations?
Yes, this delayed work will be synced before the umount.
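Andrew's concern is that the work re-arms itself every second, so it could still fire after unmount or `rmmod`. Assuming the sync mentioned here is something like the kernel's `cancel_delayed_work_sync()` called during teardown (whose documentation, via `cancel_work_sync()`, notes it can be used even when the work re-queues itself), the userspace sketch below models the same ordering with POSIX threads. `struct nego_worker`, `nego_worker_start()` and `nego_worker_cancel_sync()` are hypothetical names, and the 10 ms period stands in for the one-second delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a self-re-arming delayed work item. */
struct nego_worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool stop;
	unsigned int runs;	/* times the periodic check has run */
	pthread_t thread;
};

static void *nego_worker_fn(void *arg)
{
	struct nego_worker *w = arg;

	pthread_mutex_lock(&w->lock);
	while (!w->stop) {
		struct timespec deadline;

		/* Re-arm: wait up to 10 ms (standing in for the 1 s delay). */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_nsec += 10L * 1000 * 1000;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		pthread_cond_timedwait(&w->cond, &w->lock, &deadline);
		if (!w->stop)
			w->runs++;	/* the bitmap check would run here */
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int nego_worker_start(struct nego_worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->stop = false;
	w->runs = 0;
	return pthread_create(&w->thread, NULL, nego_worker_fn, w);
}

/*
 * Analogue of cancel_delayed_work_sync(): stop re-arming, wake the worker
 * and wait for it to exit. Once this returns the callback can no longer
 * run, so the state it touches can be torn down safely.
 */
static void nego_worker_cancel_sync(struct nego_worker *w)
{
	int rc;

	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);

	rc = pthread_join(w->thread, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_mutex_destroy(&w->lock);
	pthread_cond_destroy(&w->cond);
}
```

The key point is ordering: the cancel must complete, including waiting for a run already in progress, before the region or module goes away.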

Thanks,
Junxiao.
> 
>> +
>> +			return;
>> +		}
>> +
>> +		/* approve negotiate timeout request. */
>> +	} else {
>> +		/* negotiate timeout with master node. */
>> +	}
>> +
>>  }
> 



