[Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

srinivas eeda srinivas.eeda at oracle.com
Wed Feb 17 16:59:33 PST 2010


Ok, I'll modify the patch. Are messages queued on o2net_wq, and is
o2net_process_message always executed in the context of the o2net
thread, so that their processing is synchronized?
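A minimal userspace sketch of the scheduling rule Sunil describes in the quoted message: a keepalive goes out only when nothing has been heard from the peer for one keepalive interval, and the timer is rearmed as soon as the keepalive is sent rather than when the response arrives. The struct and function names are hypothetical, and a simulated millisecond clock stands in for the queued keepalive work discussed in the thread; this is not the o2net kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection keepalive state (not the o2net structures). */
struct ka_state {
	unsigned long interval_ms;  /* keepalive interval, 2000 ms by default per the thread */
	unsigned long deadline_ms;  /* when the queued keepalive would fire */
	unsigned long sent;         /* keepalives sent so far */
};

static void ka_init(struct ka_state *ka, unsigned long interval_ms,
		    unsigned long now_ms)
{
	assert(interval_ms > 0);
	ka->interval_ms = interval_ms;
	ka->deadline_ms = now_ms + interval_ms;
	ka->sent = 0;
}

/* Anything heard from the peer shows it is alive: cancel the pending
 * keepalive and requeue it one full interval from now. */
static void ka_on_rx(struct ka_state *ka, unsigned long now_ms)
{
	ka->deadline_ms = now_ms + ka->interval_ms;
}

/* Advance the simulated clock; returns true if a keepalive is sent now. */
static bool ka_tick(struct ka_state *ka, unsigned long now_ms)
{
	if (now_ms < ka->deadline_ms)
		return false;	/* heard from the peer recently enough */

	ka->sent++;
	/* Requeue immediately instead of waiting for the peer's response. */
	ka->deadline_ms = now_ms + ka->interval_ms;
	return true;
}
```

With a silent peer, ka_tick() returns true once per interval; with a peer that keeps sending traffic more often than the interval, ka_on_rx() keeps pushing the deadline out and no keepalive is sent at all.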

On 2/17/2010 3:50 PM, Sunil Mushran wrote:
> My understanding was that we'll also requeue after sending a keepalive.
> As in, not wait for the response to requeue. But we'll still be smart
> about it, in the sense that we won't send a hb if the nodes are
> communicating otherwise.
>
> Srinivas Eeda wrote:
>> In the old code, a node cancels and requeues the keepalive message when
>> it hears from the other node. If it hasn't heard anything for 2 seconds,
>> the queued message fires and sends a keepalive. The requeue happens only
>> after it hears from the other node again.
>>
>> With the new change, a node sends a keepalive every 2 seconds.
>>
>> Sunil Mushran wrote:
>>> How will it double? The node will send a keepalive only if it has
>>> not heard from the other node for 2 secs.
>>>
>>> Srinivas Eeda wrote:
>>>> No harm; it just doubles the heartbeat messages, which is not needed at all.
>>>>
>>>> Sunil Mushran wrote:
>>>>> What's the harm in leaving it in?
>>>>>
>>>>> Srinivas Eeda wrote:
>>>>>> Each node that has this patch would send an
>>>>>> O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds (default). So nodes
>>>>>> without this patch would always receive a heartbeat message every
>>>>>> 2 seconds.
>>>>>>
>>>>>> Nodes without this patch will respond with
>>>>>> O2NET_MSG_KEEP_RESP_MAGIC to every keepalive packet they
>>>>>> receive. So nodes with this patch will always receive a response
>>>>>> message.
>>>>>>
>>>>>> So, in a mixed setup, both nodes will always hear the heartbeat 
>>>>>> from each other :).
>>>>>>
>>>>>> thanks,
>>>>>> --Srini
>>>>>>
>>>>>>
>>>>>>
>>>>>> Joel Becker wrote:
>>>>>>  
>>>>>>> On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>>>>>>>  
>>>>>>>>          case O2NET_MSG_KEEP_REQ_MAGIC:
>>>>>>>> -            o2net_sendpage(sc, o2net_keep_resp,
>>>>>>>> -                       sizeof(*o2net_keep_resp));
>>>>>>>> +            /* Each node now sends keepalive message every
>>>>>>>> +             * keepalive time interval. Hence no need for response
>>>>>>>> +             */
>>>>>>>>              goto out;
>>>>>>>>           
>>>>>>>     You still have to send the response.  Think about a mixed
>>>>>>> environment where some nodes have this fix and some do not.  The older
>>>>>>> software is still waiting on the response.
>>>>>>>     The newer version can just ignore any responses it gets from
>>>>>>> other nodes.  But it has to send responses out just in case the other
>>>>>>> node is older.
>>>>>>>     The only other alternative is to bump the o2net protocol
>>>>>>> version, and that means the cluster has to be shut down to upgrade.
>>>>>>> Not a good choice.
>>>>>>>
>>>>>>> Joel
>>>>>>>
>>>>>>>       
>>>>>>
>>>>>>
>>>>>>   
>>>>>
>>>>
>>>
>>
>
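Joel's point in the quoted review can be sketched the same way: a patched node still answers every keepalive request, because an unpatched peer is waiting for that response, while an incoming response needs no handling beyond noting that the peer is alive. The enum values are placeholders standing in for O2NET_MSG_KEEP_REQ_MAGIC and O2NET_MSG_KEEP_RESP_MAGIC (the real magic values are not reproduced), and conn_handle_rx() is a hypothetical userspace model, not o2net_process_message() itself.

```c
#include <assert.h>

/* Placeholder message kinds: MSG_KEEP_REQ and MSG_KEEP_RESP stand in for
 * O2NET_MSG_KEEP_REQ_MAGIC and O2NET_MSG_KEEP_RESP_MAGIC. */
enum msg_kind { MSG_NONE, MSG_KEEP_REQ, MSG_KEEP_RESP, MSG_DATA };

/* Hypothetical connection state. */
struct conn {
	unsigned long last_rx_ms;  /* last time anything arrived from the peer */
	unsigned long resp_sent;   /* keepalive responses sent */
};

/* Handle one received message; returns the reply to send, or MSG_NONE. */
static enum msg_kind conn_handle_rx(struct conn *c, enum msg_kind in,
				    unsigned long now_ms)
{
	assert(in != MSG_NONE);

	/* Any traffic, including a keepalive, shows the peer is alive. */
	c->last_rx_ms = now_ms;

	switch (in) {
	case MSG_KEEP_REQ:
		/* Always answer: an older peer sent this request and is
		 * waiting for the response. */
		c->resp_sent++;
		return MSG_KEEP_RESP;
	case MSG_KEEP_RESP:
		/* Safe to ignore; receiving it already refreshed last_rx_ms. */
		return MSG_NONE;
	default:
		/* Ordinary messages go to their own handlers (not modelled). */
		return MSG_NONE;
	}
}
```

Keeping the response path intact is what avoids the alternative Joel mentions, bumping the o2net protocol version, which would require shutting the cluster down to upgrade.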


