[Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

Sunil Mushran sunil.mushran at oracle.com
Wed Feb 17 15:50:25 PST 2010


My understanding was that we'll also requeue after sending a keepalive,
i.e. not wait for the response before requeuing. But we'll still be smart
about it, in the sense that we won't send a hb if the nodes are already
communicating otherwise.
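
To make that concrete, here is a rough user-space sketch of the timer
behaviour described above. It is only a model under my own assumptions,
not the o2net code: struct conn, requeue_keepalive(), on_message_received(),
on_tick() and simulate() are made-up names, time is counted in whole
seconds, and the 2 second delay is the default mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the keepalive timer; not the o2net implementation. */
#define KEEPALIVE_DELAY 2   /* seconds; the default interval cited in the thread */

struct conn {
    int timer_expires;      /* absolute time at which the keepalive timer fires */
    int keepalives_sent;    /* number of keepalive requests sent so far */
};

/* (Re)arm the keepalive timer relative to the current time. */
static void requeue_keepalive(struct conn *c, int now)
{
    c->timer_expires = now + KEEPALIVE_DELAY;
}

/* Any message from the peer is proof of life: cancel and requeue the timer. */
static void on_message_received(struct conn *c, int now)
{
    requeue_keepalive(c, now);
}

/*
 * Called once per second. If the timer has expired, send a keepalive and
 * requeue right away, without waiting for the peer's response.
 * Returns true if a keepalive was sent on this tick.
 */
static bool on_tick(struct conn *c, int now)
{
    if (now < c->timer_expires)
        return false;
    c->keepalives_sent++;
    requeue_keepalive(c, now);
    return true;
}

/*
 * Run the model for `seconds` ticks. The peer sends some message every
 * `peer_talks_every` seconds; 0 means the peer stays silent.
 * Returns how many keepalives this node sent.
 */
static int simulate(int seconds, int peer_talks_every)
{
    struct conn c = { .timer_expires = KEEPALIVE_DELAY, .keepalives_sent = 0 };

    assert(seconds >= 0);
    for (int now = 1; now <= seconds; now++) {
        if (peer_talks_every > 0 && now % peer_talks_every == 0)
            on_message_received(&c, now);
        on_tick(&c, now);
    }
    return c.keepalives_sent;
}
```

In this model a silent peer gets a keepalive every 2 seconds, while a peer
that talks every second never triggers one, which is the "smart" part.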

Srinivas Eeda wrote:
> In the old code, a node cancels and requeues the keepalive message when
> it hears from the other node. If it doesn't hear anything in 2 seconds,
> the queued message fires and sends a keepalive. A requeue then
> happens only after it hears from the other node.
>
> With the new change, a node sends a keepalive every 2 seconds.
>
> Sunil Mushran wrote:
>> How will it double? The node will send a keepalive only if it has
>> not heard from the other node for 2 secs.
>>
>> Srinivas Eeda wrote:
>>> No harm; it just doubles the heartbeat messages, which is not required at all.
>>>
>>> Sunil Mushran wrote:
>>>> What's the harm in leaving it in?
>>>>
>>>> Srinivas Eeda wrote:
>>>>> Each node that has this patch would send an 
>>>>> O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds (the default). So, nodes 
>>>>> without this patch would always receive a heartbeat message every 
>>>>> 2 seconds.
>>>>>
>>>>> Nodes without this patch will respond with 
>>>>> O2NET_MSG_KEEP_RESP_MAGIC to every keepalive packet they 
>>>>> receive. So nodes with this patch will always receive a response 
>>>>> message.
>>>>>
>>>>> So, in a mixed setup, both nodes will always hear the heartbeat 
>>>>> from each other :).
>>>>>
>>>>> thanks,
>>>>> --Srini
>>>>>
>>>>>
>>>>>
>>>>> Joel Becker wrote:
>>>>>  
>>>>>> On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>>>>>>   
>>>>>>>          case O2NET_MSG_KEEP_REQ_MAGIC:
>>>>>>> -            o2net_sendpage(sc, o2net_keep_resp,
>>>>>>> -                       sizeof(*o2net_keep_resp));
>>>>>>> +            /* Each node now sends keepalive message every
>>>>>>> +             * keepalive time interval. Hence no need for response
>>>>>>> +             */
>>>>>>>              goto out;
>>>>>>>           
>>>>>>     You still have to send the response.  Think about a mixed
>>>>>> environment where some nodes have this fix and some do not.  The
>>>>>> older software is still waiting on the response.
>>>>>>     The newer version can just ignore any responses it gets from
>>>>>> other nodes.  But it has to send responses out just in case the
>>>>>> other node is older.
>>>>>>     The only other alternative is to bump the o2net protocol
>>>>>> version, and that means the cluster has to be shut down to
>>>>>> upgrade.  Not a good choice.
>>>>>>
>>>>>> Joel
>>>>>>
>>>>>>       
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>>   
>>>>
>>>
>>
>
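
For reference, the receive side that Joel describes (always answer a
keepalive request in case the peer is older, and ignore responses) could
look roughly like the sketch below. Again this is only an illustration
under my own assumptions: enum msg_magic stands in for the
O2NET_MSG_*_MAGIC values, and struct peer and handle_message() are
hypothetical names, not o2net symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the O2NET_MSG_*_MAGIC message types; hypothetical values. */
enum msg_magic { MSG_KEEP_REQ, MSG_KEEP_RESP, MSG_DATA };

struct peer {
    int last_heard;       /* time of the last message from the other node */
    int responses_sent;   /* keepalive responses sent to the other node */
};

/*
 * Handle one incoming message on a node that carries the fix.
 * Returns true if a keepalive response has to be sent back.
 */
static bool handle_message(struct peer *self, enum msg_magic magic, int now)
{
    assert(self);
    self->last_heard = now;   /* any message counts as hearing from the peer */

    switch (magic) {
    case MSG_KEEP_REQ:
        /* The sender may run older software that waits on the response. */
        self->responses_sent++;
        return true;
    case MSG_KEEP_RESP:
        /* A newer node does not depend on responses; nothing more to do. */
        return false;
    default:
        /* Regular traffic is dispatched elsewhere in a real implementation. */
        return false;
    }
}
```

Answering every request keeps the wire behaviour unchanged for older
nodes, so no protocol version bump is needed.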