[Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol

Srinivas Eeda srinivas.eeda at oracle.com
Wed Feb 17 15:45:24 PST 2010


In the old code, a node cancels and re-queues its keepalive message
whenever it hears from the other node. If it has not heard anything for
2 seconds, the queued message fires and sends a keepalive. The message is
re-queued only after the node hears from the other node again.

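With the new change, a node sends a keepalive every 2 seconds whether or
not it has heard from the other node. If every keepalive were still
answered, each node would both send its own keepalive and answer the other
node's every 2 seconds, which doubles the messages compared to the old
scheme.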

Sunil Mushran wrote:
> How will it double? The node will send a keepalive only if it has
> not heard from the other node for 2 secs.
>
> Srinivas Eeda wrote:
>> No harm; it just doubles the heartbeat messages, which is not required at all.
>>
>> Sunil Mushran wrote:
>>> What's the harm in leaving it in?
>>>
>>> Srinivas Eeda wrote:
>>>> Each node that has this patch would send an O2NET_MSG_KEEP_REQ_MAGIC
>>>> every 2 seconds (default). So nodes without this patch would always
>>>> receive a heartbeat message every 2 seconds.
>>>>
>>>> Nodes without this patch respond with an O2NET_MSG_KEEP_RESP_MAGIC
>>>> to every keepalive packet they receive, so nodes with this patch
>>>> will always receive a response message.
>>>>
>>>> So, in a mixed setup, both nodes will always hear the heartbeat 
>>>> from each other :).
>>>>
>>>> thanks,
>>>> --Srini
>>>>
>>>>
>>>>
>>>> Joel Becker wrote:
>>>>  
>>>>> On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>>>>>    
>>>>>>          case O2NET_MSG_KEEP_REQ_MAGIC:
>>>>>> -            o2net_sendpage(sc, o2net_keep_resp,
>>>>>> -                       sizeof(*o2net_keep_resp));
>>>>>> +            /* Each node now sends keepalive message every
>>>>>> +             * keepalive time interval. Hence no need for response
>>>>>> +             */
>>>>>>              goto out;
>>>>>>           
>>>>>     You still have to send the response.  Think about a mixed
>>>>> environment where some nodes have this fix and some do not.  The
>>>>> older software is still waiting on the response.
>>>>>     The newer version can just ignore any responses it gets from
>>>>> other nodes.  But it has to send responses out just in case the other
>>>>> node is older.
>>>>>     The only other alternative is to bump the o2net protocol
>>>>> version, and that means the cluster has to be shut down to
>>>>> upgrade.  Not a good choice.
>>>>>
>>>>> Joel
>>>>>
>>>>>       
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel at oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>   
>>>
>>
>



