[Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
Sunil Mushran
sunil.mushran at oracle.com
Wed Feb 17 15:50:25 PST 2010
My understanding was that we'll also requeue after sending a keepalive,
as in, not wait for the response to requeue. But we'll still be smart
about it, in the sense that we won't send a hb while the nodes are
communicating otherwise.
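
To make the difference between the two requeue policies concrete, here is
a minimal sketch. It is a toy model, not the o2net code: the struct, the
ka_* function names and the one-second tick are all made up for
illustration, and only the 2 second default delay comes from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one end of a connection. Times are in seconds. */
struct ka_node {
    int interval;        /* keepalive delay (2 s default per the thread) */
    int deadline;        /* when the queued keepalive fires */
    int sent;            /* keepalives sent so far */
    bool queued;         /* is a keepalive currently queued? */
    bool wait_for_reply; /* true: old policy, requeue only after hearing
                          * from the peer; false: requeue right after send */
};

static void ka_init(struct ka_node *n, int interval, bool wait_for_reply)
{
    assert(interval > 0);
    n->interval = interval;
    n->deadline = interval;   /* clock starts at 0 */
    n->sent = 0;
    n->queued = true;
    n->wait_for_reply = wait_for_reply;
}

/* Any message from the peer cancels the pending keepalive and requeues it. */
static void ka_heard_from_peer(struct ka_node *n, int now)
{
    n->deadline = now + n->interval;
    n->queued = true;
}

/* Called once per second; sends a keepalive if the queued one is due. */
static bool ka_tick(struct ka_node *n, int now)
{
    if (!n->queued || now < n->deadline)
        return false;
    n->sent++;
    if (n->wait_for_reply)
        n->queued = false;                /* stay idle until the peer talks */
    else
        n->deadline = now + n->interval;  /* requeue without waiting */
    return true;
}

/* Peer never says anything for 'seconds' seconds. */
int ka_run_silent(bool wait_for_reply, int seconds)
{
    struct ka_node n;
    ka_init(&n, 2, wait_for_reply);
    for (int t = 1; t <= seconds; t++)
        ka_tick(&n, t);
    return n.sent;
}

/* Peer sends some other message every second. */
int ka_run_chatty(bool wait_for_reply, int seconds)
{
    struct ka_node n;
    ka_init(&n, 2, wait_for_reply);
    for (int t = 1; t <= seconds; t++) {
        ka_heard_from_peer(&n, t);
        ka_tick(&n, t);
    }
    return n.sent;
}
```

In this model, against a peer that is silent for 10 seconds the old policy
sends one keepalive and then waits, while requeue-after-send sends five.
Against a peer that is talking every second, neither policy sends any
keepalive, which is the "smart" part.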
Srinivas Eeda wrote:
> In the old code, a node cancels and requeues the keepalive message when
> it hears from the other node. If it doesn't hear anything in 2 seconds,
> the queued message fires and sends a keepalive. A requeue happens only
> after it hears from the other node.
>
> With the new change, a node sends a keepalive every 2 seconds.
>
> Sunil Mushran wrote:
>> How will it double? The node will send a keepalive only if it has
>> not heard from the other node for 2 secs.
>>
>> Srinivas Eeda wrote:
>>> No harm, just doubles heartbeat messages which is not required at all.
>>>
>>> Sunil Mushran wrote:
>>>> What's the harm in leaving it in?
>>>>
>>>> Srinivas Eeda wrote:
>>>>> Each node that has this patch would send an
>>>>> O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds (default). So, nodes
>>>>> without this patch would always receive a heartbeat message every
>>>>> 2 seconds.
>>>>>
>>>>> Nodes without this patch will respond with
>>>>> O2NET_MSG_KEEP_RESP_MAGIC for every keepalive packet they
>>>>> receive. So nodes with this patch will always receive a response
>>>>> message.
>>>>>
>>>>> So, in a mixed setup, both nodes will always hear the heartbeat
>>>>> from each other :).
>>>>>
>>>>> thanks,
>>>>> --Srini
>>>>>
>>>>>
>>>>>
>>>>> Joel Becker wrote:
>>>>>
>>>>>> On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>>>>>>
>>>>>>> case O2NET_MSG_KEEP_REQ_MAGIC:
>>>>>>> - o2net_sendpage(sc, o2net_keep_resp,
>>>>>>> - sizeof(*o2net_keep_resp));
>>>>>>> + /* Each node now sends keepalive message every
>>>>>>> + * keepalive time interval. Hence no need for response
>>>>>>> + */
>>>>>>> goto out;
>>>>>>>
>>>>>> You still have to send the response. Think about a mixed
>>>>>> environment where some nodes have this fix and some do not. The
>>>>>> older software is still waiting on the response.
>>>>>> The newer version can just ignore any responses it gets from
>>>>>> other nodes. But it has to send responses out just in case the
>>>>>> other node is older.
>>>>>> The only other alternative is to bump the o2net protocol
>>>>>> version, and that means the cluster has to be shut down to
>>>>>> upgrade. Not a good choice.
>>>>>>
>>>>>> Joel