[Ocfs2-devel] [PATCH] ocfs2: o2net: fix connect expired

Srinivas Eeda srinivas.eeda at oracle.com
Wed Oct 29 22:41:58 PDT 2014


Hi Junxiao,

Thanks for explaining. For this case, would allowing a reconnect (setting
"atomic_set(&nn->nn_timeout, 1);") in o2net_connect_expired() work?

Thanks,
--Srini


On 10/29/2014 10:32 PM, Junxiao Bi wrote:
> Hi Srini,
>
> On 10/30/2014 01:16 PM, Srinivas Eeda wrote:
>> Junxiao,
>>
>> Can you please describe under what circumstances you saw this problem?
>> My understanding is that o2net_connect_expired() is only queued when the
>> connection actually broke, and ENOTCONN is the right error in that case.
> This happened when o2net was issuing the first connect request to some
> node, but the request packet was lost due to some error such as a broken
> network. The connect then expired, and o2net_connect_expired() set
> nn->nn_persistent_error to -ENOTCONN while timeout was zero, so
> o2net_start_connect() returned without sending another connect
> request, and the connection to the node was never established.
>
> Thanks,
> Junxiao.
>
>> Thanks,
>> --Srini
>>
>> On 10/29/2014 06:41 PM, Junxiao Bi wrote:
>>> Setting nn_persistent_error to -ENOTCONN stops the reconnect when
>>> timeout is 0, since the "stop" condition in o2net_start_connect()
>>> will then be true.
>>>
>>> stop = (nn->nn_sc ||
>>>          (nn->nn_persistent_error &&
>>>          (nn->nn_persistent_error != -ENOTCONN || timeout == 0)));
>>>
>>> As a result, the connection is never established if the first
>>> connection request is lost.
>>>
>>> Signed-off-by: Junxiao Bi <junxiao.bi at oracle.com>
>>> ---
>>>    fs/ocfs2/cluster/tcp.c |    2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
>>> index 97de0fb..4d6b645 100644
>>> --- a/fs/ocfs2/cluster/tcp.c
>>> +++ b/fs/ocfs2/cluster/tcp.c
>>> @@ -1736,7 +1736,7 @@ static void o2net_connect_expired(struct work_struct *work)
>>>                 o2net_idle_timeout() / 1000,
>>>                 o2net_idle_timeout() % 1000);
>>>
>>> -        o2net_set_nn_state(nn, NULL, 0, -ENOTCONN);
>>> +        o2net_set_nn_state(nn, NULL, 0, 0);
>>>        }
>>>        spin_unlock(&nn->nn_lock);
>>>    }
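
The "stop" condition discussed in this thread can be checked in isolation
with a small user-space sketch. This is not the kernel code: struct nn_model
and connect_would_stop() are hypothetical names introduced only for
illustration, and the three fields merely stand in for nn->nn_sc,
nn->nn_persistent_error and the timeout value that the quoted expression
reads. Only the boolean shape is taken from the patch description.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the quoted "stop" test reads. */
struct nn_model {
    bool has_sc;           /* models nn->nn_sc being non-NULL */
    int persistent_error;  /* models nn->nn_persistent_error */
    int timeout;           /* models the timeout value in the check */
};

/* Same boolean shape as the "stop" expression quoted in the patch. */
static bool connect_would_stop(const struct nn_model *nn)
{
    return nn->has_sc ||
           (nn->persistent_error &&
            (nn->persistent_error != -ENOTCONN || nn->timeout == 0));
}
```

Under these assumptions, { false, -ENOTCONN, 0 } evaluates to true, which
is the case Junxiao describes: no further connect request is sent. Both
{ false, 0, 0 } (the state the patch sets) and { false, -ENOTCONN, 1 }
(the state Srini asks about) evaluate to false, so either would let the
expression fall through to another connect attempt. The sketch says
nothing about other side effects of either choice in the real code.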



