[Ocfs2-devel] [patch 1/1] ocfs2-timeout-protocol.patch

Thu Nov 30 04:24:29 PST 2006

On Nov 30, 2006, at 12:31 AM, Mark Fasheh wrote:

> Hi Andrew,
> 	Things are looking much better, but there's still a few issues that
> I found while reviewing the patch. I got Zach to look at it too  
> (he's the
> original author of the ocfs2 network code) which has generated some  
> good
> comments.
>
> On Wed, Nov 29, 2006 at 09:51:31AM +0100, abeekhof at suse.de wrote:
>>
>> From: Andrew Beekhof <abeekhof at suse.de>
>> Subject: [patch 1/1] OCFS2 Configurable timeouts - Protocol changes
>>
>> Modify the OCFS2 handshake to ensure essential timeouts are  
>> configured
>>   identically on all nodes.
>> Only allow changes when there are no connected peers
>> Improves the logic in o2net_advance_rx() which broke now that
>>   sizeof(struct o2net_handshake) is greater than sizeof(struct  
>> o2net_msg)
>> Included is the field for userspace-heartbeat timeout to avoid the  
>> need for
>>   further protocol changes.
>> Uses a global spinlock to ensure the decisions to update configfs  
>> entries
>>   are made on the correct value.  The region covered by the  
>> spinlock when
>>   incrimenting the counter is much larger as this is the more  
>> critical case.
> Nitpick: Can you format that commit log to be a bit more in line with
> standard kernel commits (the indenting is weird)

sure

>
>
>> Index: fs/ocfs2/cluster/nodemanager.c
>> ===================================================================
>> --- fs/ocfs2/cluster/nodemanager.c.orig	2006-11-20  
>> 16:25:58.000000000 +0100
>> +++ fs/ocfs2/cluster/nodemanager.c	2006-11-27 09:57:56.000000000  
>> +0100
>> @@ -558,15 +558,14 @@ static ssize_t o2nm_cluster_attr_write(c
>>  	return count;
>>  }
>>
>> -static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(struct  
>> o2nm_cluster *cluster,
>> -                                                 char *page)
>> +static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(
>> +	struct o2nm_cluster *cluster, char *page)
>>  {
>>  	return sprintf(page, "%u\n", cluster->cl_idle_timeout_ms);
>>  }
> Can you not re-write the function prototypes unless they're actually
> changing please? It clutters up the patch and makes it harder to  
> find the
> actual code to check (see below).

ah, bad habit i picked up working on smaller projects.
is it ok in a separate patch?  or have I got my wrap points set too  
small by default?

>
>
>> @@ -574,10 +573,22 @@ static ssize_t o2nm_cluster_attr_idle_ti
>>  	ret =  o2nm_cluster_attr_write(page, count, &val);
>>
>>  	if (ret > 0) {
>> +		if (cluster->cl_idle_timeout_ms != val) {
>> +			spin_lock(&connected_lock);
>> +			if(o2net_num_connected_peers()) {
>> +				mlog(ML_NOTICE,
>> +				     "o2net: cannot change idle timeout after "
>> +				     "the first peer has agreed to it."
>> +				     "  %d connected peers\n",
>> +				     o2net_num_connected_peers());
>> +				ret = -EINVAL;
>> +			}
>> +			spin_unlock(&connected_lock);
>> +		}
>>  		if (val <= cluster->cl_keepalive_delay_ms) {
>>  			mlog(ML_NOTICE, "o2net: idle timeout must be larger "
>>  			     "than keepalive delay\n");
>> -			return -EINVAL;
>> +			ret = -EINVAL;
>>  		}
>>  		cluster->cl_idle_timeout_ms = val;
> I don't know how I missed this before, but you're erroring with a  
> negative return
> value, yet continuing with the work of setting cluster- 
> >cl_idle_timeout_ms
> anyway. I think we're missing some goto's here and in the similar  
> blocks
> below.

my bad - fixed

>
>> @@ -1121,6 +1121,44 @@ static int o2net_check_handshake(struct
>>  		return -1;
>>  	}
>>
>> +	/*
>> +	 * Ensure timeouts are consistent with other nodes, otherwise
>> +	 * we can end up with one node thinking that the other must be  
>> down,
>> +	 * but isn't. This can ultimately cause corruption.
>> +	 */
>> +	if (be32_to_cpu(hand->o2net_idle_timeout_ms) !=
>> +				o2net_idle_timeout(sc->sc_node)) {
>> +		mlog(ML_NOTICE, SC_NODEF_FMT " uses a network idle timeout of "
>> +		     "%u ms, but we use %u ms locally.  disconnecting\n",
>> +		     SC_NODEF_ARGS(sc),
>> +		     be32_to_cpu(hand->o2net_idle_timeout_ms),
>> +		     o2net_idle_timeout(sc->sc_node));
>> +		o2net_ensure_shutdown(nn, sc, -ENOTCONN);
>> +		return -1;
>> +	}
>> +
>> +	if (be32_to_cpu(hand->o2net_keepalive_delay_ms) !=
>> +			o2net_keepalive_delay(sc->sc_node)) {
>> +		mlog(ML_NOTICE, SC_NODEF_FMT " uses a keepalive delay of "
>> +		     "%u ms, but we use %u ms locally.  disconnecting\n",
>> +		     SC_NODEF_ARGS(sc),
>> +		     be32_to_cpu(hand->o2net_keepalive_delay_ms),
>> +		     o2net_keepalive_delay(sc->sc_node));
>> +		o2net_ensure_shutdown(nn, sc, -ENOTCONN);
>> +		return -1;
>> +	}
>> +
>> +	if (be32_to_cpu(hand->o2hb_heartbeat_timeout_ms) !=
>> +			O2HB_MAX_WRITE_TIMEOUT_MS) {
>> +		mlog(ML_NOTICE, SC_NODEF_FMT " uses a heartbeat timeout of "
>> +		     "%u ms, but we use %u ms locally.  disconnecting\n",
>> +		     SC_NODEF_ARGS(sc),
>> +		     be32_to_cpu(hand->o2net_keepalive_delay_ms),
> We check hearbeat timeout here, but print keepalive delay...
>

fixed

>
>> @@ -1153,6 +1191,26 @@ static int o2net_advance_rx(struct o2net
>>  	sclog(sc, "receiving\n");
>>  	do_gettimeofday(&sc->sc_tv_advance_start);
>>
>> +	if(unlikely(sc->sc_handshake_ok == 0)) {
>> +		if(sc->sc_page_off < sizeof(struct o2net_handshake)) {
>> +			data = page_address(sc->sc_page) + sc->sc_page_off;
>> +			datalen = sizeof(struct o2net_handshake) - sc->sc_page_off;
>> +			ret = o2net_recv_tcp_msg(sc->sc_sock, data, datalen);
>> +			if (ret > 0)
>> +				sc->sc_page_off += ret;
>> +		}
>> +
>> +		if (sc->sc_page_off == sizeof(struct o2net_handshake)) {
>> +			o2net_check_handshake(sc);
>> +			if(sc->sc_handshake_ok == 0) {
>> +				BUG_ON(sizeof(struct o2net_handshake)
>> +				       == sizeof(struct o2net_msg));
> Is this necessary?

I wasnt sure at the time - see below - so i wanted to make sure it at  
least died sanely
apparently i still needed education on how that is done :)

> Didn't we fix the logic such that the relative sizes
> don't matter any more? If it _is_ necessary, then it should be a
> BUILD_BUG_ON() in a more visible place,

ah, I was not familiar with that macro yet

> with a nice fat comment explaining
> why...
>
>> +				ret = -EPROTO;
>> +			}
>> +			goto out;
> Do you mean to move that goto within the
>
> if (sc->sc_handshake_ok == 0) {
>
> block? I _think_ it's ok for us to continue otherwise...

i did - but if we never want to process an o2net_msg if the handshake  
has not been completed, then i can structure things a little  
differently/clearly

>
>
>> @@ -1178,8 +1227,7 @@ static int o2net_advance_rx(struct o2net
>>  				    O2NET_MAX_PAYLOAD_BYTES)
>>  					ret = -EOVERFLOW;
>>  			}
>> -		}
>> -		if (ret <= 0)
>> +		} else
>>  			goto out;
>>  	}
> Why are you doing that? We'll continue now if we want to return - 
> EOVERFLOW
> where we would error out before.

damn - i should have noticed that

I'll resubmit once we sort out the  o2net_msg / o2net_handshake  
situation

--
Andrew Beekhof

"Would the last person to leave please turn out the enlightenment?" -  
TISM