[Ocfs2-test-devel] [PATCH 2/9] ocfs2-test: buildkernel - Converted from LAM/MPI to OpenMPI.

tristan.ye tristan.ye at oracle.com
Tue Feb 17 17:54:37 PST 2009


On Tue, 2009-02-17 at 14:38 -0800, Marcos Matsunaga wrote:
> Signed-off-by: Marcos Matsunaga <Marcos.Matsunaga at oracle.com>
> ---
>  programs/buildkernel/run_buildkernel.py |   31 +++++++++++++++++++++----------
>  1 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/programs/buildkernel/run_buildkernel.py b/programs/buildkernel/run_buildkernel.py
> index ba73405..351f1e1 100755
> --- a/programs/buildkernel/run_buildkernel.py
> +++ b/programs/buildkernel/run_buildkernel.py
> @@ -84,22 +84,26 @@ def Initialize():
>  	'Initialize the directories (remove and extract)'
>  #
>  	o2tf.printlog('Cleaning up directories.', logfile, 0, '')
> -	o2tf.StartMPI(DEBUGON, options.nodelist, logfile)
> -	o2tf.lamexec(DEBUGON, nproc, config.WAIT, str('%s -c -d %s -l %s' % \
> +	o2tf.OpenMPIInit(DEBUGON, options.nodelist, logfile, 'ssh')
> +	o2tf.openmpi_run(DEBUGON, nproc, str('%s -c -d %s -l %s' % \
>  			(buildcmd, 
>  			options.dirlist, 
>  			options.logfile) ),
>  			options.nodelist, 
> -			options.logfile )
> +			'ssh',
> +			options.logfile,
> +			'WAIT')
>  #
>  	o2tf.printlog('Extracting tar file into directories.', logfile, 0, '')
> -	o2tf.lamexec(DEBUGON, nproc, config.WAIT, str('%s -e -d %s -l %s -t %s' % \
> +	o2tf.openmpi_run(DEBUGON, nproc, str('%s -e -d %s -l %s -t %s' % \
>  			(buildcmd, 
>  			options.dirlist, 
>  			options.logfile,
>  			tarfile) ),
>  			options.nodelist, 
> -			options.logfile )
> +			'ssh',
> +			options.logfile,
> +			'WAIT')
>  	o2tf.printlog('Directories initialization completed.', logfile, 0, '')
>  #
>  Usage = 'Usage: %prog [-c|--count count] \
> @@ -205,13 +209,20 @@ for i in range(options.count):
>  	r = i+1
>  	o2tf.printlog('run_buildkernel: Starting RUN# %s of %s' % (r, options.count),
>  		logfile, 3, '=')
> -	o2tf.StartMPI(DEBUGON, options.nodelist, logfile)
> -	o2tf.lamexec(DEBUGON, nproc, config.WAIT, str('%s -d %s -l %s -n %s' % \
> +	o2tf.OpenMPIInit(DEBUGON, options.nodelist, logfile, 'ssh')
> +	ret = o2tf.openmpi_run(DEBUGON, nproc, str('%s -d %s -l %s -n %s' % \
>  			(buildcmd, 
>  			options.dirlist, 
>  			options.logfile, 
>  			options.nodelist) ), 
>  			options.nodelist, 
> -			options.logfile )
> -o2tf.printlog('run_buildkernel: Test completed successfully.',
> -	logfile, 3, '=')
> +			'ssh',
> +			options.logfile,
> +			'WAIT' )
> +if not ret:
> +	o2tf.printlog('run_buildkernel: main - execution successful.',
> +		logfile, 0, '')
> +else:
> +	o2tf.printlog('run_buildkernel: main - execution failed.',
> +		logfile, 0, '')
> +sys.exit(ret)

Sometimes Python launchers like the one above are invoked from a shell
script (e.g. multiple_run.sh). In that case, if the 'ret' we pass up from
run_buildkernel.py is 256, any shell caller will unfortunately see a
return code of ZERO instead of 256. Just a guess: the return code ($?)
is held in a single byte, so only values in the range [0-255] survive,
and 256 wraps around to 0.
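
The truncation can be reproduced without MPI at all. The following is a minimal sketch, assuming a POSIX system; the helper name observed_exit_status is hypothetical and only serves the illustration. It starts a child Python process that exits with a given value and reports what the parent actually receives.

```python
import subprocess
import sys


def observed_exit_status(code):
    """Run a child Python that calls sys.exit(code); return what the parent sees."""
    child = subprocess.run(
        [sys.executable, '-c', 'import sys; sys.exit(%d)' % code])
    return child.returncode


# On POSIX, only the low 8 bits of the exit status reach the parent,
# so 256 is reported as 0 and looks like success to the caller.
for code in (1, 255, 256, 257):
    print(code, '->', observed_exit_status(code))
```

A shell caller checking $? is in the same position as the parent process here.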

In fact, any Python launcher that uses os.spawn() to run the openmpi
binary is liable to get a return code of 256 when mpirun fails, and
that will really fool our bash caller.

Therefore, I suggest you not pass the return code up directly with
'sys.exit(ret)'; you can simply do the following instead.

if not ret:
	o2tf.printlog('run_buildkernel: main - execution successful.',
		logfile, 0, '')
else:
	sys.exit(1)  # It's enough to mark failure here.
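
If several launchers need the same treatment, the mapping could be factored into a small helper. This is only a sketch: to_exit_code is a hypothetical name and is not part of o2tf, and it assumes that any non-zero 'ret' means failure.

```python
def to_exit_code(ret):
    """Collapse any non-zero launcher status to 1; zero stays zero."""
    return 1 if ret else 0


# 256 has no bits set in its low byte, so a one-byte exit status
# would report it as 0; the collapsed value 1 is unaffected.
print(256 & 0xFF, to_exit_code(256))
```

A launcher would then end with sys.exit(to_exit_code(ret)), which keeps the failure visible to a shell caller.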


All of the following patches need to take care of this problem as well.





