[Ocfs2-devel] [PATCH] fs/jbd2: t_updates should increase when start_this_handle() failed in jbd2__journal_restart()

Fri Jun 21 16:26:17 PDT 2013

On Thu 20-06-13 14:12:15, Ted Tso wrote:
> On Thu, Jun 20, 2013 at 01:26:09PM -0400, Josef Bacik wrote:
> > I realize it's been a little bit since I've looked at jbd but I'll offer my
> > opinion.  Callers of jbd2_journal_restart() may not be the ones who originated
> > the handle, so doing what Jan has done with jbd2_journal_start_reserved() isn't
> > going to work because all the guy at the top is going to see is an error and
> > have no way to tell if his handle is invalid or not.
> 
> Yeah, that's what I meant by "it would require changing all of the
> callers".
> 
> > What I would suggest is getting a unified way to mark that the handle has
> > already been cleaned up and can just be free'd.
> 
> The problem though is we need to make sure none of the callers don't
> try to do anything else with handle besides calling
> jbd2_journal_stop().  In particular, we can't allow a call to
> jbd2_journal_get_write_access(), jbd2_journal_revoke() to operate on
> the handle, because its transaction pointer is (potentially) invalid.
  I think this is the cleanest solution going forward as well.

> > Then fix jbd2_journal_start_reserved() and jbd2_journal_restart() to
> > set that in the handle and make jbd2_journal_stop() just free up the
> > handle and reset current->journal_info but not return an error.
> > It's important to not return an error from jbd2_journal_stop() so
> > that it doesn't invoke the ext4 error handling stuff and you get a
> > read only file system when the error may not be read only file
> > system worthy.
> 
> The handle->h_aborted bit, which is currently not used, does most of
> the right thing, modulo the question of the fact that
> jbd2_journal_stop() will return an error.  What's important from my
> perspective is that the various callers that operate on a handle check
> is_handle_aborted() before trying to use the it.  We'll still need to
> audit the callers to make sure there isn't some uncommon-taken code
> path where ext4_handle_dirty_metadata() gets called after
> ext4_journal_restart() has returned an error.
  Yeah, I don't see a solution where we could avoid the audit... Somewhat
comforting is that if the change to error handling will break something it
has been broken previously as well, only the bug was better hidden :)

> As a FAST paper once opined, "EIO: Error Handling Is Occasionally correct".  :-)
> 
> > This way you have a nice clean way of dealing with handle errors that allow you
> > to pass back a real error to the caller and the caller can just do its normal
> > jbd2_journal_stop() and cleanup and do its own error handling the way it feels.
> > This keeps the yucky details of no longer valid handles all internal to jbd2 and
> > ext4/ocfs2 don't have to worry about it.  Thanks,
> 
> Yes, that could work, although we'll need to check to make sure all of
> the code paths that invoke jbd2_journal_restart() handle errors
> appropriately, and don't rely on jbd2_journal_stop() returning an
> error.  Thanks for your thoughts!

								Honza
-- 
Jan Kara <jack at suse.cz>
SUSE Labs, CR