[Ocfs2-tools-devel] [PATCH RESEND 3/3] fsck.ocfs2: support append direct io incompat feature

Tue Mar 8 01:02:09 PST 2016

Hi Eric,
Thanks very much for reviewing. Please see my comments below.

On 2016/3/4 21:51, Eric Ren wrote:
> Hello Joseph,
> 
> I don't go through all relative code pathes, but has some questions
> that confuse me. Hope you could give some help.
> 
> On Thu, Mar 03, 2016 at 07:34:46PM +0800, Joseph Qi wrote: 
>> Support to truncate direct io orphan entry in fsck.ocfs2.
>>
>> Signed-off-by: Joseph Qi <joseph.qi at huawei.com>
>> ---
>>  fsck.ocfs2/pass4.c  |  9 +++++++++
>>  libocfs2/truncate.c | 12 ++++++++++++
>>  2 files changed, 21 insertions(+)
>>
>> diff --git a/fsck.ocfs2/pass4.c b/fsck.ocfs2/pass4.c
>> index 7737cc9..de3f672 100644
>> --- a/fsck.ocfs2/pass4.c
>> +++ b/fsck.ocfs2/pass4.c
>> @@ -100,6 +100,9 @@ out:
>>  	return;
>>  }
>>
>> +#define OCFS2_DIO_ORPHAN_PREFIX "dio-"
>> +#define OCFS2_DIO_ORPHAN_PREFIX_LEN 4
>> +
>>  static int replay_orphan_iterate(struct ocfs2_dir_entry *dirent,
>>  				 uint64_t blocknr,
>>  				 int	offset,
>> @@ -138,6 +141,11 @@ static int replay_orphan_iterate(struct ocfs2_dir_entry *dirent,
>>  		goto out;
>>  	}
>>
> Code in here is:
> 133         ret = ocfs2_truncate(ost->ost_fs, dirent->inode, 0);
> 134         if (ret) {
> 135                 com_err(whoami, ret, "while truncating orphan inode %"PRIu64,
> 136                         (uint64_t)dirent->inode);
> 137                 ret_flags |= OCFS2_DIRENT_ABORT;
> 138                 goto out;
> 139         }
> 
> Do we really want to truncate the data of crashed inode which was doing DIO
> instead of recovery? I read aboud your append O_DIRECT patch set. The idea looks
> like to add file to orphan dir when doing direc io, and delete from orphan dir after
> direct io is done.
> 
Yes, we have to. This is to keep the same behavior that kernel behaves.
In kernel module, once dio crashes, we will do truncate in orphan
recovery to reclaim allocated clusters.

>> +	/* do not delete inode in case of dio orphan entry */
>> +	if (!strncmp(dirent->name, OCFS2_DIO_ORPHAN_PREFIX,
>> +			OCFS2_DIO_ORPHAN_PREFIX_LEN))
>> +		goto out_check;
>> +
> 
> It make sense to skip _delte_inode() stuff. But I cannot understand very well what the code
> after out_check: is supposed to do? Looks it collects info about link count and inode refs,
> and then do check_link_counts() where inode would be moved to lost+found dir if no dirent
> refer to it. So is it possible to lose the inode unexpectedly?
> 
Normally it won't. Otherwise I think it is a bug for inode link count.

>>  	ret = ocfs2_delete_inode(ost->ost_fs, dirent->inode);
>>  	if (ret) {
>>  		com_err(whoami, ret, "while deleting orphan inode %"PRIu64
>> @@ -148,6 +156,7 @@ static int replay_orphan_iterate(struct ocfs2_dir_entry *dirent,
>>
>>  	ost->ost_orphan_deleted_count++;
>>
>> +out_check:
>>  	/* Only calculate icount in force check. */
>>  	if (ost->ost_force) {
>>  		/*
>> diff --git a/libocfs2/truncate.c b/libocfs2/truncate.c
>> index 7327253..451db28 100644
>> --- a/libocfs2/truncate.c
>> +++ b/libocfs2/truncate.c
>> @@ -33,6 +33,7 @@
>>  #include <assert.h>
>>  #include <errno.h>
>>  #include "ocfs2/ocfs2.h"
>> +#include "ocfs2/byteorder.h"
>>
>>  struct truncate_ctxt {
>>  	uint64_t ino;
>> @@ -356,6 +357,16 @@ errcode_t ocfs2_truncate_full(ocfs2_filesys *fs, uint64_t ino,
>>  	if (ret)
>>  		goto out;
>>
>> +	/* in case of dio crashed, force do trucate since blocks may already
>> +	 * be allocated
>> +	 */
>> +	if (ci->ci_inode->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL)) {
>> +		ci->ci_inode->i_flags &= ~cpu_to_le32(OCFS2_DIO_ORPHANED_FL);
>> +		ci->ci_inode->i_dio_orphaned_slot = 0;
>> +		new_i_size = ci->ci_inode->i_size;
>> +		goto truncate;
>> +	}
>> +
> 
> Could you explain a bit more the reason why we should add this lines of code here?
> Because fix_dups_func() 
>           ->clone_one_inode() or delete_one_inode()
> 	    ->1355         ret = ocfs2_truncate_full(ost->ost_fs, ino, 0,
> 	      1356                                   pass1d_free_clusters, fd);
> 
> As you can see the 3th parameter(new_i_size) is 0, so it won't hit go-out. Why do we
> need this line:
> 	new_i_size = ci->ci_inode->i_size;
> ?
> 
Though replay_orphan_iterate->ocfs2_truncate passes new_i_size with 0,
but for dio entry, the new_i_size will be re-assigned to the current
inode i_size. As I explained before, this is to reclaim allocated
clusters.
In other words, clusters are allocated, but inode i_size has
not been updated and written to disk. So we have to make the inode go
back to the former status after fsck.

Thanks,
Joseph

> Thanks!
> Eric
>>  	if (ci->ci_inode->i_size == new_i_size)
>>  		goto out;
>>
>> @@ -364,6 +375,7 @@ errcode_t ocfs2_truncate_full(ocfs2_filesys *fs, uint64_t ino,
>>  		goto out;
>>  	}
>>
>> +truncate:
>>  	if ((S_ISLNK(ci->ci_inode->i_mode) && !ci->ci_inode->i_clusters) ||
>>  	    (ci->ci_inode->i_dyn_features & OCFS2_INLINE_DATA_FL))
>>  		ret = ocfs2_truncate_inline(fs, ino, new_i_size);
>> -- 
>> 1.8.4.3
>>
>>
>>
>> _______________________________________________
>> Ocfs2-tools-devel mailing list
>> Ocfs2-tools-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-tools-devel
>>
> 
> .
>