[Ocfs2-devel] [PATCH 1/1] Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.

Tristan Ye tristan.ye at oracle.com
Sun Nov 21 18:22:23 PST 2010


Tao Ma wrote:
> Hi Tristan,
>     Just add joel to the cc in case he has a different option.
>
> On 11/19/2010 04:38 PM, Tristan Ye wrote:
>> Former logic of ocfs2_file_aio_write() was a bit stricky to unlock 
>> the rw_lock
>> and i_alloc_sem, by using some private bits in struct 'iocb' to 
>> communite with
>> ocfs2_dio_end_io(), it did work before we introduce the patch of 
>> supporting
>> 'coherency=full,buffered' option, since rw_lock and i_alloc_sem were 
>> never
>> acquired both at the same time, no mattar we doing buffered or direct 
>> IO or not.
> These 2 locks can be acquired at the same time.
> So if we go with direct_io, we do have i_alloc_sem and rw_lock locked 
> simultaneously. why do you get this?

For coherency_full direct_io, we have these 2 locks, while for 
coherency_buffered direct_io, we only acquire i_alloc_sem.

>
> I have gone through your patch and the bug. It sees to me that the 
> real cause for the bug is that you have EX rw_lock because of 
> full_coherency while locking i_alloc_sem. So finally in 
> ocfs2_dio_end_io, only rw_lock is freed and i_alloc_sem is left, 
> right? If yes, please update the above commit log for it.

Didn't quite get you here, there is no lock was blocking i_alloc_sem, 
instead, i_alloc_sem was not up_read() correctly and explicitly somewhere.

>
>
> I don't like your solution either. full_coherency is only used in 
> direct write and ocfs2_dio_end_io is used for both direct read/write. 
> So why add the complexity of coherency to ocfs2_dio_end_io? Also you 
> long comment in ocfs2_file_aio_write does indicate that it is really 
> hard for the code reader to learn why we need to set this flag.

The complexity is just introduce by the nature that 'coherency_full' and 
'coherency_buffered' direct_io writes is gonna have different locks, as 
you known, we only have one mode for direct_io writes before.

>
> My suggestion is: why not use another flag to indicate the state of 
> i_alloc_sem instead of full_coherency? So in place we down_read the 
> i_alloc_sem, set the flag accordingly, and in ocfs2_dio_end_io, just 
> check this flag instead of !rw_locked_level to up_read it. It should 
> be more straightforward. Agree?

Yep, I do agree that this fix looks tricky a bit, while the all existed 
ocfs2_dio_end_io() things were already tricky there;)

Using 'ocfs2_iocb_set_sem_locked' or 'ocfs2_iocb_set_coherency' didn't 
simplify the logic quite a bit, however, I still appreciated
your suggestion to follow existing convention, such as 
ocfs2_iocb_set_rw_locked' things, they just look more similar and in a 
series;-)

Tristan

>
> Joel, any comments?
>
> Regards,
> Tao
>
>
>>
>> This patch tries to teach ocfs2_dio_end_io fully understand the 
>> bahavior of
>> all writes, including 
>> buffered/concurrency-allowed-odirect/none-concurrency-odirect
>> writes, to have all lock/sem primitives getting correctly released.
>>
>> Signed-off-by: Tristan Ye<tristan.ye at oracle.com>
>> ---
>>   fs/ocfs2/aops.c |    9 +++++++--
>>   fs/ocfs2/aops.h |    6 ++++++
>>   fs/ocfs2/file.c |   16 ++++++++++++++++
>>   3 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index f1e962c..fd0713c 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -568,7 +568,7 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
>>                    bool is_async)
>>   {
>>       struct inode *inode = iocb->ki_filp->f_path.dentry->d_inode;
>> -    int level;
>> +    int level, coherency;
>>
>>       /* this io's submitter should not have unlocked this before we 
>> could */
>>       BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
>> @@ -576,7 +576,12 @@ static void ocfs2_dio_end_io(struct kiocb *iocb,
>>       ocfs2_iocb_clear_rw_locked(iocb);
>>
>>       level = ocfs2_iocb_rw_locked_level(iocb);
>> -    if (!level)
>> +    /*
>> +     * 'coherency=full' O_DIRECT writes needs this extra bit
>> +     * to correctly up_read the i_alloc_sem.
>> +     */
>> +    coherency = ocfs2_iocb_coherency(iocb);
>> +    if ((!level) || coherency)
>>           up_read(&inode->i_alloc_sem);
>>       ocfs2_rw_unlock(inode, level);
>>
>> diff --git a/fs/ocfs2/aops.h b/fs/ocfs2/aops.h
>> index 76bfdfd..213cec6 100644
>> --- a/fs/ocfs2/aops.h
>> +++ b/fs/ocfs2/aops.h
>> @@ -72,4 +72,10 @@ static inline void ocfs2_iocb_set_rw_locked(struct 
>> kiocb *iocb, int level)
>>       clear_bit(0, (unsigned long *)&iocb->private)
>>   #define ocfs2_iocb_rw_locked_level(iocb) \
>>       test_bit(1, (unsigned long *)&iocb->private)
>> +#define ocfs2_iocb_set_coherency(iocb) \
>> +    set_bit(2, (unsigned long *)&iocb->private)
>> +#define ocfs2_iocb_clear_coherency(iocb) \
>> +    clear_bit(2, (unsigned long *)&iocb->private)
>> +#define ocfs2_iocb_coherency(iocb) \
>> +    test_bit(2, (unsigned long *)&iocb->private)
>>   #endif /* OCFS2_FILE_H */
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index 77b4c04..df070a3 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -2277,8 +2277,24 @@ relock:
>>           }
>>
>>           ocfs2_inode_unlock(inode, 1);
>> +
>> +        /*
>> +         * Due to the fault of 'full_coherency' O_DIRECT
>> +         * write needs to acqure both i_alloc_sem and rw_lock.
>> +         * We do another trick here to have coherency bit
>> +         * stored in iocb to communicate with ocfs2_dio_end_io
>> +         * for properly unlocking i_alloc_sem.
>> +         */
>> +        ocfs2_iocb_set_coherency(iocb);
>>       }
>>
>> +    /*
>> +     * Concurrent-allowed odirect writes was able to up_read 
>> i_alloc_sem
>> +     * correctly, we therefore don't need this extra and tricky bit.
>> +     */
>> +    if (direct_io&&  !full_coherency)
>> +        ocfs2_iocb_clear_coherency(iocb);
>> +
>>       can_do_direct = direct_io;
>>       ret = ocfs2_prepare_inode_for_write(file, ppos,
>>                           iocb->ki_left, appending,




More information about the Ocfs2-devel mailing list