[Ocfs2-devel] Long io response time doubt

Sun Nov 15 17:40:01 PST 2015

Hi Eric,

On 2015/11/14 13:23, Eric Ren wrote:
> Hi Joseph,
> 
>>>> > >> 2. ocfs2cmt does periodically commit.
>>>> > >>
>>>> > >> One case that can lead to a long downconvert is that it really does have
>>>> > >> too much work to do. I am not sure whether there are other cases or a code
>>>> > >> bug.
>>> > > OK, I'm not familiar with ocfs2cmt. Could I bother you to explain what ocfs2cmt is used for,
>>> > > its relation to R/W, and why down-conversion can be triggered when it commits?
>> > Sorry, the above explanation is not right and may mislead you.
>> > 
>> > jbd2/xxx (previously called kjournald2?) does periodically commit,
>> > the default interval is 5s and can be set with mount option "commit=".
>> > 
>> > ocfs2cmt does the checkpoint. It can be woken up by:
>> > a) unblocking a lock during downconvert. If jbd2/xxx has already done the
>> > commit, ocfs2cmt won't actually be woken up because it has already been
>> > checkpointed. So ocfs2cmt works with jbd2/xxx.
> OK, thanks for your knowledge;-)
>> > b) evict inode and then do downconvert.
> Sorry, I'm confused about b). You mean b) is also part of ocfs2cmt's
> work? Does b) have something to do with a)? And what's the meaning of "evict inode"?
> Actually, I can hardly understand the idea of b).
You can go through the code flow:
iput->iput_final->evict->evict_inode->ocfs2_evict_inode
->ocfs2_clear_inode->ocfs2_checkpoint_inode->ocfs2_start_checkpoint

This happens when a node no longer uses the inode (but has not deleted
it) and frees its related lockres.
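
To make the idea concrete, here is a small user-space sketch of the last
step of that flow. It is only a toy model under my own assumptions: every
name in it (toy_journal, toy_inode, toy_evict_inode, ...) is made up for
illustration and is not the kernel code, and the checkpoint is done
synchronously here, while in ocfs2 the caller kicks ocfs2cmt and waits.
The point is just that evicting an inode only forces a checkpoint when
the inode still has metadata that has not been checkpointed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy journal: transaction IDs are assumed to increase monotonically. */
struct toy_journal {
    unsigned long checkpointed_id;  /* newest transaction already checkpointed */
    unsigned int checkpoints_run;   /* how many checkpoints this model has run */
};

/* Toy inode: remembers the newest transaction that touched its metadata. */
struct toy_inode {
    unsigned long last_trans_id;
};

/* True if nothing belonging to this inode is still waiting for a checkpoint. */
static bool toy_fully_checkpointed(const struct toy_journal *j,
                                   const struct toy_inode *i)
{
    return i->last_trans_id <= j->checkpointed_id;
}

/* Stand-in for waking the checkpoint thread and waiting for it to finish. */
static void toy_start_checkpoint(struct toy_journal *j, unsigned long upto)
{
    j->checkpointed_id = upto;
    j->checkpoints_run++;
}

/* Models the end of the evict path: returns true if a checkpoint was needed. */
static bool toy_evict_inode(struct toy_journal *j, const struct toy_inode *i)
{
    if (toy_fully_checkpointed(j, i))
        return false;               /* already checkpointed, nothing to do */
    toy_start_checkpoint(j, i->last_trans_id);
    return true;
}

/* Small self-check of the two cases described above. */
static void toy_evict_selfcheck(void)
{
    struct toy_journal j = { .checkpointed_id = 5, .checkpoints_run = 0 };
    struct toy_inode dirty = { .last_trans_id = 7 };
    struct toy_inode clean = { .last_trans_id = 3 };

    assert(toy_evict_inode(&j, &dirty));   /* pending metadata: checkpoint runs */
    assert(!toy_evict_inode(&j, &clean));  /* nothing pending: no checkpoint */
    assert(j.checkpoints_run == 1);
}
```

In this model the second evict is cheap because the journal has already
caught up, which matches case a) above where ocfs2cmt is not woken up
when there is nothing left to checkpoint.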

Thanks,
Joseph

>> > 
>>>>> > >>> Could you describe this case in more detail?
>>>>>> > >>>> And it seemed reasonable because it had to.
>>>>>> > >>>>
>>>>>> > >>>> Node 1 wrote the file, and node 2 read it. Since you used buffered I/O,
>>>>>> > >>>> after node 1 had finished writing, the data might still be in the page cache.
>>>>> > >>> Sorry, I cannot understand the relationship between "still in page cache" and "so...downconvert".
>>>>>> > >>>> So node 1 has to downconvert first, and then node 2's read can continue.
>>>>>> > >>>> That was why you said it seemed ocfs2_inode_lock_with_page spent most
>>>>> > >>> Actually, what surprises me more is how long it takes, rather than that it takes the *most* time compared to the "readpage" stuff ;-)
>>>>>> > >>>> time. More specifically, it was ocfs2_inode_lock after trying nonblock
>>>>>> > >>>> lock and returning -EAGAIN.
>>>>> > >>> You mean the read process would repeatedly try the nonblock lock until the write process's down-conversion completes?
>>>> > >> No, after nonblock lock returning -EAGAIN, it will unlock page and then
>>>> > >> call ocfs2_inode_lock and ocfs2_inode_unlock. And ocfs2_inode_lock will
>>> > > Yes.
>>>> > >> wait until downconvert completion in another node.
>>> > > Another node which read or write process on?
>> > Yes, the node blocks my request.
>> > For example, node 1 has EX, then node 2 wants to get PR, so it has to wait
>> > for node 1 to downconvert first.
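
The EX/PR example can be written down as a tiny user-space sketch. This
is only an illustration with made-up names (toy_level, toy_highest_compat,
toy_must_downconvert), not the kernel's definitions. It assumes the usual
three levels NL < PR < EX, where PR can be shared with other PR holders
and EX cannot be shared with anything above NL.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative lock levels, ordered from weakest to strongest.
 * These names are made up for this sketch. */
enum toy_level { TOY_NL = 0, TOY_PR = 1, TOY_EX = 2 };

/* Highest level a current holder may keep while another node gets `wanted`. */
static enum toy_level toy_highest_compat(enum toy_level wanted)
{
    switch (wanted) {
    case TOY_EX: return TOY_NL;  /* an EX request tolerates only NL holders */
    case TOY_PR: return TOY_PR;  /* a PR request can coexist with PR holders */
    default:     return TOY_EX;  /* an NL request conflicts with nothing */
    }
}

/* True if the holder has to downconvert before `wanted` can be granted,
 * which is exactly when the requesting node has to wait. */
static bool toy_must_downconvert(enum toy_level held, enum toy_level wanted)
{
    return held > toy_highest_compat(wanted);
}

/* Self-check using the example from this thread. */
static void toy_lock_selfcheck(void)
{
    /* Node 1 holds EX, node 2 wants PR: node 1 must drop to PR first. */
    assert(toy_must_downconvert(TOY_EX, TOY_PR));
    assert(toy_highest_compat(TOY_PR) == TOY_PR);

    /* Two readers do not block each other. */
    assert(!toy_must_downconvert(TOY_PR, TOY_PR));

    /* A writer arriving while another node holds PR forces a downconvert. */
    assert(toy_must_downconvert(TOY_PR, TOY_EX));
}
```

So in the write-then-read case the reader on node 2 is blocked for as
long as node 1 needs to finish its downconvert from EX.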
> OK~
> 
> Thanks,
> Eric