[Ocfs2-devel] The root cause analysis about buffer read getting starvation
Gang He
ghe at suse.com
Mon Dec 21 01:12:57 PST 2015
Hello guys,
> On 12/21/2015 10:53 AM, Gang He wrote:
>> Hello Mark and all,
>>
>>
> [ snip ]
>
>>> >
>>> > You are correct here - the change was introduced to solve a deadlock between
>>> > page lock and ip_alloc_sem(). Basically, ->readpage is going to be called
>>> > with the page lock held and we need to be aware of that.
>> Hello guys, my main question is: why did we change the ip_alloc_sem
>> lock/unlock position from SLES10 to SLES11?
>> In SLES10, we take the ip_alloc_sem lock before calling
>> generic_file_read() or generic_file_write_nolock() in file.c,
>> but in SLES11, we take the ip_alloc_sem lock inside ocfs2_readpage()
>> in aops.c; moreover, the order of taking/releasing the page lock and
>> ip_alloc_sem is NOT consistent between the read and write paths.
>> I just want to know the background behind this code evolution. If we
>> kept taking the ip_alloc_sem lock before calling
>> generic_file_aio_read() in SLES11, could the deadlock be avoided?
>> Then we would not need to take the lock in a non-blocking way in
>> readpage(), buffer reads would not be starved in that case, and the
>> read/write IO behavior would be the same as in SLES10.
> Holding locks across generic_file_read() stops the reader and writer
> from running in parallel.
> For ip_alloc_sem, losing that parallelism is bad, since the reader and
> writer may be touching different pages.
> For the inode lock it looks acceptable: a reader and writer running in
> parallel cause a lock ping-pong issue and keep truncating and flushing
> the page cache, which gives bad performance. Of course, the recursive
> locking issue needs fixing, or it will be very easy to run into a
> deadlock.
My main concern is: why do we not take the related locks in the read/write path the way SLES10 did?
The customer runs a program that reads and writes the same file from different nodes, and the read
behavior is not the same as on SLES10: an occasional long read IO keeps the customer's program from
running normally. I want to discuss whether we can do something to fix this problem.
Otherwise, if this read behavior under parallel read/write IO is considered a by-design issue, we will tell the customer so.
Here is what the read code in SLES10 looks like:
static ssize_t ocfs2_file_read(struct file *filp,
                               char __user *buf,
                               size_t count,
                               loff_t *ppos)
{
        ...
        /* Cluster meta/data lock */
        ret = ocfs2_lock_buffer_inodes(&ctxt, NULL);
        if (ret < 0) {
                mlog_errno(ret);
                goto bail_unlock;
        }

        /*
         * Take the lock before calling generic_file_read(); then we do
         * not need to take any lock in a non-blocking/retry way inside
         * that function, and reads get IO fairly.
         */
        down_read(&OCFS2_I(inode)->ip_alloc_sem);
        ret = generic_file_read(filp, buf, count, ppos);
        up_read(&OCFS2_I(inode)->ip_alloc_sem);
        ...
}
But the read code in SLES11 looks like the following: there is no
cluster meta lock in ocfs2_file_aio_read(); the locks were moved into
the readpage function.
static int ocfs2_readpage(struct file *file, struct page *page)
{
        ...
        /*
         * Non-blocking attempt to take the cluster lock. This is not
         * fair when there is a parallel write from another node: the
         * subsequent blocking lock and retry (several hundred times)
         * can take a long time. This is why, in the customer's
         * program, an occasional read costs too long (1 - 15 secs).
         */
        ret = ocfs2_inode_lock_with_page(inode, NULL, 0, page);
        if (ret != 0) {
                if (ret == AOP_TRUNCATED_PAGE)
                        unlock = 0;
                mlog_errno(ret);
                goto out;
        }

        if (down_read_trylock(&oi->ip_alloc_sem) == 0) {
                /*
                 * Unlock the page and cycle ip_alloc_sem so that we don't
                 * busyloop waiting for ip_alloc_sem to unlock
                 */
                ret = AOP_TRUNCATED_PAGE;
                unlock_page(page);
                unlock = 0;
                down_read(&oi->ip_alloc_sem);
                up_read(&oi->ip_alloc_sem);
                goto out_inode_unlock;
        }
        ...
}
Thanks
Gang
>
> Thanks,
> Junxiao.
>>
>> Thanks
>> Gang
>>