[Ocfs2-devel] Buffer read will get starvation in case reading/writing the same file from different nodes concurrently
Gang He
ghe at suse.com
Mon Dec 7 20:51:04 PST 2015
Hello Joseph,
>>>
> Hi Gang,
> Eric and I have discussed this case before.
> Using NONBLOCK here is because there is a lock inversion between inode
> lock and page lock. You can refer to the comments of
> ocfs2_inode_lock_with_page for details.
> Actually I have found that NONBLOCK mode is only used in lock inversion
> cases.
Could you give more details? For an inode object there are three types of cluster locks:
ip_open_lockres is used to protect the open/delete POSIX semantics,
ip_rw_lockres looks to be used to protect writes in the direct-io case (the name "rw" is a bit confusing),
and ip_inode_lockres is used much more widely. Could you explain why a lock inversion is needed here, i.e. why the read/write code path cannot take the lock in blocking mode?
Second, when the read path cannot get the lock in NONBLOCK mode, why does the code then call ocfs2_inode_lock(inode, ret_bh, ex) in blocking mode right afterwards?
Both lock wrappers use the same cluster lock (ip_inode_lockres), and that subsequent blocking acquisition can take a long time.
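To make sure we are talking about the same inversion, here is a minimal userspace analogy (plain pthreads, not the real OCFS2 code; all the names below are made up for illustration) of the pattern I think ocfs2_inode_lock_with_page() is working around: one path holds the page lock and wants the cluster lock, another path holds the cluster lock and wants the page lock, so the first path can only trylock and, on failure, drop the page lock and ask its caller to retry (the AOP_TRUNCATED_PAGE case):

/*
 * Minimal userspace analogy (pthreads only, not the OCFS2 code; names
 * are invented).  "page_lock" stands for the VFS page lock held by the
 * read path, "cluster_lock" stands for ip_inode_lockres.
 *
 * Build with: gcc -pthread analogy.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cluster_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read path: already holds page_lock, then needs cluster_lock. */
static int read_path(void)
{
	pthread_mutex_lock(&page_lock);

	if (pthread_mutex_trylock(&cluster_lock) != 0) {
		/*
		 * Analogue of the -EAGAIN branch in
		 * ocfs2_inode_lock_with_page(): drop the page lock so the
		 * other path can make progress, and tell the caller to
		 * retry (like returning AOP_TRUNCATED_PAGE).
		 */
		pthread_mutex_unlock(&page_lock);
		return -1;
	}

	/* ... fill the page ... */
	pthread_mutex_unlock(&cluster_lock);
	pthread_mutex_unlock(&page_lock);
	return 0;
}

/* Other path: holds cluster_lock, then needs page_lock (opposite order). */
static void *other_path(void *arg)
{
	pthread_mutex_lock(&cluster_lock);
	pthread_mutex_lock(&page_lock);
	/* ... write the page back ... */
	pthread_mutex_unlock(&page_lock);
	pthread_mutex_unlock(&cluster_lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, other_path, NULL);
	while (read_path() != 0)
		;	/* retry, as the VFS retries after AOP_TRUNCATED_PAGE */
	pthread_join(t, NULL);
	printf("both paths finished without deadlocking\n");
	return 0;
}

If read_path() called pthread_mutex_lock() on cluster_lock instead of trylock while still holding page_lock, the two threads could wait on each other forever; my guess is that this is essentially the deadlock I ran into when I changed OCFS2_LOCK_NONBLOCK to 0 (see below), but please correct me if the real inversion is different.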
Thanks a lot.
Gang
>
> Thanks,
> Joseph
>
> On 2015/12/8 11:21, Gang He wrote:
>> Hello Guys,
>>
>> There is an issue from a customer who complains that a buffered read
>> sometimes takes too long (1 - 10 seconds) when reading/writing the same
>> file from different nodes concurrently.
>> Using the customer's demo code we can also reproduce this issue in-house
>> (running the test program on a SLES11SP4 OCFS2 cluster); it can likewise be
>> reproduced on openSUSE 13.2 (much newer code), but in direct-io mode the
>> issue disappears.
>> Based on my investigation, the root cause is that buffered I/O uses the
>> cluster lock differently from direct I/O; I do not know why buffered I/O
>> takes the cluster lock this way.
>> The code details are below.
>> In aops.c:
>> 281 static int ocfs2_readpage(struct file *file, struct page *page)
>> 282 {
>> 283 	struct inode *inode = page->mapping->host;
>> 284 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
>> 285 	loff_t start = (loff_t)page->index << PAGE_CACHE_SHIFT;
>> 286 	int ret, unlock = 1;
>> 287
>> 288 	trace_ocfs2_readpage((unsigned long long)oi->ip_blkno,
>> 289 			     (page ? page->index : 0));
>> 290
>> 291 	ret = ocfs2_inode_lock_with_page(inode, NULL, 0, page); <<== this line
>> 292 	if (ret != 0) {
>> 293 		if (ret == AOP_TRUNCATED_PAGE)
>> 294 			unlock = 0;
>> 295 		mlog_errno(ret);
>> 296 		goto out;
>> 297 	}
>>
>> In dlmglue.c:
>> 2442 int ocfs2_inode_lock_with_page(struct inode *inode,
>> 2443 			       struct buffer_head **ret_bh,
>> 2444 			       int ex,
>> 2445 			       struct page *page)
>> 2446 {
>> 2447 	int ret;
>> 2448
>> 2449 	ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK); <<== here: why use NONBLOCK mode to get the cluster lock? This is what lets the reading I/O starve.
>> 2450 	if (ret == -EAGAIN) {
>> 2451 		unlock_page(page);
>> 2452 		if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
>> 2453 			ocfs2_inode_unlock(inode, ex);
>> 2454 		ret = AOP_TRUNCATED_PAGE;
>> 2455 	}
>> 2456
>> 2457 	return ret;
>> 2458 }
>>
>> If you know the background behind this code, please tell us: why not take
>> the lock in blocking mode when reading a page, so that the reading I/O gets
>> the page fairly when there is concurrent writing I/O from another node?
>> Second, I tried changing that line from OCFS2_LOCK_NONBLOCK to 0 (i.e.
>> switching to blocking mode); the reading I/O is then no longer blocked for
>> so long (which would address the customer's complaint), but a new problem
>> appears: sometimes the reading I/O and the writing I/O hit a deadlock (why
>> they deadlock, I am still investigating).
>>
>>
>> Thanks
>> Gang
>>
>>
>>
>> .
>>