[Ocfs2-devel] The root cause analysis about buffer read getting starvation

Mon Dec 21 03:23:35 PST 2015

Hello Mark,

...snip..
> > SLES10, with a kernel version of about 2.6.16.x, used the blocking way, i.e. down_read(), which has a
> > potential deadlock between the page lock and ip_alloc_sem when one node gets the cluster lock and
> > does writing and reading on the same file. This deadlock was fixed by this commit:
> 
> You are correct here - the change was introduced to solve a deadlock between
> page lock and ip_alloc_sem(). Basically, ->readpage is going to be called
> with the page lock held and we need to be aware of that.
...snip..
> > But somehow, with this patch, performance in this scenario becomes very bad. I don't know how this could happen, because the reading node has only one
> > thread reading the shared file, so down_read_trylock() should always get ip_alloc_sem successfully, right? If not, who else may race for ip_alloc_sem?
> 
> Hmm, there's only one thread and it can't get the lock? Any chance you might
No, it can always get the lock in this case. Sorry, my earlier test result
was wrong. There are probably two main factors behind it:

1. non-isolated testing environment - including nodes, network and shared disk;
2. the testing program from the customer - it sleeps for 1s after finishing each ~1M read/write,
   so the overlap time of read/write on the two nodes is random; the shorter the overlap time is,
   the better the performance looks.

Sorry again for taking up your time.
--Eric
> put some debug prints around where we acquire ip_alloc_sem? It would be
> interesting to see where it gets taken to prevent this from happening.
> 	--Mark
> 
> --
> Mark Fasheh