[Ocfs2-devel] [RFC] ocfs2/dlm: support range lock

Wengang Wang wen.gang.wang at oracle.com
Wed Jan 28 19:21:09 PST 2015


On 2015-01-29 08:05, Goldwyn Rodrigues wrote:
> Hi Yangwenfang,
>
> I appreciate the effort in this regard.
>
> On 01/26/2015 06:28 AM, yangwenfang wrote:
>> What:
>> A byte range lock locks a region of a file so that concurrent
>> reads/writes to different regions can proceed in parallel.
>>
>> Why:
>> Currently ocfs2 does not support byte range locks. Since multiple nodes
>> may concurrently update/write at different positions of the same file
>> in database workloads, the performance (tpmC) of DB+ocfs2 is much poorer than
>> that of DB+GPFS when running TPC-C.
>> Aiming to improve the efficiency of parallel accesses to the same file,
>> we have implemented a demo of a range lock feature, which is already supported
>> by Lustre and GPFS, so that a file can be updated by different nodes in
>> the cluster when they are accessing different blocks.
>>
>> How:
>> Key issues in design and implementation:
>> 1. In ocfs2, each file has only one lock, which cannot distinguish
>> different positions in the file.
>> One solution is to add a range field (start,end) to a lock. For example:
>> ocfs2_lock_res(N1)                dlm_lock_resource(Master)   ocfs2_lock_res(N2)
>> ocfs2_res_range_lock(0,9)   ----> dlm_lock(0,9)   N1
>>                                   dlm_lock(10,19) N2 <---- ocfs2_res_range_lock(10,19)
>> ocfs2_res_range_lock(20,29) ----> dlm_lock(20,29) N1
>>                                   dlm_lock(30,49) N2 <---- ocfs2_res_range_lock(30,49)
>> ocfs2_res_range_lock(50,59) ----> dlm_lock(50,59) N1
>>                                   dlm_lock(60,69) N2 <---- ocfs2_res_range_lock(60,69)
>>
>> Each lock resource uses an interval tree to manage its ranges, which
>> supports basic operations like add, delete, insert, find, split and merge.
>> The most important issue is to determine the existence of conflicts
>> among the ranges. Conflict-free ranges of the same file can be accessed
>> concurrently. Otherwise, nodes must wait for the release of a
>> conflicting lock before accessing that range of the file.
>>
>> Byte range locks follow split and merge rules: for the same level, the
>> larger scope wins; for different levels, write > read (if a node holds an
>> EX lock on range (start,end), it also holds the PR lock on (start,end)).
>> For example:
>> (1) merge: N1 holds range locks (0,9) PR and (5,19) PR; they are merged into
>> (0,19) PR;
>> (2) merge: N1 holds range locks (0,9) PR and (5,19) EX; the merged locks
>> become (0,19) PR and (5,19) EX;
>> (3) split: N1 holds range lock (0,9) PR and N2 tries to lock (0,5) PR; N1
>> should split the lock and keep (6,9) PR.
> What is the purpose of doing this kind of merge/split? I assume this
> will be required when multiple processes on the same node
> read/write to the file. Would it not be simpler to not merge or split
> and keep separate instances in lock resources? This way you would have
> to do relatively less bookkeeping with respect to comparisons.
>
> Are these numbers in your pseudocode byte ranges? If yes, how do you
> propose to handle multiple writes that lie within a block_size/cluster_size
> range?
>

Yes, if the range lock is used for file read/write, the granularity
should be the block rather than the byte.
For example, with a block size of 512, a write to bytes 0~9 would lock the
whole of bytes 0~511, that is, block range 0~0. Otherwise, if two write
requests access the same block, say one writes to 0~254 and the other
writes to 255~511, and they lock 0~254 and 255~511 respectively, the
contents of this block may get corrupted after the two writes.

thanks,
wengang

>> 2. In ocfs2, there are only three types of lock resources: rw, inode and
>> open, which protect different contents.
>> We need to add another lock resource (ip_range_lock_lockres) to protect
>> different ranges in the IO read/write path.
>> For example: buffer read/write.
>> (1)ocfs2_file_aio_write	------------->ocfs2_file_aio_write
>> 	ocfs2_rw_lock(ex)		ocfs2_rw_lock(pr)
>> 					ocfs2_range_lock(start, end, ex)
> This does not seem right. ocfs2_rw_lock is meant to serialize writes to
> the same file. Changing it from ex to pr would make the file
> inconsistent for writes to the same file. As Srini proposed, why create
> a new lock instead of adding the feature to rw_lock?
>
>> 	ocfs2_write_begin
>> 		ocfs2_inode_lock(ex)    ocfs2_inode_lock(pr)
>> 					if append, update to ex;
>> (2)ocfs2_file_aio_read---------------> no need to change.
>> 	ocfs2_readpage
>> 		ocfs2_inode_lock(pr)
>> (3) But there is a problem with read-ahead.
>> 	ocfs2_readpages------------------>ocfs2_readpages
>> 	ocfs2_inode_lock(pr)		ocfs2_inode_lock(pr)
>> 					ocfs2_range_lock(start, end, pr)
>>
>> Limitations based on our assumptions:
>> 1. Byte range locks are only beneficial for update writes.
>> 2. Delayed unlock leads to too many locks.
>> 3. Significant source code modification is required, involving almost the
>> whole dlmglue and dlm modules.
>>
>> As described above, there are also many limitations based on our assumptions.
>> Many thanks for any advice.
>>
>



