[Ocfs2-users] Large Files Hang Server

Sunil Mushran sunil.mushran at oracle.com
Tue May 24 14:54:46 PDT 2011


Did you set the mount option on both nodes or only on the node
on which you were doing the ls?

Setting it on both nodes, or on the node that is doing the cp, should
solve the performance issue. What's happening is that the ls on node2 is
forcing node1 to commit its journal. With the ordered data journal mode,
the data is flushed on commit. Switching to writeback will allow it to
commit without flushing the data.
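
One way to confirm which journal mode each node is actually using is to
read that node's mount table. The following is a minimal sketch, assuming
the kernel lists the mode as a data=... entry in the options field of
/proc/mounts for ocfs2 volumes. The function name ocfs2_data_mode is made
up for illustration; /u03 is the mount point from this thread.

```shell
#!/bin/sh
# Print the data= journal mode of an ocfs2 mount point.
# Reads a mount table in /proc/mounts format (default: /proc/mounts).
# Assumption: the mode shows up as data=ordered or data=writeback
# in the fourth (options) field.
ocfs2_data_mode() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "ocfs2" {
            mode = "not shown"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^data=/) mode = substr(opts[i], 6)
            print mode
            found = 1
        }
        END { if (!found) print "not mounted" }
    ' "$table"
}

# Usage, run on each node in turn:
#   ocfs2_data_mode /u03
```

If a node reports ordered, a full umount followed by a mount with
-o data=writeback may be needed, since a remount may not switch the
journal mode.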

On 05/24/2011 01:58 PM, Keith W wrote:
> Yes, I am finding that if I do the large file copy on node1 and
> do an ls -l on node1 it is very fast as expected.
>
> If I do the large file copy on node1 and do an ls -l on node2
> ls -l shows multi-second times, 5+ seconds at least.
>
> If I do a file listing on any other file it is fast regardless
> of which node I am on so long as I don't specify the file in transfer.
>
> Only the file in transfer will hang ls -l when not on the node doing
> the transfer.
>
> I am starting to think this is expected behaviour. Am I correct?
>
> +-------------------------------+
> +	      Keith		+
> +-------------------------------+
>
> On Tue, 24 May 2011, Sunil Mushran wrote:
>
>> Writeback will help if the writes are on one node and the ls on another.
>> It is not clear if that is the case or not.
>>
>> If both ops are on the same node, then it just could be the disk is slow.
>> The times show almost all wall time. Very little sys and no user. top
>> will show io wait times.
>>
>> On 05/24/2011 11:45 AM, Keith W wrote:
>>> No change in behavior.
>>> My mount options
>>> /dev/sdj1   /u03    ocfs2   _netdev,noatime,data=writeback,nointr	0 0
>>>
>>> +-------------------------------+
>>> +	      Keith		+
>>> +-------------------------------+
>>>
>>> On Tue, 24 May 2011, Sunil Mushran wrote:
>>>
>>>> Repeat the same test but with volumes mounted with data=writeback
>>>> mount option.
>>>>
>>>> mount -o data=writeback /dev/sdX /path
>>>>
>>>> On 05/24/2011 07:11 AM, Keith W wrote:
>>>>> Hello list.
>>>>> Apologies in advance, this may be a bit long. Just trying to give
>>>>> as much info as I can at the outset.
>>>>>
>>>>> I have a two-node setup that shares a 500Gig SAS drive via ocfs2.
>>>>> When I move either large files 300Megs+ or a large number of smaller files
>>>>> onto or off of the volume, my terminal session will hang and if I do a
>>>>> directory listing in another terminal while doing a file transfer that
>>>>> terminal will hang as well.
>>>>>
>>>>> The only thing I can see that is not "typical" is that I had to change
>>>>> the port to 8888 due to another application running on 7777.
>>>>>
>>>>>
>>>>> Here is my configuration:
>>>>> ------------------------
>>>>> Oracle Enterprise Linux 5.5 (Oracle Updated Kernel 2.6.18-194.0.0.0.3.el5)
>>>>> OCFS2 Version 1.4.4
>>>>> GigE Interconnect
>>>>> SAS connection to the drive.
>>>>>
>>>>>
>>>>> cluster.conf:
>>>>> -------------
>>>>> cluster:
>>>>>            node_count = 2
>>>>>            name = HobCluster
>>>>> node:
>>>>>            ip_port = 8888
>>>>>            ip_address = 192.168.0.1
>>>>>            number = 0
>>>>>            name = hoban1
>>>>>            cluster = HobCluster
>>>>> node:
>>>>>            ip_port = 8888
>>>>>            ip_address = 192.168.0.2
>>>>>            number = 1
>>>>>            name = hoban2
>>>>>            cluster = HobCluster
>>>>>
>>>>>
>>>>>
>>>>> /etc/sysconfig/o2cb:
>>>>> -------------------
>>>>> O2CB_ENABLED=true
>>>>> O2CB_STACK=o2cb
>>>>> O2CB_BOOTCLUSTER=HobCluster
>>>>> O2CB_HEARTBEAT_THRESHOLD=
>>>>> O2CB_IDLE_TIMEOUT_MS=
>>>>> O2CB_KEEPALIVE_DELAY_MS=
>>>>> O2CB_RECONNECT_DELAY_MS=
>>>>>
>>>>>
>>>>> Status:
>>>>> --------
>>>>> [root@hoban1 u03]# /etc/init.d/o2cb status
>>>>> Driver for "configfs": Loaded
>>>>> Filesystem "configfs": Mounted
>>>>> Driver for "ocfs2_dlmfs": Loaded
>>>>> Filesystem "ocfs2_dlmfs": Mounted
>>>>> Checking O2CB cluster HobCluster: Online
>>>>> Heartbeat dead threshold = 31
>>>>>      Network idle timeout: 30000
>>>>>      Network keepalive delay: 2000
>>>>>      Network reconnect delay: 2000
>>>>> Checking O2CB heartbeat: Active
>>>>>
>>>>>
>>>>>
>>>>> Additional Info:
>>>>> ---------------
>>>>> While transferring a large file, an ls -l on any other file within
>>>>> the /u03 (ocfs2) directory returns quickly, as expected.
>>>>>
>>>>> [root@hoban2 u03]# time ls -l asdf
>>>>> -rw-r--r-- 1 root root 0 May 23 08:23 asdf
>>>>>
>>>>> real	0m0.003s
>>>>> user	0m0.000s
>>>>> sys	0m0.003s
>>>>>
>>>>>
>>>>> During a large file transfer, an ls -l on the file being transferred
>>>>> hangs for a very long time.
>>>>> [root@hoban2 u03]# time ls -l
>>>>> total 547340
>>>>> -rw-r--r-- 1 root   root             0 May 23 08:23 asdf
>>>>> -rw-r--r-- 1 root   root     560476160 May 24  2011 Enterprise-R5-U5-x86_64.iso
>>>>> drwxr-xr-x 2 root   root          3896 May 22 09:29 lost+found
>>>>> drwxr-xr-x 3 oracle oinstall      3896 May 23 14:32 oracle
>>>>>
>>>>> real	0m5.552s
>>>>> user	0m0.000s
>>>>> sys	0m0.004s
>>>>>
>>>>> Once the file has completed its transfer, the ls works just fine and
>>>>> nothing hangs. On occasion both terminal sessions will lock and need to
>>>>> be killed, with the file never completing its transfer.
>>>>>
>>>>> Any suggestions are greatly appreciated.
>>>>>
>>>>> +-------------------------------+
>>>>> +	      Keith		+
>>>>> +-------------------------------+
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users



