[Ocfs2-users] Large Files Hang Server

Keith W keith at cydonia.net
Tue May 24 13:58:02 PDT 2011


Yes, I am finding that if I do the large file copy on node1 and
do an ls -l on node1 it is very fast as expected.

If I do the large file copy on node1 and do an ls -l on node2,
ls -l takes multiple seconds, 5+ seconds at least.

If I do a file listing on any other file it is fast regardless
of which node I am on so long as I don't specify the file in transfer. 

Only the file in transfer will hang ls -l, and only when I am not on
the node doing the transfer.
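
A rough way to put numbers on this from node2 is to time the stat call
on the in-transfer file directly, since ls -l has to stat each file it
lists. This is only a sketch: it assumes Python 3 is available on the
node, and the function name time_stat and the path in the usage note
are illustrative, not something taken from the setup in this thread.

```python
import os
import sys
import time


def time_stat(path, repeats=3):
    """Return the wall-clock seconds taken by each os.lstat(path) call."""
    timings = []
    for _ in range(repeats):
        start = time.monotonic()
        os.lstat(path)  # the same metadata lookup ls -l performs per file
        timings.append(time.monotonic() - start)
    return timings


# Only runs when a path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for i, secs in enumerate(time_stat(sys.argv[1]), 1):
        print("stat %d: %.3fs" % (i, secs))
```

Run it on node2 against the file being copied (for example, passing
/u03/Enterprise-R5-U5-x86_64.iso as the argument) and again against an
idle file such as /u03/asdf to compare the two cases.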

I am starting to think this is expected behaviour. Am I correct?

+-------------------------------+
+	      Keith		+
+-------------------------------+

On Tue, 24 May 2011, Sunil Mushran wrote:

> Writeback will help if the writes are on one node and the ls on another.
> It is not clear if that is the case or not.
> 
> If both ops are on the same node, then it just could be the disk is slow.
> The times show almost all wall time: very little sys and no user. top
> will show io wait times.
> 
> On 05/24/2011 11:45 AM, Keith W wrote:
> > No change in behavior.
> > My mount options
> > /dev/sdj1   /u03    ocfs2   _netdev,noatime,data=writeback,nointr	0 0
> >
> > +-------------------------------+
> > +	      Keith		+
> > +-------------------------------+
> >
> > On Tue, 24 May 2011, Sunil Mushran wrote:
> >
> >> Repeat the same test but with volumes mounted with data=writeback
> >> mount option.
> >>
> >> mount -o data=writeback /dev/sdX /path
> >>
> >> On 05/24/2011 07:11 AM, Keith W wrote:
> >>> Hello list.
> >>> Apologies in advance, this may be a bit long. Just trying to give
> >>> as much info as I can at the outset.
> >>>
> >>> I have a two-node setup that shares a 500 GB SAS drive via ocfs2.
> >>> When I move either large files (300 MB+) or a large number of smaller files
> >>> onto or off of the volume, my terminal session hangs, and if I do a
> >>> directory listing in another terminal during the transfer, that
> >>> terminal hangs as well.
> >>>
> >>> The only thing I can see that is not "typical" is that I had to change
> >>> the port to 8888 due to another application running on 7777.
> >>>
> >>>
> >>> Here is my configuration:
> >>> ------------------------
> >>> Oracle Enterprise Linux 5.5 (Oracle Updated Kernel 2.6.18-194.0.0.0.3.el5)
> >>> OCFS2 Version 1.4.4
> >>> GigE Interconnect
> >>> SAS connection to the drive.
> >>>
> >>>
> >>> cluster.conf:
> >>> -------------
> >>> cluster:
> >>>           node_count = 2
> >>>           name = HobCluster
> >>> node:
> >>>           ip_port = 8888
> >>>           ip_address = 192.168.0.1
> >>>           number = 0
> >>>           name = hoban1
> >>>           cluster = HobCluster
> >>> node:
> >>>           ip_port = 8888
> >>>           ip_address = 192.168.0.2
> >>>           number = 1
> >>>           name = hoban2
> >>>           cluster = HobCluster
> >>>
> >>>
> >>>
> >>> /etc/sysconfig/o2cb:
> >>> -------------------
> >>> O2CB_ENABLED=true
> >>> O2CB_STACK=o2cb
> >>> O2CB_BOOTCLUSTER=HobCluster
> >>> O2CB_HEARTBEAT_THRESHOLD=
> >>> O2CB_IDLE_TIMEOUT_MS=
> >>> O2CB_KEEPALIVE_DELAY_MS=
> >>> O2CB_RECONNECT_DELAY_MS=
> >>>
> >>>
> >>> Status:
> >>> --------
> >>> [root@hoban1 u03]# /etc/init.d/o2cb status
> >>> Driver for "configfs": Loaded
> >>> Filesystem "configfs": Mounted
> >>> Driver for "ocfs2_dlmfs": Loaded
> >>> Filesystem "ocfs2_dlmfs": Mounted
> >>> Checking O2CB cluster HobCluster: Online
> >>> Heartbeat dead threshold = 31
> >>>     Network idle timeout: 30000
> >>>     Network keepalive delay: 2000
> >>>     Network reconnect delay: 2000
> >>> Checking O2CB heartbeat: Active
> >>>
> >>>
> >>>
> >>> Additional Info:
> >>> ---------------
> >>> While transferring a large file, an ls -l on any other file within
> >>> the /u03 (ocfs2) directory returns quickly, as expected.
> >>>
> >>> [root@hoban2 u03]# time ls -l asdf
> >>> -rw-r--r-- 1 root root 0 May 23 08:23 asdf
> >>>
> >>> real	0m0.003s
> >>> user	0m0.000s
> >>> sys	0m0.003s
> >>>
> >>>
> >>> During a large file transfer, an ls -l on the file being transferred
> >>> hangs for a very long time.
> >>> [root@hoban2 u03]# time ls -l
> >>> total 547340
> >>> -rw-r--r-- 1 root   root             0 May 23 08:23 asdf
> >>> -rw-r--r-- 1 root   root     560476160 May 24  2011 Enterprise-R5-U5-x86_64.iso
> >>> drwxr-xr-x 2 root   root          3896 May 22 09:29 lost+found
> >>> drwxr-xr-x 3 oracle oinstall      3896 May 23 14:32 oracle
> >>>
> >>> real	0m5.552s
> >>> user	0m0.000s
> >>> sys	0m0.004s
> >>>
> >>> Once the file has completed its transfer, the ls works just fine and
> >>> nothing hangs. On occasion both terminal sessions will lock and need to
> >>> be killed, with the file never completing its transfer.
> >>>
> >>> Any suggestions are greatly appreciated.
> >>>
> >>> +-------------------------------+
> >>> +	      Keith		+
> >>> +-------------------------------+
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Ocfs2-users mailing list
> >>> Ocfs2-users at oss.oracle.com
> >>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
> 



