[Ocfs2-users] ocfs2 file system hang during copy files

Sunil Mushran Sunil.Mushran at oracle.com
Thu Jul 19 10:46:33 PDT 2007


The default disk heartbeat timeouts are far too low. In short, the
buffered write flush is probably flooding the device and delaying
the heartbeat I/O.

For more, refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#HEARTBEAT
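
As a rough illustration of how a disk heartbeat timeout maps to the
O2CB_HEARTBEAT_THRESHOLD setting, here is a small Python sketch. It
assumes the relation described in the FAQ section above, that is,
threshold = (timeout in seconds / 2) + 1 with one heartbeat every 2
seconds. Please check the FAQ for your release before relying on it.
The function names are made up for this example.

```python
# Assumption: o2cb writes one disk heartbeat every 2 seconds, and
# O2CB_HEARTBEAT_THRESHOLD = (timeout_secs / 2) + 1, as described in the FAQ.
HEARTBEAT_INTERVAL_SECS = 2


def threshold_for_timeout(timeout_secs):
    """Return the threshold value for a desired timeout (hypothetical helper)."""
    if timeout_secs <= 0 or timeout_secs % HEARTBEAT_INTERVAL_SECS != 0:
        raise ValueError("timeout must be a positive multiple of 2 seconds")
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


def timeout_for_threshold(threshold):
    """Return the timeout in seconds implied by a threshold (hypothetical helper)."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


if __name__ == "__main__":
    for secs in (60, 120):
        print(secs, "seconds -> threshold", threshold_for_timeout(secs))
```

Under that assumption a 60 second timeout corresponds to a threshold
of 31, and a 120 second timeout to 61.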

If you are on 1.2.5, then also refer to:
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT
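
The o2net_idle_timer line in the node1 dmesg quoted below carries the
raw timestamps. Here is a minimal sketch that pulls out how long the
connection had been idle. It assumes "tmr" is the time the idle timer
was last armed and "now" is the time it fired, which is my reading of
the message rather than something the log itself states. The helper
name is hypothetical.

```python
import re
from decimal import Decimal

# The o2net_idle_timer message from node1's dmesg, joined onto one line
# and cut off after the "dr" field.
LINE = (
    "(0,3):o2net_idle_timer:1418 here are some times that might help "
    "debug the situation: (tmr 1184814613.883032 now 1184814623.882842 "
    "dr 1184814613.883028"
)


def idle_seconds(line):
    """Return now - tmr in seconds (hypothetical helper).

    Decimal keeps the microsecond digits exact, which a float would not
    guarantee for epoch-sized values.
    """
    match = re.search(r"tmr (\d+\.\d+) now (\d+\.\d+)", line)
    if match is None:
        raise ValueError("no tmr/now timestamps found")
    tmr, now = (Decimal(group) for group in match.groups())
    return now - tmr


if __name__ == "__main__":
    print("idle for", idle_seconds(LINE), "seconds")
```

The difference comes out just under 10 seconds, which lines up with the
"has been idle for 10.0 seconds" message in the same log.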

Zosen Wang wrote:
>
> I am trying to copy a single 42 GB file from an ext3 file system to an
> ocfs2 file system on node 1. The ocfs2 file system hangs on all nodes
> during or after the cp. /p0ebsdb/u13 is an ocfs2 mount point shared
> with the other 2 nodes (3-node RAC).
>
>  
>
> The following is the Unix copy command:
>
> [root@b30svrxp-ebsdb1 migrate]# time cp aexp02.dmp /p0ebsdb/u13/junk
>
>  
>
> real    17m49.351s
>
> user    0m0.392s
>
> sys     1m49.065s
>
>  
>
> The following is the dmesg output on node1:
>
>  
>
> ocfs2_dlm: Nodes in domain ("A2AECED66891407D915CBF282A9E9299"): 0 1 2
>
> o2net: connection to node b30svrxp-ebsdb2.ameripride.com (num 1) at 
> 192.168.3.70:7777 has been idle for 10.0 seconds, shutting it down.
>
> (0,3):o2net_idle_timer:1418 here are some times that might help debug 
> the situation: (tmr 1184814613.883032 now 1184814623.882842 dr 
> 1184814613.883028 adv 1184814613.883033:1184814613.883033 func 
> (2b61f804:504) 1184814613.882900:1184814613.882904)
>
> o2net: no longer connected to node b30svrxp-ebsdb2.ameripride.com (num 
> 1) at 192.168.3.70:7777
>
> (6047,3):dlm_send_proxy_ast_msg:459 ERROR: status = -107
>
> (6047,3):dlm_flush_asts:600 ERROR: status = -107
>
> (20810,0):dlm_do_master_request:1418 ERROR: link to 1 went down!
>
> (20810,0):dlm_get_lock_resource:995 ERROR: status = -107
>
>  
>
> The following is the dmesg output on node2:
>
> (26243,1):dlm_send_remote_convert_request:398 ERROR: status = -107
>
> (26243,1):dlm_wait_for_node_death:365 
> 9EA98E20F6E44FF7B7A89789976C1E32: waiting 5000ms for notification of 
> death of node 0
>
> (7427,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>
> (7427,0):dlm_wait_for_node_death:365 75990178D36942BFA473A2AE4149690C: 
> waiting 5000ms for notification of death of node 0
>
>  
>
> The following is the dmesg output on node3:
>
> mtrr: type mismatch for d8000000,2000000 old: uncachable new: 
> write-combining
>
> adl_trace[9860]: segfault at 000000000000000c rip 0000000040002462 rsp 
> 0000007fbfffe3e0 error 4
>
>  
>
> Any clues? Thanks in advance.



