[Ocfs2-users] ocfs2 crash on intensive disk write

Matthew Chan talcite at gmail.com
Sat Aug 21 23:20:22 PDT 2010


Hi,

I'm getting system (and eventually cluster) crashes during intensive
disk writes on Ubuntu Server 10.04 with my OCFS2 file system.

I have an iSER (InfiniBand) backed shared disk array with OCFS2 on it.
There are 6 nodes in the cluster, and the heartbeat interface is over a 
regular 1GigE connection. Originally, the problem presented itself while 
I was doing performance testing and it's been reproducible ever since.

Running something like

'dd if=/dev/zero of=/<ocfs2 array>/zeroes bs=64k count=100000'

kills the node almost immediately, and then hangs the rest of the
cluster when other nodes try to unmount the array (for a restart or
any other reason). This happens regardless of how many nodes are
running. I've tried with a single node and it still happens.
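
For anyone trying to reproduce this, here is a minimal sketch that runs
the same dd test at increasing sizes, to narrow down roughly how much
data gets written before the node dies. It assumes GNU dd (for
conv=fsync). The function names write_zeroes and step_writes and the
list of block counts are illustrative placeholders, not part of the
original test.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT blocks of 64k zeroes to TARGET,
# mirroring the dd invocation above. conv=fsync (GNU dd) asks dd to
# flush the file data to disk before it exits.
write_zeroes() {
    target="$1"
    count="$2"
    dd if=/dev/zero of="$target" bs=64k count="$count" conv=fsync 2>/dev/null
}

# Hypothetical driver: step the write size up towards the full
# 100000-block test. The intermediate counts are arbitrary choices.
step_writes() {
    target="$1"
    for count in 100 1000 10000 100000; do
        echo "writing $count blocks of 64k to $target"
        write_zeroes "$target" "$count" || return 1
        sync
    done
}
```

Usage would be something like 'step_writes /<ocfs2 array>/zeroes', with
the last message printed before the crash giving a rough upper bound on
the write size that triggers it.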

I was lucky enough to capture some messages from stderr that weren't 
being caught by syslog. I've attached them here as a screenshot, as my 
management interface doesn't allow copying or pasting text directly. 
Please take a look: http://img163.imageshack.us/img163/4771/screenshots.png

Take note that there are no other nodes started up, and I have no idea 
how there could be another node "heartbeating" in the same slot.

I should also note that I originally had the heartbeat configured on the 
same InfiniBand interface, so I thought the iSER traffic was crowding 
out the heartbeat. However, moving the heartbeat to another interface 
didn't solve the problem. I'm also fairly certain that the iSER 
interface is not causing the problem, because I have formatted the 
array as ext4 and successfully run read/write tests (from one node at a 
time, of course).

Thanks in advance for any replies,

Matt





