[Ocfs2-users] Strange problems (deadlock) in ocfs2 (rpm 1.2.4-2 and svn 2982) - dlm related?

Sunil Mushran Sunil.Mushran at oracle.com
Mon Mar 5 11:41:38 PST 2007


# tcpdump -i <eth1> -C 10 -W 15 -s 10000 -Sw /tmp/`hostname -s`_tcpdump.log -ttt 'port 7777' &

Initiate tcpdumps on the other 3 nodes. Start the dd's on one node.
Kill that node. Let it boot back up.
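The "start the dd's" step can be scripted. This is a minimal sketch that assumes the mount points are named /ocfs2_1 through /ocfs2_12, as in the quoted test case. `start_writers` and the `SRC` override are hypothetical helpers and are not part of ocfs2 or ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): start one background dd
# writer per mount point, mirroring the quoted test case.
# $1 = number of mounts, $2 = mount-point prefix (default /ocfs2_).
# SRC overrides the input device; it defaults to /dev/random as in the report.
start_writers() {
  count=$1
  prefix=${2:-/ocfs2_}
  src=${SRC:-/dev/random}
  n=1
  while [ "$n" -le "$count" ]; do
    # With /dev/random and no count=, each dd keeps writing until it is
    # killed or the node goes down.
    dd if="$src" of="${prefix}${n}/test" 2>/dev/null &
    n=$((n + 1))
  done
}
```

On server1 this would be `start_writers 12`, followed by the crash. Use <Alt>+<SysRQ>+b, or as root run `echo b > /proc/sysrq-trigger`, which reboots at once without syncing or unmounting.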

When you see the problem, do:

# ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
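On a busy node the full ps listing is long. A small filter can keep only the header and the processes in uninterruptible sleep; their WCHAN column shows the kernel function each one is blocked in. This sketch assumes STAT is the second column, which holds for the `-o` order used above. `filter_dstate` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read ps output on stdin and keep the header line plus
# every process whose STAT field (2nd column) starts with "D", i.e. tasks in
# uninterruptible sleep such as the hung ls/cd shell in the report.
filter_dstate() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# Example on a live node:
# ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_dstate
```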


Stop the tcpdumps and make them available to me via some ftp site or 
whatever.
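For the upload, one option is to pack each node's rotated capture files into a single tarball. This sketch assumes the `-w` prefix from the command above, /tmp/<short hostname>_tcpdump.log, which tcpdump suffixes with a file number when `-C`/`-W` rotation is in effect. `bundle_captures` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: archive <dir>/<host>_tcpdump.log* into
# <dir>/<host>_tcpdump.tar.gz and print the archive path.
# $1 = capture directory (default /tmp), $2 = host name (default: short hostname).
bundle_captures() {
  dir=${1:-/tmp}
  host=${2:-$(hostname -s)}
  name="${host}_tcpdump.tar.gz"
  # cd first so the glob expands inside $dir and the archive holds bare names;
  # if nothing matches, the literal pattern reaches tar and it fails.
  (cd "$dir" && tar -czf "$name" "${host}"_tcpdump.log*) || return 1
  printf '%s\n' "$dir/$name"
}
```

Run it on each of the three capturing nodes after tcpdump has been stopped.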

Also, file a bugzilla for tracking purposes.

Marcus Alves Grando wrote:
> Sunil Mushran wrote:
>> How many nodes in the cluster?
>
> Four.
>
>>
>> Marcus Alves Grando wrote:
>>> Hi list,
>>>
>>> I have some problems testing ocfs2. My test consists of:
>>>
>>> #server1: dd if=/dev/random of=/ocfs2_1/test &
>>> #server1: dd if=/dev/random of=/ocfs2_2/test &
>>> #server1: dd if=/dev/random of=/ocfs2_3/test &
>>> ...
>>> #server1: dd if=/dev/random of=/ocfs2_12/test &
>>> #server1:<Ctrl><Alt><SysRQ>B
>
> The correct key sequence is: <Alt>+<SysRQ>+b
>
> Regards
>
>>>
>>> After that, another node begins recovery. After some time (about 3 min),
>>> recovery completes. When server1 boots and tries to mount all the ocfs2
>>> filesystems, a problem occurs: most filesystems mount, but one
>>> doesn't. When I try to access this filesystem from another node (e.g.
>>> with ls or cd), the shell freezes. With ps I can see the status of that
>>> process: "D+" (uninterruptible sleep).
>>>
>>> Today I'm using svn revision 2982 (ocfs2-1.2 branch), and it doesn't
>>> help. ocfs2-tools is 1.2.3. I also tested the ocfs2-1.2.3 and ocfs2-1.2.4
>>> Red Hat AS4 RPMs, without success. The servers run Red Hat AS4.4 with
>>> all updates applied.
>>>
>>> The only way to bring this filesystem back online is to reboot all nodes. :(
>>>
>>> Does anyone know about this problem or have a fix for it? Maybe a
>>> DLM-related issue? I see many DLM-related commits in git...
>>>
>>> Regards
>>>
>


