[Ocfs2-users] Periodic hangs

Sunil Mushran sunil.mushran at oracle.com
Fri Oct 15 13:05:47 PDT 2010


I am not asking you to cause a hang, just to take a stack trace the next
time you encounter one.

If you don't see /proc/.../stack, then CONFIG_STACKTRACE is not enabled
in your kernel. You'll have to use the old-fashioned method: set up a
netconsole server, then issue "echo t > /proc/sysrq-trigger" during the
hang so the task dump is sent to the remote log.
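
For reference, here is a rough sketch of that check wrapped around the find loop quoted below. It is not an official tool. The function name dump_stacks and its optional proc-root argument are made up for illustration (the argument only exists so the loop can be dry-run against a scratch directory). It assumes a POSIX shell, and on a real system it needs root to read the stack files.

```shell
#!/bin/sh
# Sketch only: dump the kernel stack of every task, or point at the fallback.
# dump_stacks and its optional proc-root argument are hypothetical names.
dump_stacks() {
    proc_root="${1:-/proc}"

    # /proc/<pid>/stack is absent when CONFIG_STACKTRACE is not enabled.
    if [ ! -e "$proc_root/self/stack" ]; then
        echo "no $proc_root/self/stack; use netconsole and sysrq-t instead" >&2
        return 1
    fi

    find "$proc_root" -name stack 2>/dev/null | while read -r stack; do
        [ -r "$stack" ] || continue          # task may have exited meanwhile
        dir=$(dirname "$stack")
        echo "$stack"
        # cmdline is NUL-separated; print it on one readable line.
        tr '\0' ' ' 2>/dev/null < "$dir/cmdline"
        echo
        cat "$stack" 2>/dev/null
        echo
    done
}
```

During a hang you would run something like "dump_stacks > /root/stacks.txt" and attach that file. If it returns non-zero, fall back to netconsole as described above.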

On 10/15/2010 12:20 PM, Emil Noether wrote:
> Hi,
>
> thank you for the reply. Are you sure about this command? When I run
>
> find /proc -name stack
>
> I get no output. But I'm running this command while the server is OK. I can't cause the hang right now: it is 21:00 here, which is prime time for my website, and my customers are already quite upset. But I can try it tomorrow morning.
>
> Regards,
> Emil Noether
>
> On 10/15/2010 07:22 PM, Sunil Mushran wrote:
>> Take a stack trace of the hang. If you are on 2.6.32, you could do:
>>
>> # find /proc -name stack | while read A ; do D=$(dirname $A); echo $A; cat $D/cmdline; echo ; cat $A; echo ; done;
>>
>> Attach the output to a bugzilla on oss.oracle.com.
>>
>> On 10/15/2010 08:16 AM, Emil Noether wrote:
>>> Hi,
>>>
>>> I have a SATABoy2 Nexan storage with 8 disks (SATA Hitachi HUA721075KLA330) configured as RAID 6, plus two image servers and two webservers. The image servers are connected to the storage via iSCSI (1 Gbit) and the webservers via Fibre Channel (QLogic ISP2432-based 4Gb). There is an ocfs2 filesystem on the storage disk. When I disconnect webserver1 (identical to webserver2), everything is OK. But when I run "/etc/init.d/o2cb start", even without mounting the storage disk (so the webserver is actually doing nothing), my project goes down approximately every 30 minutes for approximately 2 minutes.
>>>
>>> To describe what is down: there is no problem on the image servers, but there is a problem on webserver2. The mounted ocfs2 disk stops responding (I can't even run "df"), so the load climbs to approximately 400, the number of running Apache processes reaches its maximum, and so on. The web page stops responding.
>>>
>>> I store all of my logs on local disks, not on the ocfs2 disk.
>>>
>>> I use a 2.6.32 kernel on the servers. I have already tried switching to other kernels, but with no result.
>>>
>>> I use ocfs2-tools version 1.4.1-1.
>>>
>>> My distro is Debian Lenny (5.0.6) x64.
>>>
>>> My /etc/default/o2cb:
>>> O2CB_ENABLED=true
>>> O2CB_BOOTCLUSTER=ocfs2
>>> O2CB_HEARTBEAT_THRESHOLD=14
>>> O2CB_IDLE_TIMEOUT_MS=10000
>>> O2CB_KEEPALIVE_DELAY_MS=5000
>>> O2CB_RECONNECT_DELAY_MS=2000
>>>
>>> My /etc/ocfs2/cluster.conf:
>>> node:
>>>   ip_port = 7777
>>>   ip_address = 10.0.0.111
>>>   number = 0
>>>   name = www1
>>>   cluster = ocfs2
>>>
>>> node:
>>>   ip_port = 7777
>>>   ip_address = 10.0.0.112
>>>   number = 1
>>>   name = ww2
>>>   cluster = ocfs2
>>>
>>> node:
>>>   ip_port = 7777
>>>   ip_address = 10.0.0.121
>>>   number = 2
>>>   name = img1
>>>   cluster = ocfs2
>>>
>>> node:
>>>   ip_port = 7777
>>>   ip_address = 10.0.0.122
>>>   number = 3
>>>   name = img2
>>>   cluster = ocfs2
>>>
>>> cluster:
>>>   node_count = 4
>>>   name = ocfs2
>>>
>>>
>>> Any help is much appreciated,
>>> Best Regards,
>>>
>>> Emil Noether
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>
