[Ocfs2-users] Did anything substantial change between 1.2.4 and 1.3.9?

Mon Apr 21 09:57:22 PDT 2008

Do you have the panic output, i.e. the kernel stack trace? We'll need
that to figure this out. Without it, we can only speculate.

mike wrote:
> On 4/21/08, Tao Ma <tao.ma at oracle.com> wrote:
>   
>> mike wrote:
>>     
>>> I have changed my kernel back to 2.6.22-14-server, and now I don't get
>>> the kernel panics. It seems like an issue with 2.6.24-16 and some i/o
>>> made it crash...
>>>
>>>
>>>       
>> OK, so it seems to be a bug in the ocfs2 kernel code, not in ocfs2-tools. :)
>> Then could you please describe in more detail how the kernel panic
>> happens?
>>     
>
> Yeah, this specific issue seems like a kernel issue.
>
> I don't know, these are production systems and I am already getting
> angry customers. I can't really test anymore. Both are standard Ubuntu
> kernels.
>
> Okay: 2.6.22-14-server (I think still minor file access issues)
> Breaks under load: 2.6.24-16-server
>
>
>   
>>> However I am still getting file access timeouts once in a while. I am
>>> nervous about putting more load on the setup.
>>>
>>>
>>>       
>> Also please provide more details about it.
>>     
>
> I am using nginx for a frontend load balancer, and nginx for a
> webserver as well. This doesn't seem to be related to the webserver at
> all though, it was happening before this.
>
> lvs01 proxies traffic in to web01, web02, and web03 (currently using
> nginx, before I was using LVS/ipvsadm)
>
> Every so often, one of the webservers sends me back
>
>   
>>> [root at raid01 .batch]# cat /etc/default/o2cb
>>>
>>> # O2CB_ENABLED: 'true' means to load the driver on boot.
>>> O2CB_ENABLED=true
>>>
>>> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
>>> O2CB_BOOTCLUSTER=mycluster
>>>
>>> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>>> O2CB_HEARTBEAT_THRESHOLD=7
>>>
>>>
>>>       
>> This value is a little small, so how did you set up your shared
>> disk (iSCSI or ...)? The most common value I have heard of is 61, which is
>> about 120 secs. I don't know the reason; maybe Sunil can tell you. ;)
>> You can also refer to
>> http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT.
>>
>>     
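On the heartbeat threshold: as far as I understand the FAQ linked above,
the disk heartbeat runs every 2 seconds and the timeout works out to
(threshold - 1) * 2 seconds, which matches the "61 is about 120 secs"
figure. A minimal Python sketch of that arithmetic, assuming the 2-second
interval (the function names are made up for illustration and are not part
of ocfs2-tools):

```python
# Assumption: the o2cb disk heartbeat fires once every 2 seconds.
HEARTBEAT_INTERVAL_SECS = 2


def heartbeat_timeout_secs(threshold: int) -> int:
    """Seconds without a disk heartbeat before a node is considered dead."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for_timeout(timeout_secs: int) -> int:
    """Inverse mapping: the threshold that yields the requested timeout."""
    if timeout_secs < HEARTBEAT_INTERVAL_SECS:
        raise ValueError("timeout must cover at least one heartbeat interval")
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


if __name__ == "__main__":
    for value in (7, 61):
        secs = heartbeat_timeout_secs(value)
        print(f"O2CB_HEARTBEAT_THRESHOLD={value} -> about {secs} secs")
```

By this arithmetic, the O2CB_HEARTBEAT_THRESHOLD=7 shown above allows only
about 12 secs of missed heartbeats.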
>>> # O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
>>> O2CB_IDLE_TIMEOUT_MS=10000
>>>
>>> # O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
>>> O2CB_KEEPALIVE_DELAY_MS=5000
>>>
>>> # O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
>>> O2CB_RECONNECT_DELAY_MS=2000
>>>
>>>
>>> On 4/21/08, Tao Ma <tao.ma at oracle.com> wrote:
>>>
>>>
>>>       
>>>> Hi Mike,
>>>>       Are you sure it is caused by the update of ocfs2-tools?
>>>> AFAIK, ocfs2-tools only includes tools like mkfs, fsck, tunefs, etc. So
>>>> if you didn't make any changes to the disk (by using the new tools), it
>>>> shouldn't cause a kernel panic, since they are all user-space tools.
>>>> Then there is only one other possibility. Have you modified
>>>> /etc/sysconfig/o2cb (this is the location on RHEL; I'm not sure of the
>>>> location on Ubuntu)? I have checked the rpm package for RHEL: it
>>>> updates /etc/sysconfig/o2cb, and this file has some timeouts defined
>>>> in it.
>>>> So do you have a backup of this file? If yes, please restore it to see
>>>> whether it helps (I can't say for sure).
>>>> If not, do you remember the old values of the timeouts you set for
>>>> ocfs2? If yes, you can use o2cb configure to set them yourself.
>>>>
>>>>
>>>>         
>>     
>
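If a backup of the o2cb config file does exist, comparing it with the
current one shows which timeouts the package update changed. A rough
Python sketch, assuming the file is plain KEY=VALUE lines with # comments
as in the listing above. The helper names are hypothetical, and the
"backup" values in the example are invented purely for illustration:

```python
from typing import Dict, Optional, Tuple


def parse_o2cb(text: str) -> Dict[str, str]:
    """Collect KEY=VALUE settings, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def changed_settings(
    old_text: str, new_text: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing key to its (old, new) value; None means absent."""
    old, new = parse_o2cb(old_text), parse_o2cb(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    # Hypothetical backup contents, NOT taken from this thread.
    backup = "O2CB_HEARTBEAT_THRESHOLD=61\nO2CB_IDLE_TIMEOUT_MS=10000\n"
    # Current values as posted in the listing above.
    current = (
        "# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is dead.\n"
        "O2CB_HEARTBEAT_THRESHOLD=7\n"
        "O2CB_IDLE_TIMEOUT_MS=10000\n"
    )
    for key, (old, new) in changed_settings(backup, current).items():
        print(f"{key}: {old} -> {new}")
```

Any key it reports can then be set back by hand with o2cb configure, as
suggested above.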