[Ocfs2-users] Re: remove locks? or copy the whole file?

Aleks Clark aleks.clark at gmail.com
Wed Jul 4 09:33:57 PDT 2012


OK, I added some debug statements, and the depth counter just keeps
increasing without limit when I hit this chain. Any ideas where to fix
this? Under what conditions should the depth counter increase?
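
To make the question concrete, here is a minimal sketch (not the fsck.ocfs2 code) of a chain walk that stops when a descriptor is seen twice. The block numbers are taken from the debug output quoted below; `NEXT_GROUP` and `walk_chain` are hypothetical names, and the sketch assumes that each descriptor links to the one checked right after it and that a next pointer of 0 ends a chain.

```python
# Block numbers in the order the debug output visits them on chain 69.
VISITED_ORDER = [
    2225664, 10063872, 88445952, 80607744, 72769536, 64931328,
    57093120, 49254912, 41416704, 33578496, 25740288, 17902080,
]

# Hypothetical in-memory stand-in for the on-disk "next group" links:
# each descriptor points at the one checked after it, and the last one
# points back at the first, which is what the log suggests.
NEXT_GROUP = {
    blkno: VISITED_ORDER[(i + 1) % len(VISITED_ORDER)]
    for i, blkno in enumerate(VISITED_ORDER)
}


def walk_chain(first_blkno, next_group):
    """Follow a chain until it ends (next == 0) or revisits a block.

    Returns (depth, loop_blkno); loop_blkno is None for a clean chain.
    """
    seen = set()
    depth = 0
    blkno = first_blkno
    while blkno != 0:
        if blkno in seen:
            return depth, blkno  # the chain links back on itself
        seen.add(blkno)
        depth += 1
        blkno = next_group.get(blkno, 0)
    return depth, None


depth, loop_blkno = walk_chain(VISITED_ORDER[0], NEXT_GROUP)
print(f"depth={depth} loops_back_to={loop_blkno}")
```

With a guard like this the depth would only grow once per distinct descriptor, so on the data above the walk stops at depth 12 instead of running forever.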

On Wed, Jul 4, 2012 at 9:01 AM, Aleks Clark <aleks.clark at gmail.com> wrote:
> This is using the latest source tarball from oss.oracle.com.
>
>
> On Wed, Jul 4, 2012 at 9:00 AM, Aleks Clark <aleks.clark at gmail.com> wrote:
>> I found the infinite loop. chain gets down to 69 (lol) and does this forever:
>>
>> repair_group_desc:363 | checking desc at 2225664; blkno 2225664 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 10063872; blkno 10063872 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 88445952; blkno 88445952 size 4032 bits 32256 free_bits 394 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 80607744; blkno 80607744 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 72769536; blkno 72769536 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 64931328; blkno 64931328 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 57093120; blkno 57093120 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 49254912; blkno 49254912 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 41416704; blkno 41416704 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 33578496; blkno 33578496 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 25740288; blkno 25740288 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 17902080; blkno 17902080 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> repair_group_desc:363 | checking desc at 2225664; blkno 2225664 size 4032 bits 32256 free_bits 1535 chain 69 generation 1910514588
>> [the same 12 descriptors repeat in this order, over and over]
>>
>>
>> On Wed, Jul 4, 2012 at 4:49 AM, Aleks Clark <aleks.clark at gmail.com> wrote:
>>> looks like I got hit by this:
>>>
>>> https://oss.oracle.com/pipermail/ocfs2-users/2011-April/005106.html
>>>
>>> guess I'll cancel that fsck and upgrade after all :P
>>>
>>> On Wed, Jul 4, 2012 at 4:38 AM, Aleks Clark <aleks.clark at gmail.com> wrote:
>>>> I'll try that kernel upgrade while I've got the cluster down. Has
>>>> anyone given any thought to multi-threading fsck.ocfs2? From my top
>>>> stats, it's clearly CPU-bound (also going on 5 hours, still haven't
>>>> seen the end of the first pass).
>>>>
>>>> On Tue, Jul 3, 2012 at 11:50 PM, Guozhonghua <guozhonghua at h3c.com> wrote:
>>>>>   Hi,
>>>>>
>>>>>   I used ocfs2 with Linux kernel 2.6.39 and ran into some problems that may be the same as yours.
>>>>>
>>>>>   I downloaded Linux kernel 3.2.x, compared its source code with 2.6.39, and found that a lot of the code had changed.
>>>>>   So I updated the kernel, and the problems disappeared.
>>>>>
>>>>>   I recommend updating to a recent kernel, which may be much more stable.
>>>>>   With the recent kernel the ocfs2 module has been very stable for me; it has run for several weeks without a reboot or panic.
>>>>>
>>>>>   One more note: you should set the I/O scheduler to deadline, which is a better fit for ocfs2.
>>>>>
>>>>>   elevator=deadline
>>>>>
>>>>>   Please refer to ocfs2_faq.txt for details:
>>>>>
>>>>>   Q07   I encounter "Kernel panic - not syncing: ocfs2 is very sorry to be
>>>>>         fencing this system by panicing" whenever I run a heavy io load?
>>>>>   A07   We have encountered a bug with the default "cfq" io scheduler which
>>>>>         causes a process doing heavy io to temporarily starve out
>>>>>         other processes. While this is not fatal for most environments,
>>>>>         it is for OCFS2 as we expect the hb thread to be r/w to the hb
>>>>>         area at least once every 12 secs (default).
>>>>>         Bug with the fix has been filed with Red Hat and Novell. For
>>>>>         more, refer to the tracker bug filed on bugzilla:
>>>>>         http://oss.oracle.com/bugzilla/show_bug.cgi?id=671
>>>>>         Till this issue is resolved, one is advised to use the
>>>>>         "deadline" io scheduler. To use deadline, add "elevator=deadline"
>>>>>         to the kernel command line as follows:
>>>>>         1. For SLES9, edit the command line in /boot/grub/menu.lst.
>>>>>         title Linux 2.6.5-7.244-bigsmp  elevator=deadline
>>>>>                 kernel (hd0,4)/boot/vmlinuz-2.6.5-7.244-bigsmp root=/dev/sda5
>>>>>                         vga=0x314 selinux=0 splash=silent resume=/dev/sda3
>>>>>                         elevator=deadline showopts console=tty0
>>>>>                         console=ttyS0,115200 noexec=off
>>>>>                 initrd (hd0,4)/boot/initrd-2.6.5-7.244-bigsmp
>>>>>         2. For RHEL4, edit the command line in /boot/grub/grub.conf:
>>>>>         title Red Hat Enterprise Linux AS (2.6.9-22.EL)
>>>>>                 root (hd0,0)
>>>>>                 kernel /vmlinuz-2.6.9-22.EL ro root=LABEL=/ console=ttyS0,115200
>>>>>                         console=tty0 elevator=deadline noexec=off
>>>>>                 initrd /initrd-2.6.9-22.EL.img
>>>>>         To see the current kernel command line, do:
>>>>>         # cat /proc/cmdline
>>>>>   ==============================================================================
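
As a small illustration of the last step in the FAQ text above, here is a sketch that checks a kernel command line string for an `elevator=` parameter. The helper name `elevator_from_cmdline` is hypothetical, the sample string is modelled on the RHEL4 example, and keeping the last occurrence when the parameter appears more than once is an assumption of this sketch.

```python
def elevator_from_cmdline(cmdline):
    """Return the value of the last elevator= parameter, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("elevator="):
            value = token.split("=", 1)[1]
    return value


# Sample line modelled on the RHEL4 kernel entry quoted in the FAQ.
sample = ("ro root=LABEL=/ console=ttyS0,115200 console=tty0 "
          "elevator=deadline noexec=off")
print(elevator_from_cmdline(sample))
```

On a running system the same function could be given the contents of /proc/cmdline instead of the sample string.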
>>>>
>>>>
>>>>
>>>> --
>>>> Aleks Clark
>>>
>>>
>>>
>>> --
>>> Aleks Clark
>>
>>
>>
>> --
>> Aleks Clark
>
>
>
> --
> Aleks Clark



-- 
Aleks Clark


