[Ocfs2-users] remove locks? or copy the whole file?

Aleks Clark aleks.clark at gmail.com
Tue Jul 3 23:46:08 PDT 2012


Any idea how long this is going to take on a 2 TB filesystem with ~400 GB
used? We're going on 10 hours of downtime now, and fsck has been in Pass 0a
for the past 10 minutes. Also, should all the nodes be up (but unmounted)
while it runs?
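
For reference, this is the forced check suggested further down the thread.
A sketch of the invocation, with /dev/drbd0 and /mnt/ocfs2 standing in as
placeholder paths rather than the real device and mount point here:

    # on each node: shut down the VMs using the volume, then unmount it
    umount /mnt/ocfs2

    # from one node only: force a full, multi-pass check
    fsck.ocfs2 -f /dev/drbd0

Without '-f', fsck.ocfs2 trusts the volume's clean state and exits almost
immediately, which would explain why the earlier runs just reported "clean".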

On Tue, Jul 3, 2012 at 11:42 PM, Joel Becker <jlbec at evilplan.org> wrote:
> Because it's unsafe to do any I/O at that point.  We'd rather you have
> to reboot than scribble more bad data on your disk!
>
> Joel
>
> On Tue, Jul 03, 2012 at 11:35:32PM -0700, Aleks Clark wrote:
>> It said 'clean' and exited. I'm working on bringing the cluster down
>> now. Is there a reason why, after the kernel panic, ocfs2 blocks all
>> I/O? I can't even unmount the filesystem on any node; I have to
>> actually reboot it.
>>
>> On Tue, Jul 3, 2012 at 11:17 PM, Joel Becker <jlbec at evilplan.org> wrote:
>> > On Tue, Jul 03, 2012 at 06:57:53PM -0700, Aleks Clark wrote:
>> >> Well, by 'clean' I mean it said it was clean. The locks persisted,
>> >> though. I seriously can't believe there's no way to force lock
>> >> removal. Is it just a file somewhere I can delete?
>> >
>> > There's no lock hanging around past a full restart.  This looks like
>> > on-disk corruption.  Did fsck.ocfs2 say that it ran multiple passes, or
>> > did it just say "clean" and exit?  Please try fsck.ocfs2 with the '-f'
>> > flag (obviously with the filesystem not mounted on ANY node).
>> >
>> > Joel
>> >
>> >>
>> >>
>> >> On Tue, Jul 3, 2012 at 6:56 PM, Aleks Clark <aleks.clark at gmail.com> wrote:
>> >> > Yep, tried that; it returned clean.
>> >> >
>> >> > On Tue, Jul 3, 2012 at 6:25 PM, herbert van.den.bergh
>> >> > <herbert.van.den.bergh at oracle.com> wrote:
>> >> >>
>> >> >> One more thing: did you try running fsck.ocfs2 on it?
>> >> >>
>> >> >> Thanks,
>> >> >> Herbert.
>> >> >>
>> >> >>
>> >> >> On 7/3/2012 6:23 PM, herbert van.den.bergh wrote:
>> >> >>>
>> >> >>> Hmm, that doesn't mean much to me, but maybe it will to someone else on
>> >> >>> the list.  I bet their first suggestion will be to try a more recent
>> >> >>> kernel, though...
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Herbert.
>> >> >>>
>> >> >>> On 7/3/2012 6:19 PM, Aleks Clark wrote:
>> >> >>>>
>> >> >>>> Nick, I don't think so; it's a 2 TB partition with only 300 GB used.
>> >> >>>>
>> >> >>>> Herb,
>> >> >>>>
>> >> >>>>
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578659]
>> >> >>>> (25326,0):ocfs2_rotate_tree_right:2483 ERROR: bug expression:
>> >> >>>> path_leaf_bh(left_path) == path_leaf_bh(right_path)
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578714]
>> >> >>>> (25326,0):ocfs2_rotate_tree_right:2483 ERROR: Owner 18319883: error
>> >> >>>> during insert of 15761664 (left path cpos 20725762) results in two
>> >> >>>> identical paths ending at 395267
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578800] ------------[ cut here
>> >> >>>> ]------------
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578826] kernel BUG at
>> >> >>>>
>> >> >>>> /build/buildd-linux-2.6_2.6.32-38-amd64-bk66e4/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/ocfs2/alloc.c:2483!
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578881] invalid opcode: 0000 [#1]
>> >> >>>> SMP
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578909] last sysfs file:
>> >> >>>> /sys/devices/virtual/net/lo/operstate
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578937] CPU 0
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.578960] Modules linked in:
>> >> >>>> drbd tun ocfs2 jbd2 quota_tree raid0 ip6table_filter ip6_tables
>> >> >>>> iptable_filter ip_tables sha1_generic ebtable_nat ebtables hmac
>> >> >>>> x_tables lru_cache cn kvm_intel kvm ocfs2_dlmfs ocfs2_stack_o2cb
>> >> >>>> ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bridge stp loop
>> >> >>>> md_mod snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801
>> >> >>>> i2c_core pcspkr processor button psmouse joydev evdev serio_raw usbhid
>> >> >>>> hid ext3 jbd mbcache dm_mod sd_mod crc_t10dif ahci ehci_hcd libata
>> >> >>>> usbcore scsi_mod e1000e nls_base thermal thermal_sys [last unloaded:
>> >> >>>> drbd]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579279] Pid: 25326, comm: kvm
>> >> >>>> Not tainted 2.6.32-5-amd64 #1 X9SCL/X9SCM
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579309] RIP:
>> >> >>>> 0010:[<ffffffffa041177b>]  [<ffffffffa041177b>]
>> >> >>>> ocfs2_do_insert_extent+0x5dc/0x1aaf [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579363] RSP:
>> >> >>>> 0018:ffff880014839688  EFLAGS: 00010292
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579390] RAX: 00000000000000bf
>> >> >>>> RBX: 0000000000060803 RCX: 0000000000001806
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579435] RDX: 0000000000000000
>> >> >>>> RSI: 0000000000000096 RDI: 0000000000000246
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579479] RBP: ffff8800148398a8
>> >> >>>> R08: 00000000000209d0 R09: 000000000000000a
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579524] R10: 0000000000000000
>> >> >>>> R11: 0000000100000000 R12: 00000000013c4002
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579568] R13: ffff88002a1e4030
>> >> >>>> R14: 0000000000000001 R15: ffff88023c153c60
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579613] FS:
>> >> >>>> 00007f0cfef83700(0000) GS:ffff880008a00000(0000)
>> >> >>>> knlGS:0000000000000000
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579659] CS:  0010 DS: 002b ES:
>> >> >>>> 002b CR0: 000000008005003b
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579687] CR2: 00007f0d25dbf000
>> >> >>>> CR3: 000000023ccb6000 CR4: 00000000000426e0
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579732] DR0: 0000000000000000
>> >> >>>> DR1: 0000000000000000 DR2: 0000000000000000
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579776] DR3: 0000000000000000
>> >> >>>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579821] Process kvm (pid:
>> >> >>>> 25326, threadinfo ffff880014838000, task ffff88023b999c40)
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579867] Stack:
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579887]  0000000000f08100
>> >> >>>> 00000000013c4002 0000000000060803 ffff880014839718
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579923]<0>   ffff880232abde80
>> >> >>>> ffff88023b999c40 ffff88023b999c40 ffff8800148397a8
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.579977]<0>   ffff8800148397c8
>> >> >>>> ffff8800148398a8 ffff88023d8027f8 0000000000f08100
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580047] Call Trace:
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580074]  [<ffffffffa04186b9>]
>> >> >>>> ? ocfs2_insert_extent+0x5fb/0x6e6 [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580108]  [<ffffffffa0442e08>]
>> >> >>>> ? __ocfs2_journal_access+0x261/0x32a [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580156]  [<ffffffffa04194da>]
>> >> >>>> ? ocfs2_add_clusters_in_btree+0x35f/0x53c [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580205]  [<ffffffffa0436a34>]
>> >> >>>> ? ocfs2_add_inode_data+0x62/0x6e [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580239]  [<ffffffffa0442f53>]
>> >> >>>> ? ocfs2_journal_access_di+0x0/0xf [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580272]  [<ffffffffa041c1d5>]
>> >> >>>> ? ocfs2_write_begin_nolock+0x1376/0x1de2 [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580321]  [<ffffffffa0466e02>]
>> >> >>>> ? ocfs2_set_buffer_uptodate+0x15/0x60e [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580370]  [<ffffffffa043a9a5>]
>> >> >>>> ? ocfs2_validate_inode_block+0x0/0x1ab [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580418]  [<ffffffffa0442f53>]
>> >> >>>> ? ocfs2_journal_access_di+0x0/0xf [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580451]  [<ffffffffa041cd57>]
>> >> >>>> ? ocfs2_write_begin+0x116/0x1d2 [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580484]  [<ffffffff810b4fd0>]
>> >> >>>> ? generic_file_buffered_write+0x118/0x278
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580515]  [<ffffffff810b54e1>]
>> >> >>>> ? __generic_file_aio_write+0x25f/0x293
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580548]  [<ffffffffa0434fc8>]
>> >> >>>> ? ocfs2_prepare_inode_for_write+0x683/0x69c [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580597]  [<ffffffffa042c4e2>]
>> >> >>>> ? ocfs2_rw_lock+0x16d/0x239 [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580628]  [<ffffffffa0435b19>]
>> >> >>>> ? ocfs2_file_aio_write+0x45f/0x5da [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580674]  [<ffffffff8101654b>]
>> >> >>>> ? sched_clock+0x5/0x8
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580703]  [<ffffffff8104a4cc>]
>> >> >>>> ? default_wake_function+0x0/0x9
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580733]  [<ffffffff810eebf2>]
>> >> >>>> ? do_sync_write+0xce/0x113
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580762]  [<ffffffff81064f92>]
>> >> >>>> ? autoremove_wake_function+0x0/0x2e
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580792]  [<ffffffff8105cd26>]
>> >> >>>> ? kill_pid_info+0x31/0x3b
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580819]  [<ffffffff8105cefc>]
>> >> >>>> ? sys_kill+0x72/0x140
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580847]  [<ffffffff810ef544>]
>> >> >>>> ? vfs_write+0xa9/0x102
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580875]  [<ffffffff810ef5f4>]
>> >> >>>> ? sys_pwrite64+0x57/0x77
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580902]  [<ffffffff81010b42>]
>> >> >>>> ? system_call_fastpath+0x16/0x1b
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.580930] Code: 41 b8 b3 09 00
>> >> >>>> 00 48 63 d2 48 c7 c7 6f 48 48 a0 89 0c 24 31 c0 48 c7 c1 c0 df 47 a0
>> >> >>>> 48 89 5c 24 10 44 89 64 24 08 e8 5c 91 ee e0<0f>   0b eb fe 83 7c 24 5c
>> >> >>>> 00 75 1a 49 8b 54 17 08 8b 5c 24 58 0f
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.581120] RIP
>> >> >>>> [<ffffffffa041177b>] ocfs2_do_insert_extent+0x5dc/0x1aaf [ocfs2]
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.581167]  RSP<ffff880014839688>
>> >> >>>> Jul  3 14:47:26 castor kernel: [3488036.581581] ---[ end trace
>> >> >>>> fb597ecc3418e6d6 ]---
>> >> >>>>
>> >> >>>>
>> >> >>>> On Tue, Jul 3, 2012 at 5:39 PM, Herbert van den Bergh
>> >> >>>> <herbert.van.den.bergh at oracle.com>   wrote:
>> >> >>>>>
>> >> >>>>> On 07/03/2012 04:12 PM, Aleks Clark wrote:
>> >> >>>>>>
>> >> >>>>>> OK, so I've got an ocfs2 cluster that's been running for a long
>> >> >>>>>> while, hosting my VMs. All of a sudden I'm getting kernel panics
>> >> >>>>>> originating from ocfs2 when trying to spin up the VM backed by one
>> >> >>>>>> particular file. I've determined that there are several locks on
>> >> >>>>>> this file, one of them exclusive. I restarted the whole cluster to
>> >> >>>>>> try to get rid of them, but no go. I also tried to copy the file,
>> >> >>>>>> both on and off of the cluster, but only half of it copied. Any way
>> >> >>>>>> to get around either issue would be appreciated.
>> >> >>>>>
>> >> >>>>> The panic stack trace may be helpful, along with any messages the
>> >> >>>>> kernel spat out before it.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Herbert.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Aleks Clark
>> >>
>> >>
>> >>
>> >> --
>> >> Aleks Clark
>> >>
>> >
>> > --
>> >
>> >  Joel's First Law:
>> >
>> >         Nature abhors a GUI.
>> >
>> >                         http://www.jlbec.org/
>> >                         jlbec at evilplan.org
>>
>>
>>
>> --
>> Aleks Clark
>>
>
> --
>
> "Heav'n hath no rage like love to hatred turn'd, nor Hell a fury,
>  like a woman scorn'd."
>         - William Congreve
>
>                         http://www.jlbec.org/
>                         jlbec at evilplan.org
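
As for the original lock question: the lock state can at least be inspected
read-only with debugfs.ocfs2, along the lines of the sketch below. The device
path is again a placeholder, and this assumes the debugfs.ocfs2 shipped on
these nodes supports the fs_locks and findpath commands:

    # dump the lock resources this node knows about (read-only)
    debugfs.ocfs2 -R "fs_locks" /dev/drbd0

    # map the owner block number from the panic message back to a path
    debugfs.ocfs2 -R "findpath <18319883>" /dev/drbd0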



-- 
Aleks Clark


