[Ocfs2-users] Problems with volumes coming from RHEL5 going to OEL6 (slighly OT)

Srinivas Eeda srinivas.eeda at oracle.com
Wed Jul 10 16:10:28 PDT 2013


fsck.ocfs2 on your image seems to work fine for me:

fsck.ocfs2 -f /dev/loop1
fsck.ocfs2 1.8.0
Checking OCFS2 filesystem in /dev/loop1:
   Label:              /export/u04
   UUID:               942CA3E748D249B99D8897E7A151F655
   Number of blocks:   52428800
   Block size:         4096
   Number of clusters: 204800
   Cluster size:       1048576
   Number of slots:    10

/dev/loop1 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
Pass 2: Checking directory entries.
Pass 3: Checking directory connectivity.
Pass 4a: checking for orphaned inodes
Pass 4b: Checking inodes link counts.
All passes succeeded.
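
As a quick sanity check on the geometry that fsck.ocfs2 prints above, the block count and the cluster count should describe the same volume size, and that size should match the 200GB file system the image was taken from. A minimal Python sketch, using only the numbers from the output above (the variable names are illustrative, not anything from ocfs2-tools):

```python
# Values copied from the fsck.ocfs2 header above.
num_blocks = 52428800
block_size = 4096          # bytes
num_clusters = 204800
cluster_size = 1048576     # bytes

size_from_blocks = num_blocks * block_size
size_from_clusters = num_clusters * cluster_size

# Both views of the volume should agree on the total size.
assert size_from_blocks == size_from_clusters

size_gib = size_from_blocks / 2**30
print(f"{size_from_blocks} bytes = {size_gib:.0f} GiB")
```

If the two products disagreed, that would point at a damaged superblock or a truncated image rather than at fsck.ocfs2 itself.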



On 07/10/2013 12:47 PM, Ulf Zimmermann wrote:
>
> I used o2image on OEL 6.3 to create an image of a 200GB file system. 
> Link to the file below.
>
> https://openlane.box.com/s/8sb05jbw2cb9gn2wd65j
>
> Trying to run fsck.ocfs2 on that file also crashes:
>
> [root@co-db03 ocfs2.test]# fsck.ocfs2 dsvp_arc_bk_1_x.o2image.img
> fsck.ocfs2 1.8.0
> *** glibc detected *** fsck.ocfs2: double free or corruption (fasttop): 0x0000000001c11320 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3656475366]
> fsck.ocfs2[0x434c31]
> fsck.ocfs2[0x403bc2]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x365641ecdd]
> fsck.ocfs2[0x402879]
> ======= Memory map: ========
> 00400000-00450000 r-xp 00000000 fc:00 12489 /sbin/fsck.ocfs2
> 0064f000-00651000 rw-p 0004f000 fc:00 12489 /sbin/fsck.ocfs2
> 00651000-00652000 rw-p 00000000 00:00 0
> 00850000-00851000 rw-p 00050000 fc:00 12489 /sbin/fsck.ocfs2
> 01c10000-01c31000 rw-p 00000000 00:00 0 [heap]
> 3655c00000-3655c20000 r-xp 00000000 fc:00 8797 /lib64/ld-2.12.so
> 3655e1f000-3655e20000 r--p 0001f000 fc:00 8797 /lib64/ld-2.12.so
> 3655e20000-3655e21000 rw-p 00020000 fc:00 8797 /lib64/ld-2.12.so
> 3655e21000-3655e22000 rw-p 00000000 00:00 0
> 3656400000-3656589000 r-xp 00000000 fc:00 8798 /lib64/libc-2.12.so
> 3656589000-3656788000 ---p 00189000 fc:00 8798 /lib64/libc-2.12.so
> 3656788000-365678c000 r--p 00188000 fc:00 8798 /lib64/libc-2.12.so
> 365678c000-365678d000 rw-p 0018c000 fc:00 8798 /lib64/libc-2.12.so
> 365678d000-3656792000 rw-p 00000000 00:00 0
> 3659c00000-3659c16000 r-xp 00000000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
> 3659c16000-3659e15000 ---p 00016000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
> 3659e15000-3659e16000 rw-p 00015000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
> 3d3e800000-3d3e817000 r-xp 00000000 fc:00 12028 /lib64/libpthread-2.12.so
> 3d3e817000-3d3ea17000 ---p 00017000 fc:00 12028 /lib64/libpthread-2.12.so
> 3d3ea17000-3d3ea18000 r--p 00017000 fc:00 12028 /lib64/libpthread-2.12.so
> 3d3ea18000-3d3ea19000 rw-p 00018000 fc:00 12028 /lib64/libpthread-2.12.so
> 3d3ea19000-3d3ea1d000 rw-p 00000000 00:00 0
> 3e26600000-3e26603000 r-xp 00000000 fc:00 426 /lib64/libcom_err.so.2.1
> 3e26603000-3e26802000 ---p 00003000 fc:00 426 /lib64/libcom_err.so.2.1
> 3e26802000-3e26803000 r--p 00002000 fc:00 426 /lib64/libcom_err.so.2.1
> 3e26803000-3e26804000 rw-p 00003000 fc:00 426 /lib64/libcom_err.so.2.1
> 7f32ce6bb000-7f32ce6be000 rw-p 00000000 00:00 0
> 7f32ce6c7000-7f32ce6ca000 rw-p 00000000 00:00 0
> 7fffd3ba1000-7fffd3bc2000 rw-p 00000000 00:00 0 [stack]
> 7fffd3bff000-7fffd3c00000 r-xp 00000000 00:00 0 [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
> Abort (core dumped)
>
> *From:*Srinivas Eeda [mailto:srinivas.eeda at oracle.com]
> *Sent:* Wednesday, July 10, 2013 10:56
> *To:* Ulf Zimmermann
> *Cc:* Herbert van den Bergh; Mihail Daskalov; ocfs2-users at oss.oracle.com
> *Subject:* Re: [Ocfs2-users] Problems with volumes coming from RHEL5 
> going to OEL6 (slighly OT)
>
> On 07/10/2013 10:24 AM, Ulf Zimmermann wrote:
>
>     I will see what I can do. How large would an o2image be?
>
> o2image only captures ocfs2 metadata, so the image should be small.
> o2image -r <dev> - | gzip > <dev>.img.gz
>
> Just to reiterate, these are not new file systems. They were created
> with ocfs2-2.6.9-55.ELsmp-1.2.9-1.el4 and ocfs2-tools-1.2.7-1.el4
> under RHEL 4. The primary user of these volumes is a 6-node cluster
> running RHEL 5.8 with ocfs2-2.6.18-308.11.1.el5-1.4.10-1 and
> ocfs2-tools-1.6.3-2.el5. Another machine, which still runs the same
> EL4 binaries, mounts these snap-cloned volumes daily, performs
> operations on the DB files, and then copies the data off.
>
> *From:*Herbert van den Bergh [mailto:herbert.van.den.bergh at oracle.com]
> *Sent:* Wednesday, July 10, 2013 09:54
> *To:* Mihail Daskalov
> *Cc:* Sunil Mushran; Ulf Zimmermann; ocfs2-users at oss.oracle.com
> *Subject:* Re: [Ocfs2-users] Problems with volumes coming from RHEL5 
> going to OEL6 (slighly OT)
>
> It's possible that the 1.8.0 tag was never created in the ocfs2-tools
> git repository.  But it would not be of any use anyway.  If you check the
> changelog of the ocfs2-tools rpm, you'll see that there were many
> patches since 1.8.0, so the 1.8.0-10 version that Ulf is using would
> be very different from a 1.8.0 tag in git.
>
> Ulf, I suggest you create an o2image of the "bad" filesystem, and see 
> if the problem can be reproduced with that image.  If it can, then you 
> may want to make that o2image available to the OCFS2 developers so 
> they can debug ocfs2-tools to see what is causing the malloc/free 
> error.  You may also want to include the exact steps to take to 
> reproduce this, starting from the mkfs up to the failure, indicating 
> exactly what versions of kernel and tools were used along the way.
>
> Thanks,
> Herbert.
>
>
> On 7/10/13 7:55 AM, Mihail Daskalov wrote:
>
>     Hi Sunil,
>
>     Regarding the ocfs tools version 1.8.0 you should know best what
>     it was meant to be (maybe not true for 1.8.0-10 in OEL6U3).
>
>     Is it possible that the tag for 1.8.0 disappeared from the git
>     repository? Or was there never a tag for 1.8.0?
>
>     Below is the link to the commit in the 1.8.2 tag that brings the
>     version to 1.8.0:
>
>     https://oss.oracle.com/git/?p=ocfs2-tools.git;a=commitdiff;h=2480a215a600050d2bf923044dffac91439d982a;hp=8b5f4ad727e019cb557c4b516ab401c15c5c317e
>
>     and later on another commit that brings the version to 1.8.2:
>
>     https://oss.oracle.com/git/?p=ocfs2-tools.git;a=commitdiff;h=560a1e60936fe868b00cfc9cad5def726e10828e
>
>     I am sorry that I am not actually helping with Ulf's problem.
>
>     Ulf, maybe you can follow the head version and try to find
>     an explanation of the error message.
>
>     Anyway, I think it would be best to open an SR with Oracle if you
>     have a Linux support contract.
>
>     Does anyone know how to find the git repository for at least
>     some packages in Oracle Linux? I know the source for each package
>     is available as .src.rpm, but how can I see the changes, or the
>     tag from which each version was built?
>
>     I remember Wim talking about something like that a while ago (saying
>     Oracle is not like Red Hat, mangling changelogs), but I can't find
>     the article right now.
>
>     If you find out what is behind ocfs2-tools 1.8.0-10 it would be
>     easier to track the problem.
>
>     Regards,
>
>     Mihail Daskalov
>
>     *From:*ocfs2-users-bounces at oss.oracle.com
>     [mailto:ocfs2-users-bounces at oss.oracle.com] *On Behalf Of *Sunil Mushran
>     *Sent:* Wednesday, July 10, 2013 2:11 AM
>     *To:* Ulf Zimmermann
>     *Cc:* ocfs2-users at oss.oracle.com
>     *Subject:* Re: [Ocfs2-users] Problems with volumes coming from
>     RHEL5 going to OEL6
>
>     The error does not make sense. Also I don't know what 1.8.0 tools
>     means. I cannot see that label in the src tree.
>     https://oss.oracle.com/git/?p=ocfs2-tools.git;a=summary
>
>     One option is to build the tools from the head.
>
>     On Tue, Jul 9, 2013 at 2:25 PM, Ulf Zimmermann <ulf at openlane.com> wrote:
>
>     Sunil, any suggestions on this?
>
>     *From:*ocfs2-users-bounces at oss.oracle.com
>     [mailto:ocfs2-users-bounces at oss.oracle.com] *On Behalf Of *Ulf Zimmermann
>     *Sent:* Saturday, June 22, 2013 15:20
>     *To:* Sunil Mushran
>     *Cc:* ocfs2-users at oss.oracle.com
>     *Subject:* Re: [Ocfs2-users] Problems with volumes coming from
>     RHEL5 going to OEL6
>
>     [root@co-db03 ulf]# debugfs.ocfs2 -R "stats" /dev/mapper/aucp_data_bk_2_x
>             Revision: 0.90
>             Mount Count: 0   Max Mount Count: 20
>             State: 0   Errors: 0
>             Check Interval: 0   Last Check: Sun Sep 25 05:32:29 2011
>             Creator OS: 0
>             Feature Compat: 0
>             Feature Incompat: 0
>             Tunefs Incomplete: 0
>             Feature RO compat: 0
>             Root Blknum: 513   System Dir Blknum: 514
>             First Cluster Group Blknum: 256
>             Block Size Bits: 12   Cluster Size Bits: 20
>             Max Node Slots: 10
>             Extended Attributes Inline Size: 0
>             Label: /export/backuprecovery.AUCP
>             UUID: 5F9C2727159743529200CE9C5E155562
>             Hash: 0 (0x0)
>             DX Seeds: 0 0 0 (0x00000000 0x00000000 0x00000000)
>             Cluster stack: classic o2cb
>             Cluster flags: 0
>             Inode: 2   Mode: 00 Generation: 3147295185 (0xbb97e9d1)
>             FS Generation: 3147295185 (0xbb97e9d1)
>             CRC32: 00000000   ECC: 0000
>             Type: Unknown   Attr: 0x0   Flags: Valid System Superblock
>             Dynamic Features: (0x0)
>             User: 0 (root)   Group: 0 (root)   Size: 0
>             Links: 0   Clusters: 1572864
>             ctime: 0x4e7f1f5d 0x0 -- Sun Sep 25 05:32:29.0 2011
>             atime: 0x0 0x0 -- Wed Dec 31 16:00:00.0 1969
>             mtime: 0x4e7f1f5d 0x0 -- Sun Sep 25 05:32:29.0 2011
>             dtime: 0x0 -- Wed Dec 31 16:00:00 1969
>             Refcount Block: 0
>             Last Extblk: 0   Orphan Slot: 0
>             Sub Alloc Slot: Global   Sub Alloc Bit: 65535
>
>     *From:*Sunil Mushran [mailto:sunil.mushran at gmail.com]
>     *Sent:* Friday, June 21, 2013 11:11
>     *To:* Ulf Zimmermann
>     *Cc:* ocfs2-users at oss.oracle.com
>     *Subject:* Re: [Ocfs2-users] Problems with volumes coming from
>     RHEL5 going to OEL6
>
>     Can you dump the following using the 1.8 binary?
>     debugfs.ocfs2 -R "stats" /dev/mapper/.....
>
>     On Fri, Jun 21, 2013 at 6:17 AM, Ulf Zimmermann <ulf at openlane.com> wrote:
>
>     We have a production cluster of 6 nodes, currently running
>     RHEL 5.8 with OCFS2 1.4.10. We snapclone these volumes to
>     multiple destinations; one of them is a RHEL4 machine with OCFS2
>     1.2.9. Because of that, the volumes are set so that we can read
>     them there.
>
>     We are now trying to bring up a new server. It has OEL 6.3
>     on it, which comes with OCFS2 1.8.0 and tools 1.8.0-10. I can use
>     tunefs.ocfs2 --cloned-volume to reset the UUID, but when I try to
>     change the label I get:
>
>     [root@co-db03 ulf]# tunefs.ocfs2 -L /export/backuprecovery.AUCP /dev/mapper/aucp_data_bk_2_x
>
>     tunefs.ocfs2: Invalid name for a cluster while opening device "/dev/mapper/aucp_data_bk_2_x"
>
>     fsck.ocfs2 core dumps with the following; I also filed a bug on
>     Bugzilla for that:
>
>     [root@co-db03 ulf]# fsck.ocfs2 /dev/mapper/aucp_data_bk_2_x
>     fsck.ocfs2 1.8.0
>     *** glibc detected *** fsck.ocfs2: double free or corruption (fasttop): 0x000000000197f320 ***
>     ======= Backtrace: =========
>     /lib64/libc.so.6[0x3656475366]
>     fsck.ocfs2[0x434c31]
>     fsck.ocfs2[0x403bc2]
>     /lib64/libc.so.6(__libc_start_main+0xfd)[0x365641ecdd]
>     fsck.ocfs2[0x402879]
>     ======= Memory map: ========
>     00400000-00450000 r-xp 00000000 fc:00 12489 /sbin/fsck.ocfs2
>     0064f000-00651000 rw-p 0004f000 fc:00 12489 /sbin/fsck.ocfs2
>     00651000-00652000 rw-p 00000000 00:00 0
>     00850000-00851000 rw-p 00050000 fc:00 12489 /sbin/fsck.ocfs2
>     0197e000-0199f000 rw-p 00000000 00:00 0 [heap]
>     3655c00000-3655c20000 r-xp 00000000 fc:00 8797 /lib64/ld-2.12.so
>     3655e1f000-3655e20000 r--p 0001f000 fc:00 8797 /lib64/ld-2.12.so
>     3655e20000-3655e21000 rw-p 00020000 fc:00 8797 /lib64/ld-2.12.so
>     3655e21000-3655e22000 rw-p 00000000 00:00 0
>     3656400000-3656589000 r-xp 00000000 fc:00 8798 /lib64/libc-2.12.so
>     3656589000-3656788000 ---p 00189000 fc:00 8798 /lib64/libc-2.12.so
>     3656788000-365678c000 r--p 00188000 fc:00 8798 /lib64/libc-2.12.so
>     365678c000-365678d000 rw-p 0018c000 fc:00 8798 /lib64/libc-2.12.so
>     365678d000-3656792000 rw-p 00000000 00:00 0
>     3659c00000-3659c16000 r-xp 00000000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
>     3659c16000-3659e15000 ---p 00016000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
>     3659e15000-3659e16000 rw-p 00015000 fc:00 8802 /lib64/libgcc_s-4.4.6-20120305.so.1
>     3d3e800000-3d3e817000 r-xp 00000000 fc:00 12028 /lib64/libpthread-2.12.so
>     3d3e817000-3d3ea17000 ---p 00017000 fc:00 12028 /lib64/libpthread-2.12.so
>     3d3ea17000-3d3ea18000 r--p 00017000 fc:00 12028 /lib64/libpthread-2.12.so
>     3d3ea18000-3d3ea19000 rw-p 00018000 fc:00 12028 /lib64/libpthread-2.12.so
>     3d3ea19000-3d3ea1d000 rw-p 00000000 00:00 0
>     3e26600000-3e26603000 r-xp 00000000 fc:00 426 /lib64/libcom_err.so.2.1
>     3e26603000-3e26802000 ---p 00003000 fc:00 426 /lib64/libcom_err.so.2.1
>     3e26802000-3e26803000 r--p 00002000 fc:00 426 /lib64/libcom_err.so.2.1
>     3e26803000-3e26804000 rw-p 00003000 fc:00 426 /lib64/libcom_err.so.2.1
>     7fb063711000-7fb063714000 rw-p 00000000 00:00 0
>     7fb06371d000-7fb063720000 rw-p 00000000 00:00 0
>     7fffd5b95000-7fffd5bb6000 rw-p 00000000 00:00 0 [stack]
>     7fffd5bc5000-7fffd5bc6000 r-xp 00000000 00:00 0 [vdso]
>     ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
>     Abort (core dumped)
>
>     I think one of the main questions is what "Invalid name for
>     a cluster while trying to join the group" or "Invalid name for a
>     cluster while opening device" means. I am pretty sure that
>     /etc/sysconfig/o2cb and /etc/ocfs2/cluster.conf are correct.
>
>     Ulf.
>
>
>     _______________________________________________
>     Ocfs2-users mailing list
>     Ocfs2-users at oss.oracle.com <mailto:Ocfs2-users at oss.oracle.com>
>     https://oss.oracle.com/mailman/listinfo/ocfs2-users