[Ocfs2-users] OCFS2 - Bad magic number

Eric Ren zren at suse.com
Tue May 24 19:21:48 PDT 2016


Hi,

On 05/24/2016 02:50 PM, Mailer Regs wrote:
> Hi Eric,
> Thanks for your reply.
> Actually, my boss asked me to bring the file system back yesterday, at the cost of our media data (and half my monthly pay), so that we could continue our service.
> Can you show me how to get debug information for this case (or point me to documentation about it)? I'm not very familiar with GDB or debugging techniques, but I will try to reproduce the situation and solve it, to prevent problems like this from happening in the future.
>

You're welcome.
1. Which Linux distribution are you using?
2. Which DLM stack do you use: o2cb or pacemaker?

Take openSUSE Leap 42.1 as an example:
1. Make sure you have the relevant repos enabled (for software, debuginfo and source packages):
$ zypper lr -u
# | Alias                       | Name                      | Enabled | GPG Check | Refresh | Type  | URI
3 | download.opensuse.org-oss   | Main Repository (DEBUG)   | Yes     | (r ) Yes  | Yes     | yast2 | http://download.opensuse.org/debug/distribution/leap/42.1/repo/oss/
4 | download.opensuse.org-oss_1 | Main Repository (OSS)     | Yes     | (r ) Yes  | Yes     | yast2 | http://download.opensuse.org/distribution/leap/42.1/repo/oss/
5 | download.opensuse.org-oss_2 | Main Repository (Sources) | Yes     | (r ) Yes  | Yes     | yast2 | http://download.opensuse.org/source/distribution/leap/42.1/repo/oss/

2. $ zypper search ocfs2-tools
S | Name                       | Summary                                                      | Type
--+----------------------------+--------------------------------------------------------------+-----------
  | ocfs2-tools                | Oracle Cluster File System 2 Core Tools                      | package
  | ocfs2-tools                | Oracle Cluster File System 2 Core Tools                      | srcpackage
  | ocfs2-tools-debuginfo      | Debug information for package ocfs2-tools                    | package
  | ocfs2-tools-debugsource    | Debug sources for package ocfs2-tools                        | package
  | ocfs2-tools-devel          | Oracle Cluster File System 2 Development files               | package
  | ocfs2-tools-devel-static   | Oracle Cluster File System 2 static libraries                | package
  | ocfs2-tools-o2cb           | Oracle Cluster File System 2 tools for the native o2cb stack | package
  | ocfs2-tools-o2cb-debuginfo | Debug information for package ocfs2-tools-o2cb               | package

3. $ sudo zypper install ocfs2-tools ocfs2-tools-debuginfo
$ sudo zypper source-install ocfs2-tools

4. $ gdb --args mount -t ocfs2 /dev/mapper/mpath3p1 /test
Now you're in gdb. These commands should be enough to get you started:
start, break, run, continue, next, step, list. For reference, see:
https://sourceware.org/gdb/current/onlinedocs/gdb/


5. grep the ocfs2-tools source for the error message:
$:~/ocfs2-tools> grep -rn "while trying to determine heartbeat information"
mount.ocfs2/mount.ocfs2.c:385:				"while trying to determine heartbeat information");

$ vim mount.ocfs2/mount.ocfs2.c +385
We can see that something went wrong in ocfs2_fill_heartbeat_desc(), so
set a breakpoint on it and step through it (`step`/`next`, or `nexti` if
you want to go instruction by instruction).
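
If typing gdb commands by hand is awkward, steps 4 and 5 could be run
non-interactively, roughly like this. It is only a sketch: the helper name
write_gdb_cmds and the file /tmp/hb.gdb are made up for illustration. It
assumes the debuginfo packages from step 3 are installed, and that mount
hands the work off to a separate mount.ocfs2 helper process, which is why
follow-fork-mode is set to child. If mount does not fork on your system,
that setting should simply have no effect.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): print a gdb command file
# that stops in ocfs2_fill_heartbeat_desc() and shows how we got there.
write_gdb_cmds() {
    cat <<'EOF'
set pagination off
set breakpoint pending on
set follow-fork-mode child
break ocfs2_fill_heartbeat_desc
run
backtrace
finish
EOF
}

# Usage (as root, on the affected node; paths taken from the original report):
#   write_gdb_cmds > /tmp/hb.gdb
#   gdb --batch -x /tmp/hb.gdb --args mount -t ocfs2 /dev/mapper/mpath3p1 /test
```

The output of `backtrace` and `finish` (which prints the value the function
returns) is the kind of information that would be useful to send back to
the list.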

6. Repeat the same steps for fsck.ocfs2.
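
For the fsck.ocfs2 case, the grep from step 5 can be wrapped in a small
function so the same lookup works for any message. This is a rough sketch:
find_error_site is a made-up name, and it assumes the text printed after
"Bad magic number in inode" (here "while initializing the DLM") appears
verbatim on one line in the source tree.

```shell
#!/bin/sh
# find_error_site MESSAGE [SRCDIR]
# Hypothetical helper: print "file:line" for every C source or header line
# under SRCDIR (default: current directory) that contains MESSAGE.
# MESSAGE is matched as a fixed string, not as a regular expression.
find_error_site() {
    msg=$1
    dir=${2:-.}
    grep -rnF --include='*.c' --include='*.h' -- "$msg" "$dir" | cut -d: -f1,2
}

# Usage, from the unpacked ocfs2-tools source directory:
#   find_error_site "while initializing the DLM"
```

Each hit names a call site to open in an editor and to break on in gdb,
the same way as for mount.ocfs2 above.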

BTW, there's an ocfs2 IRC channel you can find here:
https://oss.oracle.com/pipermail/ocfs2-devel/2016-April/011934.html

Eric


> Sent from my BlackBerry 10 smartphone.
>    Original Message
> From: Eric Ren
> Sent: 13:38 Tuesday, May 24, 2016
> To: Mailer Regs; ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] OCFS2 - Bad magic number
>
> Hello,
>
> I haven't encountered this so far. You can install the relevant ocfs2-tools
> debug packages and gdb to find out what's happening, and report your
> findings back to us ;-)
>
> To me, it looks like a DLM issue, not a superblock problem.
>
> Eric
>
> On 05/22/2016 05:44 AM, Mailer Regs wrote:
>> Hi LQ friends,
>>
>> I have a problem with our OCFS2 cluster, which I couldn't solve by myself.
>> In short, I have an OCFS2 cluster with 3 nodes and a shared storage LUN. I
>> have mapped the LUN to all 3 of the nodes, split the LUN into 2
>> partitions, formatted them as OCFS2 filesystems and mounted them
>> successfully. The system has been running OK for nearly 2 years, but today
>> partition 1 suddenly became inaccessible and I had to reboot 1 node. After
>> rebooting, partition 2 mounted OK, but partition 1 cannot be
>> mounted.
>> The error is below:
>>
>> # mount -t ocfs2 /dev/mapper/mpath3p1 /test
>> mount.ocfs2: Bad magic number in inode while trying to determine
>> heartbeat information
>>
>> # fsck.ocfs2 /dev/mapper/mpath3p1
>> fsck.ocfs2 1.6.3
>> fsck.ocfs2: Bad magic number in inode while initializing the DLM
>>
>> # fsck.ocfs2 -r 2 /dev/mapper/mpath3p1
>> fsck.ocfs2 1.6.3
>> [RECOVER_BACKUP_SUPERBLOCK] Recover superblock information from backup
>> block#1048576? <n> y
>> fsck.ocfs2: Bad magic number in inode while initializing the DLM
>>
>> # parted /dev/mapper/mpath3
>> GNU Parted 1.8.1
>> Using /dev/mapper/mpath3
>> Welcome to GNU Parted! Type 'help' to view a list of commands.
>> (parted) print
>>
>> Model: Linux device-mapper (dm)
>> Disk /dev/mapper/mpath3: 20.0TB
>> Sector size (logical/physical): 512B/512B
>> Partition Table: gpt
>>
>> Number  Start   End     Size    File system  Name     Flags
>>  1      17.4kB  10.2TB  10.2TB               primary
>>  2      10.2TB  20.0TB  9749GB               primary
>>
>>
>>
>> Usually, the bad magic number error happens when the super block is
>> corrupted. I have experienced several similar cases before, which could be
>> solved quickly by using backup super blocks. But this case is different: I
>> cannot fix the problem by simply replacing the super block, so I'm out of ideas.
>>
>> Please take a look and suggest how to solve this problem. I need to
>> recover the data; that is the most important goal now.
>>
>> Thanks in advance.
>>
>>
>>



