[Ocfs2-devel] FW: [Bug 55] New: Ungracefull handling of mounting with a bad journal file

Tue Apr 13 17:48:54 CDT 2004

Started looking deeper into the 2.6 port wackiness around mounting, and
realized that part of the problem relates to both 2.4 and 2.6, so here
is a bug that isolates the general problem of ungracefully failing a
mount when a bad journal file is detected.

-----Original Message-----
From: bugzilla-daemon at oss.oracle.com
[mailto:bugzilla-daemon at oss.oracle.com] 
Sent: Tuesday, April 13, 2004 3:57 PM
To: Lynch, Rusty
Subject: [Bug 55] New: Ungracefull handling of mounting with a bad
journal file

http://oss.oracle.com/bugzilla/show_bug.cgi?id=55

           Summary: Ungracefull handling of mounting with a bad journal
file
           Product: OCFS v2 (WIP)
           Version: unspecified
          Platform: PC
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: stability
        AssignedTo: kurt.hackel at oracle.com
        ReportedBy: rusty.lynch at intel.com

The 2.6 kernel already has a bug documenting a crash that occurs
sometimes
when mounting a volume, see bug #54, "[kernel 2.6 porting] system halt
when
mount a ocfs volume...",
which basically comes down to ocfs not generating a proper journal file.

If the scope of bug #54 is limited to "why is the 2.6 port messing up
the
journal file?", then this bug is covering the ungracefull failure while
mounting a journal with a bad journal file.

To reproduce the bug on a 2.4 system ==>

Step 1: Make a fresh ocfs2 volume
Step 2: Mount the volume
Step 3: Unmount the volume
Step 4: Clear the journal magic number (Details below)

First, find the disk offset of the journal file using debugocfs

[root at ocfs1 root]# debugocfs -c 0 /dev/sda1
debugocfs 1.1.1-BETA2 Thu Mar 25 19:20:23 PST 2004 (build
fcb0206676afe0fcac47a99c90de0e7b)
cleanup_log_0:
        file_number = 224
        disk_offset = 1482752
        curr_master = 0
        file_lock = OCFS_DLM_ENABLE_CACHE_LOCK
        oin_node_map = 10000000000000000000000000000000
        seq_num = 0
        local_ext = true
        granularity = -1
        filename = CleanUpLogFile224
        filename_len = 0
        file_size = 8388608
        alloc_size = 8388608
        attribs =
        prot_bits =
        uid = 0
        gid = 0
        create_time = Wed Dec 31 16:00:00 1969
        modify_time = Wed Dec 31 16:00:00 1969
        dir_node_ptr = 0
        this_sector = 1482752
        last_ext_ptr = 0
        sync_flags = OCFS_SYNC_FLAG_VALID
        link_cnt = 0
        next_del = 0
        next_free_ext = 1
        extent[0].file_off = 0
        extent[0].num_bytes = 8388608
        extent[0].disk_off = 7790592
        extent[1].file_off = 0
        extent[1].num_bytes = 0
        extent[1].disk_off = 0
        extent[2].file_off = 0
        extent[2].num_bytes = 0
        extent[2].disk_off = 0

        file_size = 8388608
        alloc_size = 8388608
        log_id = 288230376151711744
        log_type = unknown log type (0)

>From above, we see that the file data starts at offset 7790592 
(extent[0].disk_off), which is 0x76e000 in hexadecimal.

Now use a hex editor to directly edit the partition.  I use hexedit by
typing:
hexedit /dev/sda1

...and then press control-g and then type 0x75e000 to have the editor
take me to the correct offset.  You should see "c0 3b 39 98" (the
JFS_MAGIC_NUMBER). 

Use the hexeditor to zero out the magic number.  With hexedit just press
0 eight times and then control-X to exit and save.

Step 5: Attempt to mount the volume again

The mount command will fail with the following in dmesg...

(2795) ENTRY: ocfs_check_volume() 
(2795) ENTRY: ocfs_journal_init() 
lockres: lockid=1482752.0, this=0, master=0, locktype=8, flags=40004001,
ronode=-1, romap=00000000
new_lock_function: set lockid=1482752.0, locktype=8->2, master=0->0
(2795) TRACE: ocfs_journal_init() fe->file_size = 0.8388608
(2795) TRACE: ocfs_journal_init() fe->alloc_size = 0.8388608
(2795) TRACE: ocfs_journal_init() fe->this_sector = 0.1482752
(2795) TRACE: ocfs_journal_init() inode->i_size = 8388608
(2795) TRACE: ocfs_journal_init() oin->alloc_size = 0.8388608
(2795) TRACE: ocfs_journal_init() Returned from journal_init_inode
(2795) TRACE: ocfs_journal_init() k_journal->j_maxlen = 16384
(2795) EXIT : ocfs_journal_init() = 0 
(2795) TRACE: ocfs_check_volume() Node config states a journal version
of 1
(2795) ENTRY: ocfs_journal_wipe() 
JBD: no valid journal superblock found
(2795) EXIT : ocfs_journal_wipe() = -22 
(2795) ERROR: status = -22, osb.c, 407
(2795) EXIT : ocfs_check_volume() = -22 
(2795) ERROR: status = -22, super.c, 1024
(2795) EXIT : ocfs_mount_volume() = -22 
(2795) EXIT : __ocfs_read_super() = -22 
------------[ cut here ]------------
kernel BUG at ll_rw_blk.c:1027!
invalid operand: 0000
sd_mod ocfs2 parport_pc lp parport autofs nfs lockd sunrpc e100
ipt_REJECT
ipt_state ip_conntrack iptable_filter ip_tables floppy sg sr_mod
microcode sbp2 ide
CPU:    0
EIP:    0060:[<c01bda27>]    Not tainted
EFLAGS: 00010246

EIP is at __make_request [kernel] 0xb7 (2.4.22-1.2115.nptl)
eax: 0000000c   ebx: 00000000   ecx: 00000000   edx: cc4d9880
esi: 00000001   edi: 000620b9   ebp: cfc16214   esp: cc59bdd8
ds: 0068   es: 0068   ss: 0068
Process ocfs2nm-0 (pid: 2797, stackpage=cc59b000)
Stack: 0000006d cd4ca880 cd4ca880 cd46b6b8 00000000 cd46b600 d09e2460
ccb79e80 
       cf6f0980 00000080 00000000 00000000 cfc16244 000007f9 00000000
00000001 
       0000002a cc59be54 cc4d9880 00000001 000620b9 0000002a c01be1ea
cfc16214 
Call Trace:   [<d09e2460>] ocfs_bh_sem_lookup [ocfs2] 0x134 (0xcc59bdf0)
[<c01be1ea>] generic_make_request [kernel] 0xda (0xcc59be30)
[<d09e27f3>] ocfs_bh_sem_unlock [ocfs2] 0x2f (0xcc59be40)
[<c01be297>] submit_bh [kernel] 0x57 (0xcc59be58)
[<c011b389>] context_switch [kernel] 0x79 (0xcc59be70)
[<d09ea9ec>] ocfs_read_bhs [ocfs2] 0x5b0 (0xcc59be80)
[<d09e28e4>] ocfs_bh_sem_hash_prune [ocfs2] 0xdc (0xcc59bec0)
[<d09f8a5c>] ocfs_volume_thread [ocfs2] 0x294 (0xcc59bee0)
[<c0119c51>] schedule [kernel] 0x31 (0xcc59bf88)
[<c0109c4d>] reschedule [kernel] 0x5 (0xcc59bfc0)
[<d09f87c8>] ocfs_volume_thread [ocfs2] 0x0 (0xcc59bfe0)
[<c010741d>] kernel_thread_helper [kernel] 0x5 (0xcc59bff0)

Code: 0f 0b 03 04 44 4f 29 c0 8b 4c 24 64 8b 15 f0 f8 3d c0 8b 41 

At this point (unlike a debug build of the 2.6 kernel) the system is
still 
running until you try to either:
* mount the volume again
* unload the ocfs2 module
* shutdown

... then the system hangs.

------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.