[Ocfs2-devel]
FW: [Bug 55] New: Ungracefull handling of mounting with a bad
journal file
Lynch, Rusty
rusty.lynch at intel.com
Tue Apr 13 17:48:54 CDT 2004
Started looking deeper into the 2.6 port wackiness around mounting, and
realized that part of the problem relates to both 2.4 and 2.6, so here
is a bug that isolates the general problem of ungracefully failing a
mount when a bad journal file is detected.
-----Original Message-----
From: bugzilla-daemon at oss.oracle.com
[mailto:bugzilla-daemon at oss.oracle.com]
Sent: Tuesday, April 13, 2004 3:57 PM
To: Lynch, Rusty
Subject: [Bug 55] New: Ungracefull handling of mounting with a bad
journal file
http://oss.oracle.com/bugzilla/show_bug.cgi?id=55
Summary: Ungracefull handling of mounting with a bad journal
file
Product: OCFS v2 (WIP)
Version: unspecified
Platform: PC
OS/Version: other
Status: NEW
Severity: normal
Priority: P2
Component: stability
AssignedTo: kurt.hackel at oracle.com
ReportedBy: rusty.lynch at intel.com
The 2.6 kernel already has a bug documenting a crash that occurs
sometimes
when mounting a volume, see bug #54, "[kernel 2.6 porting] system halt
when
mount a ocfs volume...",
which basically comes down to ocfs not generating a proper journal file.
If the scope of bug #54 is limited to "why is the 2.6 port messing up
the
journal file?", then this bug is covering the ungracefull failure while
mounting a journal with a bad journal file.
To reproduce the bug on a 2.4 system ==>
Step 1: Make a fresh ocfs2 volume
Step 2: Mount the volume
Step 3: Unmount the volume
Step 4: Clear the journal magic number (Details below)
First, find the disk offset of the journal file using debugocfs
[root at ocfs1 root]# debugocfs -c 0 /dev/sda1
debugocfs 1.1.1-BETA2 Thu Mar 25 19:20:23 PST 2004 (build
fcb0206676afe0fcac47a99c90de0e7b)
cleanup_log_0:
file_number = 224
disk_offset = 1482752
curr_master = 0
file_lock = OCFS_DLM_ENABLE_CACHE_LOCK
oin_node_map = 10000000000000000000000000000000
seq_num = 0
local_ext = true
granularity = -1
filename = CleanUpLogFile224
filename_len = 0
file_size = 8388608
alloc_size = 8388608
attribs =
prot_bits =
uid = 0
gid = 0
create_time = Wed Dec 31 16:00:00 1969
modify_time = Wed Dec 31 16:00:00 1969
dir_node_ptr = 0
this_sector = 1482752
last_ext_ptr = 0
sync_flags = OCFS_SYNC_FLAG_VALID
link_cnt = 0
next_del = 0
next_free_ext = 1
extent[0].file_off = 0
extent[0].num_bytes = 8388608
extent[0].disk_off = 7790592
extent[1].file_off = 0
extent[1].num_bytes = 0
extent[1].disk_off = 0
extent[2].file_off = 0
extent[2].num_bytes = 0
extent[2].disk_off = 0
file_size = 8388608
alloc_size = 8388608
log_id = 288230376151711744
log_type = unknown log type (0)
>From above, we see that the file data starts at offset 7790592
(extent[0].disk_off), which is 0x76e000 in hexadecimal.
Now use a hex editor to directly edit the partition. I use hexedit by
typing:
hexedit /dev/sda1
...and then press control-g and then type 0x75e000 to have the editor
take me to the correct offset. You should see "c0 3b 39 98" (the
JFS_MAGIC_NUMBER).
Use the hexeditor to zero out the magic number. With hexedit just press
0 eight times and then control-X to exit and save.
Step 5: Attempt to mount the volume again
The mount command will fail with the following in dmesg...
(2795) ENTRY: ocfs_check_volume()
(2795) ENTRY: ocfs_journal_init()
lockres: lockid=1482752.0, this=0, master=0, locktype=8, flags=40004001,
ronode=-1, romap=00000000
new_lock_function: set lockid=1482752.0, locktype=8->2, master=0->0
(2795) TRACE: ocfs_journal_init() fe->file_size = 0.8388608
(2795) TRACE: ocfs_journal_init() fe->alloc_size = 0.8388608
(2795) TRACE: ocfs_journal_init() fe->this_sector = 0.1482752
(2795) TRACE: ocfs_journal_init() inode->i_size = 8388608
(2795) TRACE: ocfs_journal_init() oin->alloc_size = 0.8388608
(2795) TRACE: ocfs_journal_init() Returned from journal_init_inode
(2795) TRACE: ocfs_journal_init() k_journal->j_maxlen = 16384
(2795) EXIT : ocfs_journal_init() = 0
(2795) TRACE: ocfs_check_volume() Node config states a journal version
of 1
(2795) ENTRY: ocfs_journal_wipe()
JBD: no valid journal superblock found
(2795) EXIT : ocfs_journal_wipe() = -22
(2795) ERROR: status = -22, osb.c, 407
(2795) EXIT : ocfs_check_volume() = -22
(2795) ERROR: status = -22, super.c, 1024
(2795) EXIT : ocfs_mount_volume() = -22
(2795) EXIT : __ocfs_read_super() = -22
------------[ cut here ]------------
kernel BUG at ll_rw_blk.c:1027!
invalid operand: 0000
sd_mod ocfs2 parport_pc lp parport autofs nfs lockd sunrpc e100
ipt_REJECT
ipt_state ip_conntrack iptable_filter ip_tables floppy sg sr_mod
microcode sbp2 ide
CPU: 0
EIP: 0060:[<c01bda27>] Not tainted
EFLAGS: 00010246
EIP is at __make_request [kernel] 0xb7 (2.4.22-1.2115.nptl)
eax: 0000000c ebx: 00000000 ecx: 00000000 edx: cc4d9880
esi: 00000001 edi: 000620b9 ebp: cfc16214 esp: cc59bdd8
ds: 0068 es: 0068 ss: 0068
Process ocfs2nm-0 (pid: 2797, stackpage=cc59b000)
Stack: 0000006d cd4ca880 cd4ca880 cd46b6b8 00000000 cd46b600 d09e2460
ccb79e80
cf6f0980 00000080 00000000 00000000 cfc16244 000007f9 00000000
00000001
0000002a cc59be54 cc4d9880 00000001 000620b9 0000002a c01be1ea
cfc16214
Call Trace: [<d09e2460>] ocfs_bh_sem_lookup [ocfs2] 0x134 (0xcc59bdf0)
[<c01be1ea>] generic_make_request [kernel] 0xda (0xcc59be30)
[<d09e27f3>] ocfs_bh_sem_unlock [ocfs2] 0x2f (0xcc59be40)
[<c01be297>] submit_bh [kernel] 0x57 (0xcc59be58)
[<c011b389>] context_switch [kernel] 0x79 (0xcc59be70)
[<d09ea9ec>] ocfs_read_bhs [ocfs2] 0x5b0 (0xcc59be80)
[<d09e28e4>] ocfs_bh_sem_hash_prune [ocfs2] 0xdc (0xcc59bec0)
[<d09f8a5c>] ocfs_volume_thread [ocfs2] 0x294 (0xcc59bee0)
[<c0119c51>] schedule [kernel] 0x31 (0xcc59bf88)
[<c0109c4d>] reschedule [kernel] 0x5 (0xcc59bfc0)
[<d09f87c8>] ocfs_volume_thread [ocfs2] 0x0 (0xcc59bfe0)
[<c010741d>] kernel_thread_helper [kernel] 0x5 (0xcc59bff0)
Code: 0f 0b 03 04 44 4f 29 c0 8b 4c 24 64 8b 15 f0 f8 3d c0 8b 41
At this point (unlike a debug build of the 2.6 kernel) the system is
still
running until you try to either:
* mount the volume again
* unload the ocfs2 module
* shutdown
... then the system hangs.
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
More information about the Ocfs2-devel
mailing list