Thanks Sunil. <br><br>I treat the ocfs2/LVM volumes as static partitions, so that shouldn't cause problems unless I'm attempting to resize or something like that, right?<br><br>---recovery---<br>In the past I tried to recover using "debugfs.ocfs2 rdump" but it would always fail with the error message: <br>
<br>debugfs.ocfs2: extend_file.c:211: ocfs2_new_path: Assertion `root_el->l_tree_depth < 5' failed.<br><br>After your suggestion I'm trying it again, breaking the rdump operation into parts so that it never needs to traverse more than 5 subdirectories deep, and it seems to be working very nicely. I've recovered about 60% so far and it looks like smooth sailing from here.<br>
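<br>In case it helps anyone who hits the same assertion later: the workaround was simply to rdump subtrees one at a time rather than the whole volume in a single pass. Roughly like this (the subdirectory names and the /mnt/recovery target are placeholders, not my actual layout):<br>
<br>
DEV=/dev/vg.chronovore/lv.medea.share._multimedia_store<br>
# each run copies one subtree straight off the ocfs2 volume into a local directory<br>
for d in audio video documents; do<br>
  sudo debugfs.ocfs2 -R "rdump /$d /mnt/recovery" "$DEV"<br>
done<br>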
<br>I was exporting this volume over iSCSI, and I suspect the filesystem problem came about as a result of fencing resets?<br><br>Now 90% of the way through the recovery, it turns out that each of the more volume-wide rdump attempts was choking at the same specific point: a symlink pointing to a directory on a different filesystem that then descended back into the ocfs2 volume. <br>
<br>Looking like this:<br>debugfs: stat lossless.from_folks<br> Inode: 258749 Mode: 0777 Generation: 667446219 (0x27c86bcb)<br> FS Generation: 781535612 (0x2e95497c)<br> Type: Symbolic Link Attr: 0x0 Flags: Valid<br>
User: 1000 (khaije) Group: 1000 (khaije) Size: 68<br> Links: 1 Clusters: 0<br> ctime: 0x49f5b25d -- Mon Apr 27 09:25:49 2009<br> atime: 0x49f5b25d -- Mon Apr 27 09:25:49 2009<br> mtime: 0x4994683e -- Thu Feb 12 13:19:42 2009<br>
dtime: 0x0 -- Wed Dec 31 19:00:00 1969<br> ctime_nsec: 0x0a0f5a36 -- 168778294<br> atime_nsec: 0x00000000 -- 0<br> mtime_nsec: 0x00000000 -- 0<br> Last Extblk: 0<br> Sub Alloc Slot: 0 Sub Alloc Bit: 699<br>
Fast Symlink Destination: /home/khaije/documents/shared_multimedia/audio/music/lossy.from_folks<br><br><br>I'm guessing this either added enough directories to the path to exceed rdump's depth threshold, or that rdump had problems handling the mix of local and non-local filesystems. I would simply delete it, but debugfs.ocfs2 doesn't seem to allow that. (I'm not sure why I didn't just use a relative-path symlink, since the target is in the same directory.)<br>
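<br>If anyone wants to spot entries like this before starting a long rdump, stat'ing the suspect name from debugfs shows the "Fast Symlink Destination" line that gave it away, e.g. (the path here is only a guess at where the link sits in my tree):<br>
<br>
sudo debugfs.ocfs2 -R "stat /audio/music/lossless.from_folks" /dev/vg.chronovore/lv.medea.share._multimedia_store<br>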
<br>Anyway, by avoiding that symlink I was able to make a full recovery.<br><br>Cheers and thanks again,<br><br><br><div class="gmail_quote">On Fri, May 29, 2009 at 2:18 PM, Sunil Mushran <span dir="ltr"><<a href="mailto:sunil.mushran@oracle.com">sunil.mushran@oracle.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">You are using ocfs2 atop lvm - a non-cluster-aware volume manager.<br>
A lot of things can go wrong in this combination. Quite a few have<br>
been reported on this forum.<br>
<br>
debugfs.ocfs2 has commands dump and rdump that allow users to<br>
read the files directly off the disk. Use them to recover your data.<br>
<br>
khaije rock wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">
I can simplify this question:<br>
<br>
What can I do to try to recover data from a problematic ocfs2 filesystem?<br>
<br>
For example, would I get any traction if I built the tools from upstream sources?<br>
<br>
Thanks all!<br>
<br>
---------- Forwarded message ----------<br></div><div class="im">
From: *khaije rock* <<a href="mailto:khaije1@gmail.com" target="_blank">khaije1@gmail.com</a> <mailto:<a href="mailto:khaije1@gmail.com" target="_blank">khaije1@gmail.com</a>>><br>
Date: Mon, May 25, 2009 at 8:06 AM<br>
Subject: fsck fails & volume mount fails, is my data lost?<br></div><div><div></div><div class="h5">
To: <a href="mailto:ocfs2-users@oss.oracle.com" target="_blank">ocfs2-users@oss.oracle.com</a> <mailto:<a href="mailto:ocfs2-users@oss.oracle.com" target="_blank">ocfs2-users@oss.oracle.com</a>><br>
<br>
<br>
Hi,<br>
<br>
I hope it's appropriate for me to post my issue to this list. Thanks in advance for any help!<br>
<br>
I don't know exactly what the underlying cause is, but here is what it looks like:<br>
- mount the filesystem<br>
- cd into the directory with no errors, however<br>
- the shell seizes up when I attempt to 'ls' or interact with the data in any way.<br>
<br>
I've found that when running fsck.ocfs2 against the block device (it's a logical volume under LVM), it completes successfully and reports the following:<br>
<br>
khaije@chronovore:~$ sudo fsck /dev/vg.chronovore/lv.medea.share._multimedia_store<br>
fsck 1.41.3 (12-Oct-2008)<br>
Checking OCFS2 filesystem in /dev/vg.chronovore/lv.medea.share._multimedia_store:<br>
label: lv.medea.share._multimedia_store<br>
uuid: 28 f3 65 1c 1d 04 4e 28 af f0 37 7f 30 13 fc 38<br>
number of blocks: 65536000<br>
bytes per block: 4096<br>
number of clusters: 65536000<br>
bytes per cluster: 4096<br>
max slots: 4<br>
<br>
o2fsck_should_replay_journals:564 | slot 0 JOURNAL_DIRTY_FL: 1<br>
o2fsck_should_replay_journals:564 | slot 1 JOURNAL_DIRTY_FL: 0<br>
o2fsck_should_replay_journals:564 | slot 2 JOURNAL_DIRTY_FL: 0<br>
o2fsck_should_replay_journals:564 | slot 3 JOURNAL_DIRTY_FL: 0<br>
<br>
/dev/vg.chronovore/lv.medea.share._multimedia_store is clean. It will be checked after 20 additional mounts.<br>
<br>
<br>
The command prints this output and returns control to the shell. As you can see, it indicates there is a 'journal dirty' flag set for slot 0, which is the host machine. You'll notice that immediately after stating that the journal is dirty, it says that the filesystem is clean.<br>
<br>
In order to try to make the filesystem usable I ran fsck.ocfs2 with the -fvv flags. This process never fully completes: after several minutes of happily chugging along it seizes up. One of the last blocks of output generated has this to say:<br>
<br>
o2fsck_verify_inode_fields:435 | checking inode 14119181's fields<br>
check_el:249 | depth 0 count 243 next_free 1<br>
check_er:164 | cpos 0 clusters 1 blkno 14677109<br>
verify_block:705 | adding dir block 14677109<br>
update_inode_alloc:157 | updated inode 14119181 alloc to 1 from 1 in slot 0<br>
o2fsck_verify_inode_fields:435 | checking inode 14119182's fields<br>
check_el:249 | depth 0 count 243 next_free 1<br>
check_er:164 | cpos 0 clusters 1 blkno 14677110<br>
o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate cluster 14677110<br>
verify_block:705 | adding dir block 14677110<br>
<br>
This 'Internal logic failure' seems significant, so I googled and found the following passage (<a href="http://oss.oracle.com/osswiki/OCFS2/DesignDocs/RemoveSlotsTunefs" target="_blank">http://oss.oracle.com/osswiki/OCFS2/DesignDocs/RemoveSlotsTunefs</a>), which seems to have some bearing on my case:<br>
<br>
-=-=-=-=-=-<br>
Duplicate groups or missing groups<br>
<br>
When we relink the groups in extent_alloc and inode_alloc, it contains 2 steps, deleting from the old inode and relinking to the new inode. So which should be carried first since we may panic between the two steps.<br>
<br>
Deleting from the old inode first<br>If deletion is carried first and tunefs panic: Since fsck.ocfs2 don't know the inode and extent blocks are allocated (it decide them by reading inode_alloc and extent_alloc), all the spaces will be freed. This is too bad.<br>
<br>
Relinking to the new inode first<br>If relink is carried first, and tunefs panic: Since now two alloc inode contains some duplicated chains, error "GROUP_PARENT" is prompted every time and many internal error "o2fsck_mark_cluster_allocated: Internal logic failure !! duplicate cluster".<br>
Although this is also boring, we at least have the chain information in our hand, so I'd like to revise fsck.ocfs2 to be fit for this scenario. There are also one thing that has to be mentioned: fsck.ocfs2 will loop forever in o2fsck_add_dir_block since it doesn't handle the condition of dbe->e_blkno == tmp_dbe->e_blkno, so we have to handle this also.<br>
=-=-=-=-=-<br>
<br>
Later on this page the author suggests that fsck.ocfs2 would need to be modified to handle this case (which I gather hasn't happened yet); however, there must be some other way to remedy this situation and recover the nearly 250 gigs of data I have on this share?<br>
<br>
Can anyone help?<br>
<br>
I've tried copying to a new partition by using debugfs.ocfs2 but I'm not sure if I'm doing it right or if there is a more sensible approach to try.<br>
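<br>What I've been attempting is roughly the following, with the new partition mounted at a scratch location (the /mnt/recovery mount point is just an example):<br>
<br>
sudo debugfs.ocfs2 -R "rdump / /mnt/recovery" /dev/vg.chronovore/lv.medea.share._multimedia_store<br>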
<br>
Thanks all,<br>
Nick<br>
<br>
<br>
<br></div></div>
------------------------------------------------------------------------<br>
<br>
_______________________________________________<br>
Ocfs2-users mailing list<br>
<a href="mailto:Ocfs2-users@oss.oracle.com" target="_blank">Ocfs2-users@oss.oracle.com</a><br>
<a href="http://oss.oracle.com/mailman/listinfo/ocfs2-users" target="_blank">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a><br>
</blockquote>
<br>
</blockquote></div><br>