[Ocfs2-users] Fwd: fsck fails & volume mount fails, is my data lost?

Sunil Mushran sunil.mushran at oracle.com
Fri May 29 11:18:35 PDT 2009


You are using ocfs2 atop lvm - a non-cluster-aware volume manager.
A lot of things can go wrong in this combination. Quite a few have
been reported on this forum.

debugfs.ocfs2 has commands dump and rdump that allow you to read
files directly off the disk. Use them to recover your data.
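
For example, something along these lines should work (the device path is
taken from the fsck output quoted below; /path/to/dir and /mnt/recovery
are placeholders for a directory on the volume and a local destination
with enough free space):

  # list the root directory of the volume, without mounting it
  debugfs.ocfs2 -R "ls /" /dev/vg.chronovore/lv.medea.share._multimedia_store

  # recursively copy a directory tree off the volume
  debugfs.ocfs2 -R "rdump /path/to/dir /mnt/recovery" \
      /dev/vg.chronovore/lv.medea.share._multimedia_store

  # copy a single file
  debugfs.ocfs2 -R "dump /path/to/file /mnt/recovery/file" \
      /dev/vg.chronovore/lv.medea.share._multimedia_store

debugfs.ocfs2 opens the device read-only by default, so copying the data
off this way will not make things worse.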

khaije rock wrote:
> I can simplify this question:
>
> What can I do to try to recover data from a problematic ocfs2 filesystem?
>
> For example, would I get any traction if I build tools from upstream 
> sources?
>
> Thanks all!
>
> ---------- Forwarded message ----------
> From: *khaije rock* <khaije1 at gmail.com <mailto:khaije1 at gmail.com>>
> Date: Mon, May 25, 2009 at 8:06 AM
> Subject: fsck fails & volume mount fails, is my data lost?
> To: ocfs2-users at oss.oracle.com <mailto:ocfs2-users at oss.oracle.com>
>
>
> Hi,
>
> I hope it's appropriate for me to post my issue to this list. Thanks in 
> advance for any help!
>
> I don't know exactly what the underlying cause is, but here is what it 
> looks like:
>  - mount the filesystem
>  - cd into the directory with no errors, however
>  - the shell hangs when I attempt to 'ls' or interact with the data 
> in any way.
>
> I've found that when I run fsck.ocfs2 against the block device (it's 
> a logical volume using lvm), it completes successfully and reports 
> the following:
>
> khaije at chronovore:~$ sudo fsck 
> /dev/vg.chronovore/lv.medea.share._multimedia_store
> fsck 1.41.3 (12-Oct-2008)
> Checking OCFS2 filesystem in 
> /dev/vg.chronovore/lv.medea.share._multimedia_store:
>   label:              lv.medea.share._multimedia_store
>   uuid:               28 f3 65 1c 1d 04 4e 28 af f0 37 7f 30 13 fc 38
>   number of blocks:   65536000
>   bytes per block:    4096
>   number of clusters: 65536000
>   bytes per cluster:  4096
>   max slots:          4
>
> o2fsck_should_replay_journals:564 | slot 0 JOURNAL_DIRTY_FL: 1
> o2fsck_should_replay_journals:564 | slot 1 JOURNAL_DIRTY_FL: 0
> o2fsck_should_replay_journals:564 | slot 2 JOURNAL_DIRTY_FL: 0
> o2fsck_should_replay_journals:564 | slot 3 JOURNAL_DIRTY_FL: 0
>
> /dev/vg.chronovore/lv.medea.share._multimedia_store is clean.  It will 
> be checked after 20 additional mounts.
>
>
> The command prints this output and returns control to the shell. As 
> you can see, it indicates that the journal-dirty flag is set for 
> slot 0, which is the slot used by this host. Notice that immediately 
> after reporting the dirty journal, it declares the filesystem clean.
>
> To try to make the filesystem usable I ran fsck.ocfs2 with the -fvv 
> flags. This process never fully completes: after several minutes of 
> happily chugging along, it hangs. One of the last blocks of output 
> has this to say:
>
> o2fsck_verify_inode_fields:435 | checking inode 14119181's fields
> check_el:249 | depth 0 count 243 next_free 1
> check_er:164 | cpos 0 clusters 1 blkno 14677109
> verify_block:705 | adding dir block 14677109
> update_inode_alloc:157 | updated inode 14119181 alloc to 1 from 1 in 
> slot 0
> o2fsck_verify_inode_fields:435 | checking inode 14119182's fields
> check_el:249 | depth 0 count 243 next_free 1
> check_er:164 | cpos 0 clusters 1 blkno 14677110
> o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate 
> cluster 14677110
> verify_block:705 | adding dir block 14677110
>
> This 'Internal logic failure' seems significant, so I googled and 
> found the following passage 
> (http://oss.oracle.com/osswiki/OCFS2/DesignDocs/RemoveSlotsTunefs), 
> which seems to have some bearing on my case:
>
> -=-=-=-=-=-
> Duplicate groups or missing groups
>
> When we relink the groups in extent_alloc and inode_alloc, there are 
> two steps: deleting from the old inode and relinking to the new inode. 
> So which should be carried out first, given that we may panic between 
> the two steps?
>
>       Deleting from the old inode first: if the deletion is carried 
> out first and tunefs panics, then since fsck.ocfs2 doesn't know the 
> inode and extent blocks are allocated (it decides this by reading 
> inode_alloc and extent_alloc), all the space will be freed. This is 
> too bad.
>
>       Relinking to the new inode first: if the relink is carried out 
> first and tunefs panics, then since the two alloc inodes now contain 
> some duplicated chains, the error "GROUP_PARENT" is prompted every 
> time, along with many internal errors "o2fsck_mark_cluster_allocated: 
> Internal logic failure !! duplicate cluster".
> Although this is also boring, we at least have the chain information 
> in hand, so I'd like to revise fsck.ocfs2 to fit this scenario. One 
> more thing has to be mentioned: fsck.ocfs2 will loop forever in 
> o2fsck_add_dir_block since it doesn't handle the condition 
> dbe->e_blkno == tmp_dbe->e_blkno, so we have to handle this as well.
> =-=-=-=-=-
>
> Later on that page the author suggests that fsck.ocfs2 would need to 
> be modified to handle this case (which I gather hasn't happened yet). 
> Surely, though, there must be some other way to remedy this situation 
> and recover the nearly 250 GB of data I have on this share?
>
> Can anyone help?
>
> I've tried copying to a new partition by using debugfs.ocfs2 but I'm 
> not sure if I'm doing it right or if there is a more sensible approach 
> to try.
>
> Thanks all,
> Nick
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users



