[Ocfs2-users] ocfs dmesg and fsck errors

Sunil Mushran sunil.mushran at oracle.com
Mon Jan 5 18:10:59 PST 2009


A directory is corrupted. To get the name of the dir, do:

$ debugfs.ocfs2 -R "findpath <12862125>" /dev/sdX

It may take time as it will have to traverse the dirs.
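
If more than one directory shows up in dmesg, pulling the inode numbers out by hand gets tedious. Here is a rough sketch, not an official tool, that collects the directory inodes from "bad entry in directory" messages and builds the findpath command for each. The function names and the /dev/sdX default are placeholders made up for illustration.

```python
import re

# Matches the directory inode in an ocfs2_check_dir_entry kernel message,
# e.g. "bad entry in directory #12862125: rec_len is smaller than minimal".
DIR_INODE_RE = re.compile(r"bad entry in directory\s+#(\d+)")


def corrupted_dir_inodes(dmesg_text):
    """Return the distinct directory inode numbers, in first-seen order."""
    # Pasted dmesg output is often line-wrapped, so collapse all
    # whitespace runs to single spaces before matching.
    flat = " ".join(dmesg_text.split())
    seen = []
    for match in DIR_INODE_RE.finditer(flat):
        inode = int(match.group(1))
        if inode not in seen:
            seen.append(inode)
    return seen


def findpath_command(inode, device="/dev/sdX"):
    """Build the debugfs.ocfs2 findpath invocation for one inode.

    The device default is a placeholder; pass the real device path.
    """
    return ["debugfs.ocfs2", "-R", f"findpath <{inode}>", device]
```

The returned list can be handed to subprocess.run() as-is. Since the same error repeats over and over in the log, the duplicates are dropped so that each directory is looked up only once.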

To fix, you will have to run fsck in rw mode:

$ fsck.ocfs2 -fy /dev/sdX
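
Before the rw run, it can help to see how many problems of each kind fsck is going to touch. Below is a minimal sketch, assuming the output of a read-only run (like the one quoted further down) has been saved as text. It only counts the bracketed codes such as [DIRENT_LENGTH] and does not interpret them. The function name is made up for illustration.

```python
import re
from collections import Counter

# fsck.ocfs2 prefixes each prompt with a bracketed code such as
# [DIRENT_LENGTH] or [INODE_NOT_CONNECTED].
CODE_RE = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def tally_problem_codes(fsck_output):
    """Count how many times each bracketed problem code appears."""
    return Counter(CODE_RE.findall(fsck_output))
```

Calling tally_problem_codes() on the saved text and printing most_common() on the result gives a quick per-code summary to compare against after the repair.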

It is expected that the nodes did not crash: dir corruption is always
localized and only shows up as an error.

800K files in one dir is not efficient since the current version does not
support indexed dirs. We hope to add support for the same in the near term.

Sunil


Bob Ziuchkovski wrote:
> Hi All,
>
> I'm trying to move my company from our current frenzy of rsyncing 
> towards ocfs2.  I've deployed ocfs2 on a few test servers and in general 
> things seem to be working.  However, I've run into a couple of problems 
> and wanted to run them by this mailing list.
>
> Earlier today I encountered errors that at first appeared to be 
> permission errors.  However, when I checked dmesg output, I saw the 
> following entries repeated over and over:
>
> (20830,0):ocfs2_mknod:351 ERROR: status = -2
> (20830,0):ocfs2_check_dir_entry:1727 ERROR: bad entry in directory 
> #12862125: rec_len is smaller than minimal - offset=2588672, 
> inode=1099511657728, rec_len=0, name_len=0
>
> I'm not exactly sure what this means.  Is there a way for me to 
> determine the path to the directory and/or file referenced above?
>
> Since this seemed like it might be fs corruption, I ran the fsck.ocfs2 
> utility, but in read-only mode.  I ended up with output that looks like 
> the following:
>
> Pass 0a: Checking cluster allocation chains
> Pass 0b: Checking inode allocation chains
> Pass 0c: Checking extent block allocation chains
> Pass 1: Checking inodes and blocks.
> o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate 
> cluster 4387633
> o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate 
> cluster 4387634
> o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate 
> cluster 4387635
> o2fsck_mark_cluster_allocated: Internal logic faliure !! duplicate 
> cluster 4387636
> <-----------------SNIP Similar-------------------->
> Pass 2: Checking directory entries.
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 632 
> physical block 35102672 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 633 
> physical block 35102673 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 634 
> physical block 35102674 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 635 
> physical block 35102675 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 636 
> physical block 35102676 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 637 
> physical block 35102677 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 638 
> physical block 35102678 offset 0. Attempt to repair this block's 
> directory entries? n
> [DIRENT_LENGTH] Directory inode 12862125 corrupted in logical block 639 
> physical block 35102679 offset 0. Attempt to repair this block's 
> directory entries? n
> Pass 3: Checking directory connectivity.
> Pass 4a: checking for orphaned inodes
> Pass 4b: Checking inodes link counts.
> [INODE_NOT_CONNECTED] Inode 0 isn't referenced by any directory entries. 
>   Move it to lost+found? n
> [INODE_COUNT] Inode 31231457 has a link count of 1 on disk but directory 
> entry references come to 0. Update the count on disk
>   to match? n
> [INODE_NOT_CONNECTED] Inode 31231457 isn't referenced by any directory 
> entries.  Move it to lost+found? n
> [INODE_COUNT] Inode 31231458 has a link count of 1 on disk but directory 
> entry references come to 0. Update the count on disk
>   to match? n
> [INODE_NOT_CONNECTED] Inode 31231458 isn't referenced by any directory 
> entries.  Move it to lost+found? n
> [INODE_COUNT] Inode 31231459 has a link count of 1 on disk but directory 
> entry references come to 0. Update the count on disk
>   to match? n
> [INODE_NOT_CONNECTED] Inode 31231459 isn't referenced by any directory 
> entries.  Move it to lost+found? n
> [INODE_COUNT] Inode 31231460 has a link count of 1 on disk but directory 
> entry references come to 0. Update the count on disk
>   to match? n
> [INODE_NOT_CONNECTED] Inode 31231460 isn't referenced by any directory 
> entries.  Move it to lost+found? n
> [INODE_COUNT] Inode 31231461 has a link count of 1 on disk but directory 
> entry references come to 0. Update the count on disk
>   to match? n
> <-----------------SNIP Similar----------------------->
>
> As far as I know, none of the nodes that are running ocfs2 have actually 
> crashed and I created the filesystems just last week.  One thing I 
> should mention, though, is that the filesystem in question has about 3.3 
> million small files, 800k of which are contained within a single flat 
> directory -- I know, it's terrible...I've inherited this mess from 
> previous admins.   Additionally, I rsync'ed the files to the ocfs2 
> volume from one of our existing servers.  I have never been able to fsck 
> the filesystem of the existing server without errors, but fixing the 
> errors generally leads to a bunch of the small files being unlinked and 
> moved to lost+found.  My impression is that rsync reads things at a 
> high-enough level that this shouldn't duplicate filesystem errors on a 
> target volume, but maybe I'm wrong.
>
> Anyway, any help that could be offered would be greatly appreciated. 
> I'm really trying to fix the filesystem mess I've inherited, but I get 
> the impression it will be an arduous task.  :)  In terms of package 
> information, we're running RHEL 4u7 on x86_64 with the following packages 
> installed: ocfs2-2.6.9-55.0.9.ELsmp-1.2.9-1.el4 and 
> ocfs2-tools-1.2.7-1.el4.  Thanks!
>
> Bob Ziuchkovski
>