[Ocfs2-users] merge request for patchset / bug #1324 / issue: ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx

Leen Besselink leen at consolejunkie.net
Thu Dec 20 06:08:26 PST 2012


On Thu, Dec 20, 2012 at 03:05:01PM +0100, Leen Besselink wrote:
> Hi,
> 
> Some time ago I had the following error:
> 
> Dec 10 14:02:50 xxxx kernel: [11099.666180] (31655,6):ocfs2_prepare_dir_for_insert:4415 ERROR: status = -5
> Dec 10 14:02:50 xxxx kernel: [11099.666208] (31655,6):ocfs2_rename:1266 ERROR: status = -5
> Dec 10 14:02:50 xxxx kernel: [11099.692901] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
> Dec 10 14:02:50 xxxx kernel: [11099.692952] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
> Dec 10 14:02:50 xxxx kernel: [11099.693045] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
> Dec 10 14:02:50 xxxx kernel: [11099.693093] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
> Dec 10 14:02:50 xxxx kernel: [11099.693186] (31655,6):ocfs2_read_virt_blocks:853 ERROR: Inode #xxxx contains a hole at offset xxxx
> Dec 10 14:02:50 xxxx kernel: [11099.693233] (31655,6):ocfs2_read_dir_block:533 ERROR: status = -5
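
For anyone triaging similar output, here is a minimal Python sketch that splits such lines into fields and names the status code. The regular expressions and the helper name parse_ocfs2_error are assumptions made for illustration, inferred only from the lines quoted above. The one firm fact used is that errno 5 is EIO, a generic I/O error.

```python
import errno
import re

# Layout inferred from the quoted log lines only (an assumption, not an
# official ocfs2 format): "(n,m):function:line ERROR: message".
LINE_RE = re.compile(r"\((\d+),(\d+)\):(\w+):(\d+) ERROR: (.*)$")
STATUS_RE = re.compile(r"status = (-?\d+)")


def parse_ocfs2_error(line):
    """Split one ERROR line into fields; return None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    _, _, function, source_line, message = match.groups()
    status = STATUS_RE.search(message)
    code = int(status.group(1)) if status else None
    return {
        "function": function,
        "source_line": int(source_line),
        "message": message,
        "status": code,
        # Kernel status values are negated errno numbers, so -5 maps to EIO.
        "errno_name": errno.errorcode.get(-code) if code is not None else None,
    }


sample = "(31655,6):ocfs2_rename:1266 ERROR: status = -5"
print(parse_ocfs2_error(sample))
```

Grouping the parsed lines by function would show that the -5 from ocfs2_rename sits at the end of a chain that starts with the "contains a hole" message from ocfs2_read_virt_blocks.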
> 
> It took me a while to figure out what was wrong and what to do, and I kept the system offline the whole time.
> 
> That was an annoying situation to be in, to say the least.
> 
> I misdiagnosed the problem at first because only a few days earlier we had hit the other well-known problem:
> 
> "No space left on device", caused by a wrongly chosen number of node slots. We reduced it from 8 to 4 on a 2-node cluster.
> 
> I think this was the right solution; we have not seen the issue since.
> 
> Obviously upgrading and enabling discontig-bg is the only long-term solution.
> 
> So I had assumed they were related. They were not. As I understand it, the holes are in the directory index and the cause is related
> to failover and the use of DRBD. I guess it must have been related to a STONITH we had tripped while working on the previous issue.
> 
> Because I didn't know what to do or how to solve it at first, I hoped a fsck would fix it.
> 
> But it didn't. It didn't even find the problem.
> 
> This is not only because fsck was too old, but also because the following patches were never merged:
> 
> https://oss.oracle.com/pipermail/ocfs2-tools-devel/2011-August/003931.html
> 
> Are these patches ever going to be merged?
> 
> If I read the mailing lists correctly, I guess it is already fixed in newer kernels? The kernel will just disable the directory index on the fly?
> 
> But if the patches were merged, people could upgrade or compile ocfs2-tools instead of the kernel.
> 
> So I merged the patch by hand and it did recognise the problem; I just didn't want to use a handcrafted fsck to fix a problem if I didn't have to.
> 
> Another problem which caused a lot of delay was that I had never used debugfs extensively before; I had only ever looked at 'stats'.
> 
> The problem I had with debugfs is that its help says:
> 
> "locate <block#> ...                     List all pathnames of the inode(s)/lockname(s)"
> 
> That wasn't very clear to me the first time I looked at it.
> 
> I thought it meant:
> 
> locate 12345
> 
> instead of the correct command:
> 
> locate <12345>
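
To make the difference concrete, here is a small Python sketch that builds a one-shot debugfs.ocfs2 call with the angle brackets kept as part of the request. The helper names locate_argv and locate_paths and the device path are hypothetical. The use of the -R option to run a single request non-interactively is an assumption beyond what is described here, so check it against your ocfs2-tools version.

```python
import subprocess


def locate_argv(block_number, device):
    """Build the argv for a one-shot debugfs.ocfs2 'locate' request."""
    # The angle brackets are literal: "locate <12345>", not "locate 12345".
    return ["debugfs.ocfs2", "-R", f"locate <{int(block_number)}>", device]


def locate_paths(block_number, device):
    """Run the request and return whatever debugfs.ocfs2 prints."""
    result = subprocess.run(
        locate_argv(block_number, device),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# "/dev/drbd0" is a placeholder device name used only for illustration.
print(locate_argv(12345, "/dev/drbd0"))
```

Passing the request as a single list element avoids shell quoting. When typing the same thing at a shell prompt, the request has to be quoted, because unquoted < and > are treated as redirections.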
> 
> Obviously, once I found the debugging FAQ, I knew what to do and could find out which directory it was. I moved everything to a newly created directory, renamed them both and removed the corrupted, not empty directory. I assume that would solve it, even though it was never mentioned explicitly on the mailing list as a solution.
> 

That should have read:

now empty directory

> So the question remains: are those patches ever going to be merged?
> 
> Or is my account of the problem now clear enough that people will be able to find this post in the mailing list archive and fix it themselves?
> 
> Have a nice day,
> 	Leen.
> 
> PS: Sorry for not mailing this to an ocfs2-tools mailing list; I only noticed later that I had subscribed to the wrong one. I assume the same developers read this list?


