[Ocfs2-users] fsck hangs in Pass 0a

Matthias Bernges matthiasbernges at gmx.de
Tue Aug 9 23:41:12 PDT 2011


Hello list,

I have a ~10TB ocfs2 filesystem in an 8-node cluster. It sits on a
logical volume (I know the LV is not cluster-aware, but I make sure no one
touches the LV while the cluster is running). The LV consists of 5x2TB
multipath devices.

I recently had errors like this on some nodes:

OCFS2: ERROR (device dm-7): ocfs2_check_group_descriptor: Group Descriptor # 0 has bad signature 
File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted.
(kvm,12322,1):ocfs2_search_chain:1363 ERROR: status = -5
(kvm,12322,1):ocfs2_claim_suballoc_bits:1524 ERROR: status = -5
(kvm,12322,1):__ocfs2_claim_clusters:1806 ERROR: status = -5
(kvm,12322,1):ocfs2_local_alloc_new_window:1013 ERROR: status = -5
(kvm,12322,1):ocfs2_local_alloc_slide_window:1116 ERROR: status = -5
(kvm,12322,1):ocfs2_reserve_local_alloc_bits:537 ERROR: status = -5
(kvm,12322,1):__ocfs2_reserve_clusters:816 ERROR: status = -5
(kvm,12322,1):ocfs2_lock_allocators:677 ERROR: status = -5
(kvm,12322,1):ocfs2_write_begin_nolock:1750 ERROR: status = -5
(kvm,12322,1):ocfs2_write_begin:1860 ERROR: status = -5
(kvm,12322,1):ocfs2_file_buffered_write:2039 ERROR: status = -5
OCFS2: ERROR (device dm-7): ocfs2_check_group_descriptor: Group Descriptor # 0 has bad signature 
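
For readers decoding the log: the "status = -5" values look like negated
errno codes, which is the usual kernel convention. A minimal Python sketch,
assuming standard Linux errno numbering (decode_status is a hypothetical
helper name, not part of any ocfs2 tool):

```python
import errno
import os


def decode_status(status):
    """Map a negative kernel-style status to (errno name, message).

    Assumes the status is a negated errno value, as is conventional
    in Linux kernel log output.
    """
    code = -status
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = decode_status(-5)
print(f"status = -5 -> {name}: {message}")
```

Under that assumption, -5 maps to EIO, which fits the bad group descriptor
signature being propagated up the allocation call chain.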


So I ran fsck.ocfs2 -f, but it hangs (more than 12 hours so far) after printing this output:

fsck.ocfs2 1.4.4
Checking OCFS2 filesystem in /dev/mapper/lv0:
  Label:              <NONE>
  UUID:               F27D7B8F7127436981A2B5D1C93FB204
  Number of blocks:   2684349440
  Block size:         4096
  Number of clusters: 2684349440
  Cluster size:       4096
  Number of slots:    16

/dev/mapper/lv0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
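
As a quick sanity check on the geometry reported above, the block count
times the block size should match the ~10TB volume. A small Python sketch
using only the numbers printed by fsck.ocfs2 (the function name is made up
for illustration):

```python
# Values copied from the fsck.ocfs2 header above.
NUM_BLOCKS = 2684349440
BLOCK_SIZE = 4096
NUM_CLUSTERS = 2684349440
CLUSTER_SIZE = 4096


def fs_size_bytes(count, unit_size):
    """Total size in bytes for `count` units of `unit_size` bytes each."""
    return count * unit_size


size_from_blocks = fs_size_bytes(NUM_BLOCKS, BLOCK_SIZE)
size_from_clusters = fs_size_bytes(NUM_CLUSTERS, CLUSTER_SIZE)
size_tib = size_from_blocks / 2**40

print(f"{size_from_blocks} bytes, about {size_tib:.5f} TiB")
```

Both products agree, and the result is just under 10 TiB, so the header
itself is consistent with the size of the LV.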


I attached strace to it to see what is going on.
Before it hangs I get:

write(1, "Pass 0a: Checking cluster alloca"..., 44Pass 0a: Checking cluster allocation chains) = 44
mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ad6a5001000
munmap(0x2ad6a5001000, 4198400)         = 0
mmap(NULL, 4202496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ad6a5001000
pread(3, "INODE01\0;Q&\354\377\377\7\0\0\0\0\0\0\354\377\237\0\0\0\0\0\0\0\0"..., 4096, 45056) = 4096
mprotect(0x2ad640020000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x2ad640021000, 4096, PROT_READ|PROT_WRITE) = 0
...
[a couple of hundred similar lines]
...
mprotect(0x2ad640803000, 4096, PROT_READ|PROT_WRITE) = 0


Then it hangs with 100% idle on one core.
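
To relate the single pread in the trace to the filesystem layout, the byte
offset can be converted to a block number using the 4096-byte block size
from the fsck header. A rough Python sketch (offset_to_block is a
hypothetical helper; the 7-byte prefix is copied from the trace):

```python
BLOCK_SIZE = 4096        # from the fsck.ocfs2 header
PREAD_OFFSET = 45056     # offset argument of the pread() call in the trace
PREAD_PREFIX = b"INODE01\0"  # first bytes of the buffer shown by strace


def offset_to_block(offset, block_size=BLOCK_SIZE):
    """Convert a block-aligned byte offset to a block number."""
    block, remainder = divmod(offset, block_size)
    if remainder:
        raise ValueError(f"offset {offset} is not aligned to {block_size}")
    return block


block = offset_to_block(PREAD_OFFSET)
signature = PREAD_PREFIX.rstrip(b"\0").decode("ascii")
print(f"pread hit block {block}, signature {signature!r}")
```

So the last read before the hang was one aligned block, block 11, and its
buffer begins with an INODE01 signature rather than a group descriptor.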

Regards,
Matthias