[Ocfs2-users] ocfs2 kernel BUG

Fri Aug 1 05:38:10 PDT 2008

* Tao Ma <tao.ma at oracle.com> [01.08.08 10:58]
Hi,

Thanks for your quick reply.

Here are the details:

xxx:/ # SPident 

CONCLUSION: System is up-to-date!
  found    SLE-10-i386-SP1 + "online updates"

xxx:/ # uname -r
2.6.16.46-0.12-bigsmp

xxx:/ # cat /proc/fs/ocfs2/version 
OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)

xxx:/ # debugfs.ocfs2 -V
debugfs.ocfs2 1.2.3

We have 6 nodes in the cluster, and the described behavior (processes
freezing in a certain directory) was observed on all 6 nodes. Thanks.

> Hi,
> 	Please provide detailed information about your OCFS2 version; it may
> help with the diagnosis.
> 
> Peter Selzner wrote:
> >Hi,
> >we had these entries in /var/log/messages a few days ago:
> >Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended)
> >Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: Inode 8323098 i_size = 1572864, dinode i_size = 1568768, bytes_extended = 0, new_i_size = 1576960
> >Jul 28 23:30:47 xxx kernel: klogd 1.4.1, ---------- state change ----------
> >Jul 28 23:30:47 xxx kernel: ------------[ cut here ]------------
> >Jul 28 23:30:47 xxx kernel: kernel BUG at fs/ocfs2/file.c:790!
> >Jul 28 23:30:47 xxx kernel: invalid opcode: 0000 [#1]
> >Jul 28 23:30:47 xxx kernel: SMP
> >Jul 28 23:30:47 xxx kernel: last sysfs file: /class/infiniband/mthca1/board_id
> >Jul 28 23:30:47 xxx kernel: Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm 
> >ocfs2_nodemanager configfs cpqci mptctl mptbase ipmi_si ipmi_devintf 
> >ipmi_msghandler rdma_ucm rds ib_ucm ib_sdp rdma_cm iw_cm
> >ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad bonding 
> >ib_mthca ib_mad ib_core button battery ac raw loop dm_round_robin dm_multipath 
> >dm_mod usbhid hw_random ide_cd uhci_hcd e1000
> >cdrom ehci_hcd bnx2 usbcore ext3 jbd ata_piix ahci libata edd fan thermal 
> >processor cciss sg qla2400 qla2300 qla2xxx firmware_class qla2xxx_conf 
> >intermodule piix sd_mod scsi_mod ide_disk ide_core
> >Jul 28 23:30:47 xxx kernel: CPU:    2
> >Jul 28 23:30:47 xxx kernel: EIP:    0060:[<f9de8173>]    Tainted: P     U VLI
> >Jul 28 23:30:47 xxx kernel: EFLAGS: 00210292   (2.6.16.46-0.12-bigsmp #1)
> >Jul 28 23:30:47 xxx kernel: EIP is at ocfs2_extend_file+0x3cd/0xf9b [ocfs2]
> >Jul 28 23:30:47 xxx kernel: eax: 0000008c   ebx: 00000000   ecx: ffffff00   edx: 00200286
> >Jul 28 23:30:47 xxx kernel: esi: 00000000   edi: 00000000   ebp: df05f000   esp: e398de70
> >Jul 28 23:30:47 xxx kernel: ds: 007b   es: 007b   ss: 0068
> >Jul 28 23:30:47 xxx kernel: Process mv (pid: 12268, threadinfo=e398c000 task=f7f80660)
> >Jul 28 23:30:47 xxx kernel: Stack: <0>00000000 dd4f9d88 ce48c000 00000000 00000000 00000001 cf253280 dd4f9b80
> >Jul 28 23:30:47 xxx kernel:        dd4f9ee4 0017f000 00000000 00000000 f9ddf432 e398dea8 dd4f9b80 00000000
> >Jul 28 23:30:47 xxx kernel:        00000001 e398deb4 e398deb4 ce48c000 00000000 00000000 ece0bc00 00000000
> >Jul 28 23:30:47 xxx kernel: Call Trace:
> >Jul 28 23:30:47 xxx kernel:  [<f9ddf432>] ocfs2_status_completion_cb+0x0/0xa [ocfs2]
> >Jul 28 23:30:47 xxx kernel:  [<f9df72f2>] ocfs2_write_lock_maybe_extend+0xb2f/0xde3 [ocfs2]
> >Jul 28 23:30:47 xxx kernel:  [<f9dea85d>] ocfs2_file_write+0x125/0x24d [ocfs2]
> >Jul 28 23:30:47 xxx kernel:  [<f9dea738>] ocfs2_file_write+0x0/0x24d [ocfs2]
> >Jul 28 23:30:47 xxx kernel:  [<c0164714>] vfs_write+0xaa/0x152
> >Jul 28 23:30:47 xxx kernel:  [<c0164d1f>] sys_write+0x3c/0x63
> >Jul 28 23:30:47 xxx kernel:  [<c0103cab>] sysenter_past_esp+0x54/0x79
> >Jul 28 23:30:47 xxx kernel: Code: 8b 4c 24 3c ff 71 04 ff 31 68 16 03 00 00 68 2b b5 e0 f9 ff 70 10 8b 00 ff b0 c0 00 00 00 68 b1 fd e0 f9 e8 ca a8 33 c6 83 c4 3c <0f> 0b 16 03 db fb e0 f9 8b 5c 24 20 8b 03 0f ae e8 89 f6 8b 74
> >
> >It was impossible to do "ls -al" in a certain directory: every process that
> >"touched" files in this directory ended up in D state (uninterruptible sleep).
> >Any suggestions? Thanks.
> How did this happen? Could you please explain it in more detail, e.g. how
> many nodes are in your cluster? If you hang on one node, what about the other
> nodes, and what are you doing on them?
> 
> Regards,
> Tao

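A quick re-check of the numbers in the ocfs2_extend_file:790 error line, as a sketch only: it re-evaluates the logged BUG expression outside the kernel with the values copied from the log. The variable names are illustrative and this is not OCFS2 code. It shows that the in-memory i_size is exactly 4096 bytes ahead of the on-disk dinode i_size while bytes_extended is 0, so the check fails; new_i_size lies another 4096 bytes beyond the in-memory size.

```python
# Values copied from the ocfs2_extend_file:790 error line in the log.
# This only re-evaluates the BUG expression outside the kernel;
# the variable names are illustrative, not OCFS2 code.
mem_i_size = 1572864      # i_size_read(inode): in-memory inode size
dinode_i_size = 1568768   # le64_to_cpu(fe->i_size): size in the on-disk dinode
bytes_extended = 0        # *bytes_extended
new_i_size = 1576960      # size the write wanted to extend the file to

# The kernel hits BUG() when this expression is true.
bug_fires = mem_i_size != (dinode_i_size - bytes_extended)

print("BUG condition true:", bug_fires)                       # True
print("in-memory minus on-disk:", mem_i_size - dinode_i_size)  # 4096
print("new minus in-memory:", new_i_size - mem_i_size)         # 4096
```

In other words, the in-memory and on-disk sizes of inode 8323098 disagreed by 4096 bytes at the moment mv tried to extend the file.
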
Best regards
Peter Selzner

--
| Peter Selzner				mail:	p.selzner at krz.de |
| Kommunales Rechenzentrum (KRZ)	tel: +49 (0)5261-252-273 |
| Minden-Ravensberg / Lippe		fax: +49 (0)5261-932-273 |