[Ocfs2-users] ocfs2 kernel BUG
Peter Selzner
p.selzner at KRZ.DE
Fri Aug 1 05:38:10 PDT 2008
* Tao Ma <tao.ma at oracle.com> [01.08.08 10:58]
Hi,
Thanks for your quick reply.
Here are the details:
xxx:/ # SPident
CONCLUSION: System is up-to-date!
found SLE-10-i386-SP1 + "online updates"
xxx:/ # uname -r
2.6.16.46-0.12-bigsmp
xxx:/ # cat /proc/fs/ocfs2/version
OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
xxx:/ # debugfs.ocfs2 -V
debugfs.ocfs2 1.2.3
We have 6 nodes in the cluster and the described behavior (freeze of
processes in a certain directory) was observed on all 6 nodes. Thanks.
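To confirm on each node which processes are stuck, one option is to filter `ps` output for uninterruptible sleep (state `D` in the STAT column). The following is only a minimal sketch: the helper name `find_d_state` and the sample listing are hypothetical and not taken from our systems.

```python
def find_d_state(ps_output):
    """Return (pid, command) pairs whose STAT column starts with 'D'.

    Expects text shaped like the output of `ps -eo pid,stat,args`.
    """
    hung = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)         # pid, stat, rest of command line
        if len(fields) == 3 and fields[1].startswith("D"):
            hung.append((int(fields[0]), fields[2]))
    return hung


# Hypothetical sample listing, for illustration only.
sample = """  PID STAT COMMAND
 4242 D    ls -al
 4250 D+   touch somefile
    1 Ss   init
"""

print(find_d_state(sample))  # [(4242, 'ls -al'), (4250, 'touch somefile')]
```

On a live node the input could come from `subprocess.check_output(["ps", "-eo", "pid,stat,args"], text=True)`.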
> Hi,
> Please provide detailed information about the ocfs2 version, which may be
> helpful for diagnosis.
>
> Peter Selzner wrote:
> >Hi,
> >we had these entries in /var/log/messages a few days ago:
> >Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: bug
> >expression: i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended)
> >Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: Inode
> >8323098 i_size = 1572864, dinode i_size = 1568768, bytes_extended = 0,
> >new_i_size = 1576960
> >Jul 28 23:30:47 xxx kernel: klogd 1.4.1, ---------- state change ----------
> >Jul 28 23:30:47 xxx kernel: ------------[ cut here ]------------
> >Jul 28 23:30:47 xxx kernel: kernel BUG at fs/ocfs2/file.c:790!
> >Jul 28 23:30:47 xxx kernel: invalid opcode: 0000 [#1]
> >Jul 28 23:30:47 xxx kernel: SMP
> >Jul 28 23:30:47 xxx kernel: last sysfs file: /class/infiniband/mthca1/board_id
> >Jul 28 23:30:47 xxx kernel: Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm
> >ocfs2_nodemanager configfs cpqci mptctl mptbase ipmi_si ipmi_devintf
> >ipmi_msghandler rdma_ucm rds ib_ucm ib_sdp rdma_cm iw_cm
> >ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad bonding
> >ib_mthca ib_mad ib_core button battery ac raw loop dm_round_robin dm_multipath
> >dm_mod usbhid hw_random ide_cd uhci_hcd e1000
> >cdrom ehci_hcd bnx2 usbcore ext3 jbd ata_piix ahci libata edd fan thermal
> >processor cciss sg qla2400 qla2300 qla2xxx firmware_class qla2xxx_conf
> >intermodule piix sd_mod scsi_mod ide_disk ide_core
> >Jul 28 23:30:47 xxx kernel: CPU: 2
> >Jul 28 23:30:47 xxx kernel: EIP: 0060:[<f9de8173>] Tainted: P U VLI
> >Jul 28 23:30:47 xxx kernel: EFLAGS: 00210292 (2.6.16.46-0.12-bigsmp #1)
> >Jul 28 23:30:47 xxx kernel: EIP is at ocfs2_extend_file+0x3cd/0xf9b [ocfs2]
> >Jul 28 23:30:47 xxx kernel: eax: 0000008c ebx: 00000000 ecx: ffffff00
> >edx: 00200286
> >Jul 28 23:30:47 xxx kernel: esi: 00000000 edi: 00000000 ebp: df05f000
> >esp: e398de70
> >Jul 28 23:30:47 xxx kernel: ds: 007b es: 007b ss: 0068
> >Jul 28 23:30:47 xxx kernel: Process mv (pid: 12268, threadinfo=e398c000
> >task=f7f80660)
> >Jul 28 23:30:47 xxx kernel: Stack: <0>00000000 dd4f9d88 ce48c000 00000000 00000000 00000001 cf253280 dd4f9b80
> >Jul 28 23:30:47 xxx kernel: dd4f9ee4 0017f000 00000000 00000000 f9ddf432 e398dea8 dd4f9b80 00000000
> >Jul 28 23:30:47 xxx kernel: 00000001 e398deb4 e398deb4 ce48c000 00000000 00000000 ece0bc00 00000000
> >Jul 28 23:30:47 xxx kernel: Call Trace:
> >Jul 28 23:30:47 xxx kernel: [<f9ddf432>] ocfs2_status_completion_cb+0x0/0xa
> >[ocfs2]
> >Jul 28 23:30:47 xxx kernel: [<f9df72f2>]
> >ocfs2_write_lock_maybe_extend+0xb2f/0xde3 [ocfs2]
> >Jul 28 23:30:47 xxx kernel: [<f9dea85d>] ocfs2_file_write+0x125/0x24d [ocfs2]
> >Jul 28 23:30:47 xxx kernel: [<f9dea738>] ocfs2_file_write+0x0/0x24d [ocfs2]
> >Jul 28 23:30:47 xxx kernel: [<c0164714>] vfs_write+0xaa/0x152
> >Jul 28 23:30:47 xxx kernel: [<c0164d1f>] sys_write+0x3c/0x63
> >Jul 28 23:30:47 xxx kernel: [<c0103cab>] sysenter_past_esp+0x54/0x79
> >Jul 28 23:30:47 xxx kernel: Code: 8b 4c 24 3c ff 71 04 ff 31 68 16 03 00 00 68
> >2b b5 e0 f9 ff 70 10 8b 00 ff b0 c0 00 00 00 68 b1 fd e0 f9 e8 ca a8 33 c6 83
> >c4 3c <0f> 0b 16 03 db fb e0 f9 8b 5c 24 20
> >8b 03 0f ae e8 89 f6 8b 74
> >
> >It was impossible to run "ls -al" in a certain directory: each process that
> >"touched" files in this directory ended up in the "D" state (uninterruptible
> >sleep). Any suggestions? Thanks.
> How did this happen, and could you please explain it in more detail? E.g., how
> many nodes are in your cluster? If it hangs on one node, what about the other
> nodes, and what were you doing on them?
>
> Regards,
> Tao
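As a quick sanity check, the values from the quoted log can be plugged into the reported bug expression. This is only a sketch of the arithmetic: the variable names are illustrative, and mapping the logged "i_size" to i_size_read(inode) and "dinode i_size" to fe->i_size is my reading of the message, not something confirmed from the source code.

```python
# Values copied from the quoted kernel log.
inode_i_size = 1572864    # logged as "i_size" (assumed: i_size_read(inode))
dinode_i_size = 1568768   # logged as "dinode i_size" (assumed: fe->i_size)
bytes_extended = 0
new_i_size = 1576960

# The logged bug expression; the BUG fires when it evaluates to true.
bug_triggered = inode_i_size != (dinode_i_size - bytes_extended)

print(bug_triggered)                 # True
print(inode_i_size - dinode_i_size)  # 4096
print(new_i_size - inode_i_size)     # 4096
```

So the in-memory size was 4096 bytes ahead of the on-disk size when the write tried to extend the file by a further 4096 bytes.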
Kind regards,
Peter Selzner
--
| Peter Selzner mail: p.selzner at krz.de |
| Kommunales Rechenzentrum (KRZ) tel: +49 (0)5261-252-273 |
| Minden-Ravensberg / Lippe fax: +49 (0)5261-932-273 |