[Ocfs2-users] ocfs2 kernel BUG

Sunil Mushran Sunil.Mushran at oracle.com
Fri Aug 1 08:35:41 PDT 2008


No, the kernel is old. A year+ old.

Refer to this announcement below.
http://oss.oracle.com/pipermail/ocfs2-announce/2008-July/000026.html

From the stack, it looks like you are encountering the rename/extend race
that was fixed a long time ago.
http://oss.oracle.com/projects/ocfs2/news/article_14.html

Peter Selzner wrote:
> * Tao Ma <tao.ma at oracle.com> [01.08.08 10:58]
> Hi,
>
> Thanks for your quick reply.
>
> Here the details:
>
> xxx:/ # SPident 
>
> CONCLUSION: System is up-to-date!
>   found    SLE-10-i386-SP1 + "online updates"
>
> xxx:/ # uname -r
> 2.6.16.46-0.12-bigsmp
>
> xxx:/ # cat /proc/fs/ocfs2/version 
> OCFS2 1.2.5-SLES-r2997 Tue Mar 27 16:33:19 EDT 2007 (build sles)
>
> xxx:/ # debugfs.ocfs2 -V
> debugfs.ocfs2 1.2.3
>
> We have 6 nodes in the cluster and the described behavior (freeze of
> processes in a certain directory) was observed on all 6 nodes. Thanks.
>
>   
>> Hi,
>> 	Please provide detailed info on the ocfs2 version; that may be helpful for 
>> diagnosis.
>>
>> Peter Selzner wrote:
>>     
>>> Hi,
>>> we had these entries in /var/log/messages a few days ago:
>>> Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended)
>>> Jul 28 23:30:47 xxx kernel: (12268,2):ocfs2_extend_file:790 ERROR: Inode 8323098 i_size = 1572864, dinode i_size = 1568768, bytes_extended = 0, new_i_size = 1576960
>>> Jul 28 23:30:47 xxx kernel: klogd 1.4.1, ---------- state change ----------
>>> Jul 28 23:30:47 xxx kernel: ------------[ cut here ]------------
>>> Jul 28 23:30:47 xxx kernel: kernel BUG at fs/ocfs2/file.c:790!
>>> Jul 28 23:30:47 xxx kernel: invalid opcode: 0000 [#1]
>>> Jul 28 23:30:47 xxx kernel: SMP
>>> Jul 28 23:30:47 xxx kernel: last sysfs file: /class/infiniband/mthca1/board_id
>>> Jul 28 23:30:47 xxx kernel: Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs cpqci mptctl mptbase ipmi_si ipmi_devintf ipmi_msghandler rdma_ucm rds ib_ucm ib_sdp rdma_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_cm ib_sa ipv6 ib_uverbs ib_umad bonding ib_mthca ib_mad ib_core button battery ac raw loop dm_round_robin dm_multipath dm_mod usbhid hw_random ide_cd uhci_hcd e1000 cdrom ehci_hcd bnx2 usbcore ext3 jbd ata_piix ahci libata edd fan thermal processor cciss sg qla2400 qla2300 qla2xxx firmware_class qla2xxx_conf intermodule piix sd_mod scsi_mod ide_disk ide_core
>>> Jul 28 23:30:47 xxx kernel: CPU:    2
>>> Jul 28 23:30:47 xxx kernel: EIP:    0060:[<f9de8173>]    Tainted: P     U VLI
>>> Jul 28 23:30:47 xxx kernel: EFLAGS: 00210292   (2.6.16.46-0.12-bigsmp #1)
>>> Jul 28 23:30:47 xxx kernel: EIP is at ocfs2_extend_file+0x3cd/0xf9b [ocfs2]
>>> Jul 28 23:30:47 xxx kernel: eax: 0000008c   ebx: 00000000   ecx: ffffff00   edx: 00200286
>>> Jul 28 23:30:47 xxx kernel: esi: 00000000   edi: 00000000   ebp: df05f000   esp: e398de70
>>> Jul 28 23:30:47 xxx kernel: ds: 007b   es: 007b   ss: 0068
>>> Jul 28 23:30:47 xxx kernel: Process mv (pid: 12268, threadinfo=e398c000 task=f7f80660)
>>> Jul 28 23:30:47 xxx kernel: Stack: <0>00000000 dd4f9d88 ce48c000 00000000 00000000 00000001 cf253280 dd4f9b80
>>> Jul 28 23:30:47 xxx kernel:        dd4f9ee4 0017f000 00000000 00000000 f9ddf432 e398dea8 dd4f9b80 00000000
>>> Jul 28 23:30:47 xxx kernel:        00000001 e398deb4 e398deb4 ce48c000 00000000 00000000 ece0bc00 00000000
>>> Jul 28 23:30:47 xxx kernel: Call Trace:
>>> Jul 28 23:30:47 xxx kernel:  [<f9ddf432>] ocfs2_status_completion_cb+0x0/0xa [ocfs2]
>>> Jul 28 23:30:47 xxx kernel:  [<f9df72f2>] ocfs2_write_lock_maybe_extend+0xb2f/0xde3 [ocfs2]
>>> Jul 28 23:30:47 xxx kernel:  [<f9dea85d>] ocfs2_file_write+0x125/0x24d [ocfs2]
>>> Jul 28 23:30:47 xxx kernel:  [<f9dea738>] ocfs2_file_write+0x0/0x24d [ocfs2]
>>> Jul 28 23:30:47 xxx kernel:  [<c0164714>] vfs_write+0xaa/0x152
>>> Jul 28 23:30:47 xxx kernel:  [<c0164d1f>] sys_write+0x3c/0x63
>>> Jul 28 23:30:47 xxx kernel:  [<c0103cab>] sysenter_past_esp+0x54/0x79
>>> Jul 28 23:30:47 xxx kernel: Code: 8b 4c 24 3c ff 71 04 ff 31 68 16 03 00 00 68 2b b5 e0 f9 ff 70 10 8b 00 ff b0 c0 00 00 00 68 b1 fd e0 f9 e8 ca a8 33 c6 83 c4 3c <0f> 0b 16 03 db fb e0 f9 8b 5c 24 20 8b 03 0f ae e8 89 f6 8b 74
>>>
>>> It was impossible to run "ls -al" in a certain directory; every process that
>>> "touched" files in this directory ended up in D state (uninterruptible sleep).
>>> Any suggestions? Thanks.
>>>       
>> How did this happen? Could you please explain it in more detail? E.g., how 
>> many nodes are in your cluster? You hang on one node; what about the other 
>> nodes, and what are you doing on them?
>>
>> Regards,
>> Tao
>>     
>
>
> Kind regards
> Peter Selzner
>
> --
> | Peter Selzner				mail:	p.selzner at krz.de |
> | Kommunales Rechenzentrum (KRZ)	tel: +49 (0)5261-252-273 |
> | Minden-Ravensberg / Lippe		fax: +49 (0)5261-932-273 |
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   
