[Ocfs2-users] Re: Few panics with OCFSv2, SLES9 Sp3, kernel 282

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Thu Mar 1 17:30:08 PST 2007


In addition:

The file (inode 842694) is a log file. It was being written in parallel from
2 nodes before the failure. After the system panicked and rebooted (and
mounted the FS again), the node tried to append to the log again, which caused
one more failure. After I remounted the FS on both nodes, everything worked
fine again.

Looks like a bug in synchronization.
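
To make the numbers in the quoted trace easier to read, here is a small
sketch that redoes the comparison from the logged bug expression using the
values from the log line. The variable and function names are mine, and the
reading of "i_size" as the size cached in memory versus "dinode i_size" as the
size in the on-disk inode is my assumption, based on the expression
i_size_read(inode) != (le64_to_cpu(fe->i_size) - *bytes_extended). It is not
the OCFS2 code itself.

```python
# Values copied from the ocfs2_extend_file:789 log line for inode 842694.
cached_i_size = 1270197    # "i_size" (assumed: size cached in memory)
dinode_i_size = 1476996    # "dinode i_size" (assumed: size in on-disk inode)
bytes_extended = 0         # "bytes_extended"
new_i_size = 1270198       # "new_i_size" requested by the write


def sizes_disagree(cached, on_disk, extended):
    """Same comparison as the logged bug expression; True means it trips."""
    return cached != (on_disk - extended)


bug_tripped = sizes_disagree(cached_i_size, dinode_i_size, bytes_extended)

# How far the cached size lags behind the dinode size, in bytes.
stale_by = dinode_i_size - bytes_extended - cached_i_size

print("check trips:", bug_tripped)
print("cached size is behind by", stale_by, "bytes")
```

With these values the check trips, and the cached size is 206799 bytes behind
the dinode size, which would fit the other node having appended to the file in
the meantime.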

----- Original Message ----- 
From: "Alexei_Roudnev" <Alexei_Roudnev at exigengroup.com>
To: <ocfs2-users at oss.oracle.com>
Sent: Thursday, March 01, 2007 5:05 PM
Subject: Few panics with OCFSv2, SLES9 Sp3, kernel 282


I saw this a few times until I unmounted the FS on all nodes, ran fsck (which
showed nothing), and then mounted it back.

Are there any known errors/bugs that explain the following?
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
(6001,1):ocfs2_extend_file:789 ERROR: bug expression: i_size_read(inode) !=
(le64_to_cpu(fe->i_size) - *bytes_extended)
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
(6001,1):ocfs2_extend_file:789 ERROR: Inode 842694 i_size = 1270197, dinode
i_size = 1476996, bytes_extended = 0, new_i_size = 1270198
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: ----------- [cut
here ] --------- [please bite here ] ---------
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Kernel BUG at file:789
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: invalid operand: 0000
[1] SMP
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: CPU 1
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Pid: 6001, comm: perl
Tainted: G   U   (2.6.5-7.282-smp SLES9_SP3_BRANCH-20060829104040)
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RIP:
0010:[<ffffffffa0356bc4>] <ffffffffa0356bc4>{:ocfs2:ocfs2_extend_file+772}
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RSP:
0018:00000100b3f05cd8  EFLAGS: 00010216
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RAX: 000000000000008a
RBX: 000001007b4e7000 RCX: 000000000003ffff
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RDX: 0000000000000000
RSI: 00000000000162e2 RDI: 00000000001361b5
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RBP: 0000000000000000
R08: 0000000000000033 R09: 0000000000000006
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: R10: 00000000ffffffff
R11: 0000000000000000 R12: 000001001f6a33d8
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: R13: 000001010b683e80
R14: 000001001f6a33d8 R15: 000001013940e000
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: FS:
0000002a95d3d6e0(0000) GS:ffffffff8057e600(0000) knlGS:0000000000000000
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: CR2: 0000000001043928
CR3: 00000000bff04000 CR4: 00000000000006e0
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Process perl (pid: 6001,
threadinfo 00000100b3f04000, task 000001001c16d620)
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Stack: 00000000001361b5
0000000000168984 0000000000000000 00000000001361b6
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:        0000000000000000
0000000000000000 00000100b3f05dd0 00000000001361b6
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:        0000000000000216
0000000000000000
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Call
Trace:<ffffffffa0364538>{:ocfs2:ocfs2_lock_buffer_inodes+536}
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
<ffffffffa03657f5>{:ocfs2:ocfs2_write_lock_maybe_extend+2517}
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
<ffffffffa035562e>{:ocfs2:ocfs2_file_write+414}
<ffffffff80197d9c>{vfs_fstat+204}
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
<ffffffff8018d734>{vfs_write+244} <ffffffff8018d98d>{sys_write+157}
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
<ffffffff80110f79>{error_exit+0} <ffffffff801106b4>{system_call+124}
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: Code: 0f 0b 7f c0 37 a0
ff ff ff ff 15 03 48 39 7c 24 38 0f 83 d7
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel: RIP
<ffffffffa0356bc4>{:ocfs2:ocfs2_extend_file+772} RSP <00000100b3f05cd8>
Mar  1 06:26:42 spproddoc01-0/spproddoc01-0 kernel:  <0>Kernel panic: Oops
Ma
----- Original Message ----- 
From: "José Costa" <meetra at gmail.com>
To: <ocfs2-users at oss.oracle.com>
Sent: Thursday, March 01, 2007 3:23 PM
Subject: [Ocfs2-users] Problems with ocfs2 when rebooting the first node.


> Hello,
>
> I'm using 2.6.16.41-SLES10_SP1_BRANCH_20070220135926-smp with OCFS2 1.2.4.
>
> If I start the node1 and then the node2... everything works. If I
> reboot the node1, it gives this error to node2 and I can't mount on
> node1 when it comes up and can't do anything on node2 ocfs2 mounts and
> also in /sys/kernel/cluster/*.
>
> I've 8 ocfs2 partitions. (don't ask why)
>
> Here's the kernel bug.
>
> Feb 26 17:39:42 system2 kernel:
> (3903,1):dlm_deref_lockres_handler:2353 ERROR:
> 5400F4D01A9E4561961EFD460CE743B9:M000000000000
> 0000000005e6b4c612: node 0 trying to drop ref but it is already dropped!
> Feb 26 17:39:42 system2 kernel: ------------[ cut here ]------------
> Feb 26 17:39:42 system2 kernel: kernel BUG at fs/ocfs2/dlm/dlmdebug.c:304!
> Feb 26 17:39:42 system2 kernel: invalid opcode: 0000 [#1]
> Feb 26 17:39:42 system2 kernel: SMP
> Feb 26 17:39:42 system2 kernel: last sysfs file:
> /devices/pci0000:00/0000:00:05.0/resource
> Feb 26 17:39:42 system2 kernel: Modules linked in: ocfs2 af_packet
> ocfs2_user_heartbeat ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanag
> er configfs bonding button battery ac apparmor aamatch_pcre loop
> dm_mod i2c_piix4 i2c_core ohci_hcd sworks_agp usbcore agpgar
> t e100 mii e1000 shpchp pci_hotplug ide_cd cdrom parport_pc lp parport
> ext3 jbd edd fan thermal processor i2o_block i2o_core
> qla2xxx firmware_class scsi_transport_fc sg st aic7xxx
> scsi_transport_spi serverworks sd_mod scsi_mod ide_disk ide_core
> Feb 26 17:39:42 system2 kernel: CPU:    1
> Feb 26 17:39:42 system2 kernel: EIP:    0060:[<f9356fe2>]    Tainted:
> G     U VLI
> Feb 26 17:39:42 system2 kernel: EFLAGS: 00010202
> (2.6.16.41-SLES10_SP1_BRANCH_20070220135926-smp #1)
> Feb 26 17:39:42 system2 kernel: EIP is at
> __dlm_print_one_lock_resource+0x12/0x729 [ocfs2_dlm]
> Feb 26 17:39:42 system2 kernel: eax: f70ae401   ebx: 00000000   ecx:
> 00000000   edx: 00000282
> Feb 26 17:39:42 system2 kernel: esi: f70ae460   edi: 0000001f   ebp:
> 00000001   esp: f682de54
> Feb 26 17:39:42 system2 kernel: ds: 007b   es: 007b   ss: 0068
> Feb 26 17:39:42 system2 kernel: Process o2net (pid: 3903,
> threadinfo=f682c000 task=f4a81910)
> Feb 26 17:39:42 system2 kernel: Stack: <0>00000000 00000002 f70ae460
> c0130f57 f682de64 f682de64 00000005 00000082
> Feb 26 17:39:42 system2 kernel:        f682de9c f65a9d64 f65a9d54
> 00000000 f70ae460 0000001f 00000001 c0120a80
> Feb 26 17:39:42 system2 kernel:        f9374221 f682dea8 f682dea8
> f936527b f9374221 00000f3f 00000001 f936e576
> Feb 26 17:39:42 system2 kernel: Call Trace:
> Feb 26 17:39:42 system2 kernel:  [<c0130f57>]
> autoremove_wake_function+0x0/0x2d
> Feb 26 17:39:42 system2 kernel:  [<c0120a80>] printk+0x14/0x18
> Feb 26 17:39:42 system2 kernel:  [<f936527b>]
> dlm_deref_lockres_handler+0x2a6/0x3df [ocfs2_dlm]
> Feb 26 17:39:42 system2 kernel:  [<f9365285>]
> dlm_deref_lockres_handler+0x2b0/0x3df [ocfs2_dlm]
> Feb 26 17:39:42 system2 kernel:  [<f92fc792>]
> o2net_process_message+0x3e7/0x598 [ocfs2_nodemanager]
> Feb 26 17:39:42 system2 kernel:  [<f92fba1d>]
> o2net_recv_tcp_msg+0x55/0x60 [ocfs2_nodemanager]
> Feb 26 17:39:42 system2 kernel:  [<f92fe2e6>]
> o2net_rx_until_empty+0x64d/0x773 [ocfs2_nodemanager]
> Feb 26 17:39:42 system2 kernel:  [<c012de26>] run_workqueue+0x78/0xb5
> Feb 26 17:39:42 system2 kernel:  [<f92fdc99>]
> o2net_rx_until_empty+0x0/0x773 [ocfs2_nodemanager]
> Feb 26 17:39:42 system2 kernel:  [<c012e679>] worker_thread+0x0/0x10d
> Feb 26 17:39:42 system2 kernel:  [<c012e755>] worker_thread+0xdc/0x10d
> Feb 26 17:39:42 system2 kernel:  [<c011a53d>]
> default_wake_function+0x0/0xc
> Feb 26 17:39:42 system2 kernel:  [<c0130e75>] kthread+0x9d/0xc9
> Feb 26 17:39:42 system2 kernel:  [<c0130dd8>] kthread+0x0/0xc9
> Feb 26 17:39:42 system2 kernel:  [<c0102005>] kernel_thread_helper+0x5/0xb
> Feb 26 17:39:42 system2 kernel: Code: 64 d1 37 f9 0f 85 96 fe ff ff b0
> 01 86 05 60 d1 37 f9 89 e8 5b 5e 5f 5d c3 55 57 56 53
> 83 ec 60 89 44 24 08 8a 40 48 84 c0 7e 08 <0f> 0b 30 01 10 f4 36 f9 f6
> 05 81 de 30 f9 01 75 14 a1 84 de 30
> Feb 26 17:39:43 system2 kernel:  <5>(6543,1):dlm_get_lock_resource:920
> 575FC4A619124A3BA677F994DF3B18F2:$RECOVERY: at least o
> ne node (0) torecover before lock mastery can begin
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>



