[Ocfs2-users] OCFS2 cluster won't come up and stay up

Tony Rios tony at tonyrios.com
Fri Dec 2 13:54:01 PST 2011


Sunil, in the interest of getting everything back online, I powered down every single node.

I powered up 1 of the nodes that seemed to be able to mount the filesystem.
Ran an fsck on the filesystem before allowing it to be mounted.
It complained that some of the nodes had not unmounted cleanly, but set the clean flag after a couple of seconds.
I re-ran fsck once more and it came up clean with no warnings or errors.
I then mounted the filesystem on this server and it didn't complain at all.
I am now in the process of bringing the servers online one at a time; so far the first 4 have not complained at all.
So we are back up and running, but hopefully the logs could still provide some useful information as well.
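
The sequence described above can be sketched as a dry-run shell script. This is only an illustration under stated assumptions: the device path and mount point are hypothetical placeholders (neither appears in this thread), and the script prints the commands rather than running them. The check should only be run while the volume is unmounted on every node.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each step instead of executing them.
# DEVICE and MOUNTPOINT are placeholders, not values taken from this thread.
DEVICE="${DEVICE:-/dev/sdX}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/ocfs2}"

print_first_node_steps() {
    # First check, run while the volume is unmounted on all nodes.
    echo "fsck.ocfs2 $DEVICE"
    # Second pass to confirm the check now comes back clean.
    echo "fsck.ocfs2 $DEVICE"
    # Mount on the first node only.
    echo "mount -t ocfs2 $DEVICE $MOUNTPOINT"
}

print_next_node_steps() {
    # Each later node only mounts; review the kernel log before moving on.
    echo "mount -t ocfs2 $DEVICE $MOUNTPOINT"
    echo "dmesg | tail"
}

print_first_node_steps
print_next_node_steps
```

Bringing up one node at a time and reading the kernel log after each mount makes it easier to tell which node reintroduces a problem.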

Tony

On Dec 1, 2011, at 6:36 PM, Sunil Mushran wrote:

> To analyze this, one needs the logs. And a bugzilla is a good placeholder for the logs.
> 
> On Dec 1, 2011, at 6:05 PM, Tony Rios <tony at tonyrios.com> wrote:
> 
>> Sunil,
>> Is submitting a bug report the only answer?
>> I'm happy to send in this information, but can I take the cluster down entirely and sort of reset it so we can get these servers back online and talking again in the meantime?
>> Tony
>> 
>> On Dec 1, 2011, at 5:05 PM, Sunil Mushran wrote:
>> 
>>> Node 3 is joining the domain. It is having problems getting the superblock cluster lock.
>>> Create a bugzilla on oss.oracle.com and attach the /var/log/messages from all nodes.
>>> If you have netconsole set up, attach those logs too.
>>> 
>>> On 12/01/2011 04:55 PM, Tony Rios wrote:
>>>> I'm having an issue today where I just can't seem to keep all the servers in the cluster online.
>>>> They aren't losing network connectivity and I can ping the iSCSI host just fine and the host is logged in.
>>>> 
>>>> These are the errors from dmesg when I try to mount the filesystem:
>>>> 
>>>> root at pedge36:~# dmesg
>>>> [    0.000000] Initializing cgroup subsys cpuset
>>>> [    0.000000] Initializing cgroup subsys cpu
>>>> [    0.000000] Linux version 2.6.38-10-generic (buildd at yellow) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) ) #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 (Ubuntu 2.6.38-10.46-generic 2.6.38.7)
>>>> [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.38-10-generic root=UUID=3cd859b8-2605-4a38-8767-a6d1f99d53bd ro debug ignore_loglevel
>>>> [    0.000000] BIOS-provided physical RAM map:
>>>> [    0.000000]  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
>>>> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000effc0000 (usable)
>>>> [    0.000000]  BIOS-e820: 00000000effc0000 - 00000000effcfc00 (ACPI data)
>>>> [    0.000000]  BIOS-e820: 00000000effcfc00 - 00000000effff000 (reserved)
>>>> [    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
>>>> [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
>>>> [    0.000000]  BIOS-e820: 00000000fed13000 - 00000000feda0000 (reserved)
>>>> [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
>>>> [    0.000000]  BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
>>>> [    0.000000]  BIOS-e820: 0000000100000000 - 00000001ffffe000 (usable)
>>>> [    0.000000]  BIOS-e820: 00000001ffffe000 - 0000000200000000 (reserved)
>>>> [    0.000000]  BIOS-e820: 0000000200000000 - 0000000210000000 (usable)
>>>> [    0.000000] debug: ignoring loglevel setting.
>>>> [    0.000000] NX (Execute Disable) protection: active
>>>> [    0.000000] DMI 2.3 present.
>>>> [    0.000000] DMI: Dell Computer Corporation PowerEdge 850/0Y8628, BIOS A04 08/22/2006
>>>> [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==>  (reserved)
>>>> [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
>>>> [    0.000000] No AGP bridge found
>>>> [    0.000000] last_pfn = 0x210000 max_arch_pfn = 0x400000000
>>>> [    0.000000] MTRR default type: uncachable
>>>> [    0.000000] MTRR fixed ranges enabled:
>>>> [    0.000000]   00000-9FFFF write-back
>>>> [    0.000000]   A0000-BFFFF uncachable
>>>> [    0.000000]   C0000-CBFFF write-protect
>>>> [    0.000000]   CC000-EBFFF uncachable
>>>> [    0.000000]   EC000-FFFFF write-protect
>>>> [    0.000000] MTRR variable ranges enabled:
>>>> [    0.000000]   0 base 000000000 mask E00000000 write-back
>>>> [    0.000000]   1 base 200000000 mask FF0000000 write-back
>>>> [    0.000000]   2 base 0F0000000 mask FF0000000 uncachable
>>>> [    0.000000]   3 disabled
>>>> [    0.000000]   4 disabled
>>>> [    0.000000]   5 disabled
>>>> [    0.000000]   6 disabled
>>>> [    0.000000]   7 disabled
>>>> [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
>>>> [    0.000000] e820 update range: 00000000f0000000 - 0000000100000000 (usable) ==>  (reserved)
>>>> [    0.000000] last_pfn = 0xeffc0 max_arch_pfn = 0x400000000
>>>> [    0.000000] found SMP MP-table at [ffff8800000fe710] fe710
>>>> [    0.000000] initial memory mapped : 0 - 20000000
>>>> [    0.000000] init_memory_mapping: 0000000000000000-00000000effc0000
>>>> [    0.000000]  0000000000 - 00efe00000 page 2M
>>>> [    0.000000]  00efe00000 - 00effc0000 page 4k
>>>> [    0.000000] kernel direct mapping tables up to effc0000 @ 1fffa000-20000000
>>>> [    0.000000] init_memory_mapping: 0000000100000000-0000000210000000
>>>> [    0.000000]  0100000000 - 0210000000 page 2M
>>>> [    0.000000] kernel direct mapping tables up to 210000000 @ effb6000-effc0000
>>>> [    0.000000] RAMDISK: 366d0000 - 37360000
>>>> [    0.000000] ACPI: RSDP 00000000000fd160 00014 (v00 DELL  )
>>>> [    0.000000] ACPI: RSDT 00000000000fd174 00038 (v01 DELL   PE850    00000001 MSFT 0100000A)
>>>> [    0.000000] ACPI: FACP 00000000000fd1b8 00074 (v01 DELL   PE850    00000001 MSFT 0100000A)
>>>> [    0.000000] ACPI: DSDT 00000000effc0000 01C19 (v01 DELL   PE830    00000001 MSFT 0100000E)
>>>> [    0.000000] ACPI: FACS 00000000effcfc00 00040
>>>> [    0.000000] ACPI: APIC 00000000000fd22c 00074 (v01 DELL   PE850    00000001 MSFT 0100000A)
>>>> [    0.000000] ACPI: SPCR 00000000000fd2a0 00050 (v01 DELL   PE850    00000001 MSFT 0100000A)
>>>> [    0.000000] ACPI: HPET 00000000000fd2f0 00038 (v01 DELL   PE830    00000001 MSFT 0100000A)
>>>> [    0.000000] ACPI: MCFG 00000000000fd328 0003C (v01 DELL   PE830    00000001 MSFT 0100000A)
>>>> [    0.000000] ACPI: Local APIC address 0xfee00000
>>>> [    0.000000] No NUMA configuration found
>>>> [    0.000000] Faking a node at 0000000000000000-0000000210000000
>>>> [    0.000000] Initmem setup node 0 0000000000000000-0000000210000000
>>>> [    0.000000]   NODE_DATA [00000001ffff9000 - 00000001ffffdfff]
>>>> [    0.000000]  [ffffea0000000000-ffffea00073fffff] PMD ->  [ffff8801f7e00000-ffff8801feffffff] on node 0
>>>> [    0.000000] Zone PFN ranges:
>>>> [    0.000000]   DMA      0x00000010 ->  0x00001000
>>>> [    0.000000]   DMA32    0x00001000 ->  0x00100000
>>>> [    0.000000]   Normal   0x00100000 ->  0x00210000
>>>> [    0.000000] Movable zone start PFN for each node
>>>> [    0.000000] early_node_map[4] active PFN ranges
>>>> [    0.000000]     0: 0x00000010 ->  0x000000a0
>>>> [    0.000000]     0: 0x00000100 ->  0x000effc0
>>>> [    0.000000]     0: 0x00100000 ->  0x001ffffe
>>>> [    0.000000]     0: 0x00200000 ->  0x00210000
>>>> [    0.000000] On node 0 totalpages: 2096974
>>>> [    0.000000]   DMA zone: 56 pages used for memmap
>>>> [    0.000000]   DMA zone: 7 pages reserved
>>>> [    0.000000]   DMA zone: 3921 pages, LIFO batch:0
>>>> [    0.000000]   DMA32 zone: 14280 pages used for memmap
>>>> [    0.000000]   DMA32 zone: 964600 pages, LIFO batch:31
>>>> [    0.000000]   Normal zone: 15232 pages used for memmap
>>>> [    0.000000]   Normal zone: 1098878 pages, LIFO batch:31
>>>> [    0.000000] ACPI: PM-Timer IO Port: 0x808
>>>> [    0.000000] ACPI: Local APIC address 0xfee00000
>>>> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
>>>> [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
>>>> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
>>>> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
>>>> [    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
>>>> [    0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
>>>> [    0.000000] ACPI: IOAPIC (id[0x03] address[0xfec10000] gsi_base[32])
>>>> [    0.000000] IOAPIC[1]: apic_id 3, version 32, address 0xfec10000, GSI 32-55
>>>> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>>> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
>>>> [    0.000000] ACPI: IRQ0 used by override.
>>>> [    0.000000] ACPI: IRQ2 used by override.
>>>> [    0.000000] ACPI: IRQ9 used by override.
>>>> [    0.000000] Using ACPI (MADT) for SMP configuration information
>>>> [    0.000000] ACPI: HPET id: 0xffffffff base: 0xfed00000
>>>> [    0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
>>>> [    0.000000] nr_irqs_gsi: 72
>>>> [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
>>>> [    0.000000] PM: Registered nosave memory: 00000000effc0000 - 00000000effcf000
>>>> [    0.000000] PM: Registered nosave memory: 00000000effcf000 - 00000000effd0000
>>>> [    0.000000] PM: Registered nosave memory: 00000000effd0000 - 00000000effff000
>>>> [    0.000000] PM: Registered nosave memory: 00000000effff000 - 00000000f0000000
>>>> [    0.000000] PM: Registered nosave memory: 00000000f0000000 - 00000000f4000000
>>>> [    0.000000] PM: Registered nosave memory: 00000000f4000000 - 00000000fec00000
>>>> [    0.000000] PM: Registered nosave memory: 00000000fec00000 - 00000000fed00000
>>>> [    0.000000] PM: Registered nosave memory: 00000000fed00000 - 00000000fed13000
>>>> [    0.000000] PM: Registered nosave memory: 00000000fed13000 - 00000000feda0000
>>>> [    0.000000] PM: Registered nosave memory: 00000000feda0000 - 00000000fee00000
>>>> [    0.000000] PM: Registered nosave memory: 00000000fee00000 - 00000000fee10000
>>>> [    0.000000] PM: Registered nosave memory: 00000000fee10000 - 00000000ffb00000
>>>> [    0.000000] PM: Registered nosave memory: 00000000ffb00000 - 0000000100000000
>>>> [    0.000000] PM: Registered nosave memory: 00000001ffffe000 - 0000000200000000
>>>> [    0.000000] Allocating PCI resources starting at f4000000 (gap: f4000000:ac00000)
>>>> [    0.000000] Booting paravirtualized kernel on bare hardware
>>>> [    0.000000] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:2 nr_node_ids:1
>>>> [    0.000000] PERCPU: Embedded 28 pages/cpu @ffff8800efc00000 s84416 r8192 d22080 u1048576
>>>> [    0.000000] pcpu-alloc: s84416 r8192 d22080 u1048576 alloc=1*2097152
>>>> [    0.000000] pcpu-alloc: [0] 0 1
>>>> [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2067399
>>>> [    0.000000] Policy zone: Normal
>>>> [    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.38-10-generic root=UUID=3cd859b8-2605-4a38-8767-a6d1f99d53bd ro debug ignore_loglevel
>>>> [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
>>>> [    0.000000] Checking aperture...
>>>> [    0.000000] No AGP bridge found
>>>> [    0.000000] Calgary: detecting Calgary via BIOS EBDA area
>>>> [    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
>>>> [    0.000000] Memory: 8178472k/8650752k available (5941k kernel code, 262856k absent, 209424k reserved, 5016k data, 956k init)
>>>> [    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
>>>> [    0.000000] Hierarchical RCU implementation.
>>>> [    0.000000]    RCU dyntick-idle grace-period acceleration is enabled.
>>>> [    0.000000]    RCU-based detection of stalled CPUs is disabled.
>>>> [    0.000000] NR_IRQS:16640 nr_irqs:512 16
>>>> [    0.000000] Console: colour dummy device 80x25
>>>> [    0.000000] console [tty0] enabled
>>>> [    0.000000] allocated 83886080 bytes of page_cgroup
>>>> [    0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
>>>> [    0.000000] hpet clockevent registered
>>>> [    0.000000] Fast TSC calibration using PIT
>>>> [    0.000000] Detected 3000.094 MHz processor.
>>>> [    0.010004] Calibrating delay loop (skipped), value calculated using timer frequency.. 6000.18 BogoMIPS (lpj=30000940)
>>>> [    0.010017] pid_max: default: 32768 minimum: 301
>>>> [    0.010056] Security Framework initialized
>>>> [    0.010082] AppArmor: AppArmor initialized
>>>> [    0.010088] Yama: becoming mindful.
>>>> [    0.012092] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
>>>> [    0.022482] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
>>>> [    0.024244] Mount-cache hash table entries: 256
>>>> [    0.024453] Initializing cgroup subsys ns
>>>> [    0.024463] ns_cgroup deprecated: consider using the 'clone_children' flag without the ns_cgroup.
>>>> [    0.024472] Initializing cgroup subsys cpuacct
>>>> [    0.024481] Initializing cgroup subsys memory
>>>> [    0.024495] Initializing cgroup subsys devices
>>>> [    0.024501] Initializing cgroup subsys freezer
>>>> [    0.024507] Initializing cgroup subsys net_cls
>>>> [    0.024512] Initializing cgroup subsys blkio
>>>> [    0.024574] CPU: Physical Processor ID: 0
>>>> [    0.024580] CPU: Processor Core ID: 0
>>>> [    0.024586] mce: CPU supports 4 MCE banks
>>>> [    0.024603] CPU0: Thermal monitoring enabled (TM1)
>>>> [    0.024612] using mwait in idle threads.
>>>> [    0.027748] ACPI: Core revision 20110112
>>>> [    0.029308] ftrace: allocating 24323 entries in 96 pages
>>>> [    0.030085] Setting APIC routing to flat
>>>> [    0.030516] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>>>> [    0.136419] CPU0: Intel(R) Pentium(R) D CPU 3.00GHz stepping 04
>>>> [    0.140000] Performance Events: Netburst events, Netburst P4/Xeon PMU driver.
>>>> [    0.140000] ... version:                0
>>>> [    0.140000] ... bit width:              40
>>>> [    0.140000] ... generic registers:      18
>>>> [    0.140000] ... value mask:             000000ffffffffff
>>>> [    0.140000] ... max period:             0000007fffffffff
>>>> [    0.140000] ... fixed-purpose events:   0
>>>> [    0.140000] ... event mask:             000000000003ffff
>>>> [    0.140000] Booting Node   0, Processors  #1 Ok.
>>>> [    0.300021] Brought up 2 CPUs
>>>> [    0.300030] Total of 2 processors activated (12000.49 BogoMIPS).
>>>> [    0.300847] devtmpfs: initialized
>>>> [    0.302451] print_constraints: dummy:
>>>> [    0.302485] Time:  0:41:31  Date: 12/02/11
>>>> [    0.302546] NET: Registered protocol family 16
>>>> [    0.302672] Trying to unpack rootfs image as initramfs...
>>>> [    0.310474] ACPI: bus type pci registered
>>>> [    0.310570] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf0000000-0xf3ffffff] (base 0xf0000000)
>>>> [    0.310580] PCI: MMCONFIG at [mem 0xf0000000-0xf3ffffff] reserved in E820
>>>> [    0.340577] PCI: Using configuration type 1 for base access
>>>> [    0.342112] bio: create slab<bio-0>  at 0
>>>> [    0.342934] ACPI: EC: Look up EC in DSDT
>>>> [    0.345243] ACPI: Interpreter enabled
>>>> [    0.345252] ACPI: (supports S0 S4 S5)
>>>> [    0.345278] ACPI: Using IOAPIC for interrupt routing
>>>> [    0.349231] ACPI: No dock devices found.
>>>> [    0.349239] HEST: Table not found.
>>>> [    0.349246] PCI: Ignoring host bridge windows from ACPI; if necessary, use "pci=use_crs" and report a bug
>>>> [    0.349794] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
>>>> [    0.350838] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7] (ignored)
>>>> [    0.350848] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff] (ignored)
>>>> [    0.350856] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
>>>> [    0.350864] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xfebfffff] (ignored)
>>>> [    0.350884] pci 0000:00:00.0: [8086:2778] type 0 class 0x000600
>>>> [    0.350946] pci 0000:00:01.0: [8086:2779] type 1 class 0x000604
>>>> [    0.350996] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
>>>> [    0.351005] pci 0000:00:01.0: PME# disabled
>>>> [    0.351066] pci 0000:00:1c.0: [8086:27d0] type 1 class 0x000604
>>>> [    0.351137] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
>>>> [    0.351145] pci 0000:00:1c.0: PME# disabled
>>>> [    0.351178] pci 0000:00:1c.4: [8086:27e0] type 1 class 0x000604
>>>> [    0.351248] pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
>>>> [    0.351256] pci 0000:00:1c.4: PME# disabled
>>>> [    0.351285] pci 0000:00:1c.5: [8086:27e2] type 1 class 0x000604
>>>> [    0.351355] pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
>>>> [    0.351363] pci 0000:00:1c.5: PME# disabled
>>>> [    0.351391] pci 0000:00:1d.0: [8086:27c8] type 0 class 0x000c03
>>>> [    0.351443] pci 0000:00:1d.0: reg 20: [io  0xbce0-0xbcff]
>>>> [    0.351484] pci 0000:00:1d.1: [8086:27c9] type 0 class 0x000c03
>>>> [    0.351537] pci 0000:00:1d.1: reg 20: [io  0xbcc0-0xbcdf]
>>>> [    0.351577] pci 0000:00:1d.2: [8086:27ca] type 0 class 0x000c03
>>>> [    0.351629] pci 0000:00:1d.2: reg 20: [io  0xbca0-0xbcbf]
>>>> [    0.351680] pci 0000:00:1d.7: [