[Ocfs-users] OCFS adding/removing nodes
David McWhinnie
davidmcwhinnie at yahoo.com
Thu Jan 6 13:24:08 CST 2005
We haven't touched anything software-wise and don't
see any hardware errors. The messages only occur
while the dd command is running.
They only appear for the partition I am running
dd on, and only on the node (2-node system) that is
NOT running dd.
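
To confirm that the messages really are limited to one device, a rough
sketch like the following could tally them from the syslog. The regex is
modeled only on the two kernel lines quoted further down in this thread,
and the function name count_membership_events is made up for
illustration; other OCFS message formats are not covered.

```python
import re
from collections import Counter

# Pattern based solely on the two sample lines in this thread (an assumption;
# other OCFS versions may log a different format).
EVENT_RE = re.compile(
    r"kernel: ocfs: (Adding|Removing) (\S+) \(node (\d+)\) "
    r"(?:to|from) clustered device \((\d+),(\d+)\)"
)

def count_membership_events(lines):
    """Count Adding/Removing messages per (host, device, action)."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            action, host, _node, major, minor = m.groups()
            counts[(host, f"{major},{minor}", action)] += 1
    return counts

# The two messages from the original report, each joined onto one line.
sample = [
    "Jan 6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) "
    "to clustered device (8,208)",
    "Jan 6 10:19:52 houided006 kernel: ocfs: Removing houided005 (node 0) "
    "from clustered device (8,208)",
]

for key, n in sorted(count_membership_events(sample).items()):
    print(key, n)
```

Feeding it the lines of the real log file instead of the sample list
would show whether any device other than (8,208) is affected.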
--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> err.... any gremlins in your machine. :-)
>
> Check the hardware. What I mean is that if you have
> not touched your software (as in kernel/modules) and
> you are getting the msgs even with no load on the
> system, I would concentrate on the disks/controller.
>
> Are the messages highlighting one volume only or all?
> As in, is only one volume getting in and out of the
> cluster?
> Also, are the msgs only on one box or on all?
>
> David McWhinnie wrote:
>
> >Just started happening, there is no other activity on
> >the system. Everything is shut down.
> >
> >I was originally looking into a "no space left on
> >device" issue (lots of space left though) while running
> >an RMAN restore. I thought that was a fragmentation
> >issue, but I reformatted the partitions and the no-space
> >error came back. So at that point I was trying a dd to
> >create a file and noticed the dd hanging, along with
> >the messages about nodes being removed.
> >
> >But when the no-space-left error occurs, the node
> >removal messages don't appear.
> >
> >David
> >
> >--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >
> >
> >
> >>No, dd is mostly unrelated.
> >>What those messages indicate is that the heartbeat
> >>thread is not keeping up, which is not good. Yes, all
> >>the processes requiring a cluster lock will hang
> >>as a node is being evicted from the cluster.
> >>
> >>That you are getting these errors with as few as 16
> >>mounts is puzzling.
> >>
> >>When do these error messages pop up? Any
> >>relationship with the load on the system?
> >>
> >>David McWhinnie wrote:
> >>
> >>
> >>
> >>>We have 16 OCFS mounts.
> >>>We are running RedHat 2.1
> >>>OCFS version 1. Latest patch.
> >>>
> >>>Turns out a dd was being done without the o_direct
> >>>option, so that could be the cause. Interestingly,
> >>>ls, df etc. would all hang while the node was
> >>>removed from the cluster.
> >>>
> >>>David
> >>>--- Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
> >>>
> >>>>how many ocfs mounts do you have on that box?
> >>>>which kernel?
> >>>>ocfs version?
> >>>>
> >>>>David McWhinnie wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>We are getting the following messages.
> >>>>>Jan 6 10:19:25 houided006 kernel: ocfs: Adding houided005 (node 0) to clustered device (8,208)
> >>>>>Jan 6 10:19:52 houided006 kernel: ocfs: Removing houided005 (node 0) from clustered device (8,208)
> >>>>>
> >>>>>every few minutes.
> >>>>>
> >>>>>Any advice on how to troubleshoot this?
> >>>>>
> >>>>>David.
> >>>>>
> >>>>>