Fwd: RE: [Ocfs-users] OCFS Hang
Sunil Mushran
Sunil.Mushran at oracle.com
Wed Apr 21 11:01:29 CDT 2004
Your patch has been checked in and will be included in
the next rev, 1.0.12.
Jeremy Schneider wrote:
>Oh yeah - easy way to check, Randy:
>
>Next time your node hangs, get on the OTHER NODE and go into each
>directory where files are being opened (datafiles, archivelogs,
>controlfiles, redo logs, etc.) and delete a file (you can create one
>first, then delete it). If this causes the hung node to recover, then
>you're having the same problem I was having.
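
As a rough sketch of the check described above (not a tested OCFS procedure), the shell function below creates and then removes a scratch file in each directory passed to it. The function name poke_dirs, the scratch file name, and the example paths are made up for illustration; run it from the node that is still healthy.

```shell
#!/bin/sh
# Create and then delete a scratch file in each directory given as an argument.
# Intended to be run on the node that is NOT hung.
# poke_dirs and the ".ocfs_poke" file name are hypothetical, chosen for this sketch.
poke_dirs() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skip: $dir is not a directory" >&2
            continue
        fi
        scratch="$dir/.ocfs_poke.$$"
        # Create the file first, then delete it, as described above.
        if : > "$scratch" && rm -f "$scratch"; then
            echo "poked: $dir"
        else
            echo "failed: $dir" >&2
        fi
    done
}

# Example with hypothetical paths:
#   poke_dirs /u06/oradata/database /u07/oradata/arch
```

If the hung node recovers right after one of the "poked" lines appears, that directory is the likely place to look.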
>
>Jeremy
>
>
>
>>>> "Jeremy Schneider" <jer1887 at asugroup.com> 04/21/2004 10:14:04 AM >>>
>Just a thought, but you might be having the same problem I was having.
>
>Symptoms sound *very* similar. The patch has supposedly been merged
>into the source tree, but I don't think they've released a new version
>of OCFS since the merge. (Sunil or Wim - do you know if this bugfix was
>included in 1.0.11-1?)
>
>Check
>http://oss.oracle.com/pipermail/ocfs-users/2004-March/000192.html
>
>For the geek [technical] description, check
>http://oss.oracle.com/pipermail/ocfs-users/2004-March/000185.html or
>http://www.asugroup.com/ocfsbugfix.txt
>
>Jeremy
>
>
>
>>>> "Doering, Randy" <Randy.Doering at ventersciencejtc.org> 04/19/2004 6:23:52 PM >>>
>Kurt, Thanks for the info. We ended up stopping/restarting the DB. That
>was successful, although trying to get to /u06/oradata/database was
>still hanging. We then rebooted the node, and after that everything is
>fine now. I'll look more into this using your suggestions and hopefully
>if/when it happens again, I'll have more information for you all.
>
>BTW, using ocfstool, I was able to "browse" over and see the contents
>of that directory fine.
>
>Thanks again,
>Randy
>
>PS: We had also logged a case with Oracle Support.
>
>
> -----Original Message-----
> From: Kurt Hackel [mailto:Kurt.Hackel at oracle.com]
> Sent: Mon 4/19/2004 3:54 PM
> To: Doering, Randy
> Cc: ocfs-users at oss.oracle.com
> Subject: Re: [Ocfs-users] OCFS Hang
>
>
>
> Hi Randy,
>
> It looks like you have some process stuck that had previously done a
> down() on a semaphore in the /u06/oradata/database directory. Pretty
> much every operation inside that directory from that node will hang
> once the first hang occurs.
>
> The best place to go is to Oracle Support at this point. But in any
> case, the information they will want is a
> "debugocfs -f /oradata/database/ /dev/raw/raw##" and a
> "debugocfs -d /oradata/database/ /dev/raw/raw##" and a
> "fsck.ocfs -v /dev/raw/raw##".
>
> My guess is either that the fsck.ocfs output will show an ERROR that
> says you have a system file locked by another node, or that you have
> some process actively spinning in the ocfs code. If it turns out to be
> the latter, you would also want to get the output of /var/log/messages
> after running this:
> "echo -1 > /proc/sys/kernel/ocfs/debug_level"
> "echo -1 > /proc/sys/kernel/ocfs/debug_context"
> making sure to set both of these values back to 0 after a couple
> minutes. Also, make sure to get a "ps -ef" or "ps awux" output too,
> in order to match up the process ids.
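
For what it's worth, the debug capture above could be scripted roughly like this. The two /proc paths and the "ps -ef" command come from the steps above; the OCFS_PROC_DIR and CAPTURE_SECS variables, the function names, and the output file argument are assumptions for illustration, and the script has not been run against a real OCFS node.

```shell
#!/bin/sh
# Sketch: raise OCFS debug output for a short window, then set it back to 0.
# OCFS_PROC_DIR and CAPTURE_SECS are hypothetical knobs added for this sketch.
OCFS_PROC_DIR="${OCFS_PROC_DIR:-/proc/sys/kernel/ocfs}"
CAPTURE_SECS="${CAPTURE_SECS:-120}"   # "a couple minutes"

# Put both debug values back to 0.
reset_debug() {
    echo 0 > "$OCFS_PROC_DIR/debug_level"
    echo 0 > "$OCFS_PROC_DIR/debug_context"
}

# Usage: capture_debug <file-for-ps-output>
capture_debug() {
    ps_out="$1"
    # If interrupted, still reset the debug values before exiting.
    trap 'reset_debug; exit 1' INT TERM
    echo -1 > "$OCFS_PROC_DIR/debug_level"
    echo -1 > "$OCFS_PROC_DIR/debug_context"
    # Save the process list so pids in the log can be matched up later.
    ps -ef > "$ps_out" || echo "ps failed" >&2
    sleep "$CAPTURE_SECS"
    reset_debug
    trap - INT TERM
}

# Example (as root, hypothetical output path):
#   capture_debug /tmp/ocfs_ps.txt
```

Once it finishes, /var/log/messages and the saved ps output are the two pieces to collect.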
>
> The solution to any of the bugs I have mentioned will likely involve
> taking down one node, depending upon which bug you have hit. Since in
> your case it unfortunately looks like the trouble partition contains
> your datafiles, I would prepare to shut down the database on this node
> in anticipation of a reboot. The other RAC node can likely remain up
> and running. (If this were a partition containing only archives, for
> instance, you could possibly keep the database up by just switching
> archive destination temporarily).
>
> Thanks!
> -kurt
>
>
>
> On Mon, Apr 19, 2004 at 03:02:23PM -0400, Doering, Randy wrote:
> >
> >
> > Greetings,
> >
> >
> >
> > Having read about the previous OCFS hangs, I think this one
> > that we are seeing is different, but I'm not sure if this is caused by
> > OCFS or the Linux OS.
> >
> >
> >
> > We are running OCFS Version 1.09 with Linux AS 3.0/9i RAC.
> >
> >
> >
> > We have a 2 node Intel Cluster (Node 1 and Node 2). This morning the DBA
> > tried to do an "ls" command on /u06/oradata/database and his process
> > hung. I tried to kill his "ls" process and it is unkillable. On Node 2,
> > the "ls" on /u06/oradata/database worked fine. All of the other file
> > systems (on both nodes) are fine.
> >
> >
> >
> > Also, what we can't get rid of is this process:
> >
> >
> >
> > oracle 23593 1 95 10:00 ? 04:45:11 oracleXYZ2
> > (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
> >
> >
> >
> > and it's been accumulating CPU time since the hang. I'm
> > unsure if this process is a victim or the cause of the hangs.
> >
> >
> >
> > I hope that I have provided enough information about the
> > situation. If not, let me know and I'll get more.
> >
> >
> >
> > Regards,
> >
> > Randy
> >
> >
> >
>
> > _______________________________________________
> > Ocfs-users mailing list
> > Ocfs-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs-users
>
>
>
>