[Ocfs2-devel] About Mark's advice on bug 48
Mark Fasheh
mark.fasheh at oracle.com
Mon Mar 29 12:28:19 CST 2004
On Fri, Mar 26, 2004 at 03:27:07PM +0800, Sonic Zhang wrote:
> Hi Mark,
>
> Finally, I found the second halt is caused by starvation when routine
> ocfs_journal_set_unmounted() acquires the lock osb->publish_lock. In
> thread ocfs_volume_thread(), the delta jiffies to sleep between up() and
> down() in schedule_timeout() is too short. Routine
> ocfs_journal_set_unmounted() has no chance to see that lock
> osb->publish_lock is free between its release and reacquisition by
> thread ocfs_volume_thread(). So routine ocfs_journal_set_unmounted()
> always waits in a loop. After I changed the delta jiffies from 50 to 500,
> kernel 2.6 no longer halts when it reboots after an OCFS volume is mounted.
*ouch* it seems that jiffies changed between 2.4 and 2.6 -- the code as is
will be heartbeating *way* too often, in fact prolly even swamping your disk!
Ok, I need to take a closer look at this (I believe we use jiffies in other
places too!), but good catch!
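
A rough sketch of why a hard-coded jiffy count behaves differently on the
two kernels. It assumes the usual x86 defaults (HZ=100 on 2.4, HZ=1000 on
early 2.6); the reporter's actual config is not stated in this thread, and
the macro and function names below are made up for illustration, not taken
from the OCFS source.

```c
/* Assumed tick rates (typical x86 defaults), not read from any real config. */
#define DEMO_HZ_24 100L
#define DEMO_HZ_26 1000L

/* Wall-clock milliseconds that a raw jiffy count represents at a given HZ. */
static long demo_jiffies_to_ms(long jiffies, long hz)
{
    return jiffies * 1000L / hz;
}

/* Jiffy count needed for a delay given in milliseconds, rounded up. */
static long demo_ms_to_jiffies(long ms, long hz)
{
    return (ms * hz + 999L) / 1000L;
}
```

Under those assumptions, 50 jiffies is 500 ms at HZ=100 but only 50 ms at
HZ=1000, and 500 jiffies at HZ=1000 is 500 ms again. Writing the delay in
terms of HZ (for example HZ / 2) or converting from milliseconds keeps the
interval the same on both kernels.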
> I also added a line to release the lock in a branch to the label "finally".
> This may remove a latent deadlock. In addition, I clear the reference
> pointer OcfsIpcCtxt.task before thread ocfs_recv_thread() exits. This
> prevents invalid access to the task structure in routine
> ocfs_dismount_volume() when rebooting.
This is good, though setting OcfsIpcCtxt.task is prolly redundant as it's
set in dismount volume, but I've always wondered why we didn't just set it
there.
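
For reference, a minimal sketch of the pattern being discussed, where every
exit path passes through a single "finally" label that drops the lock. The
lock type and function names here are hypothetical stand-ins, not the actual
OCFS code or the patch itself.

```c
/* Hypothetical stand-in for a kernel lock; it only tracks whether it is held. */
struct demo_lock {
    int held;
};

static void demo_lock_acquire(struct demo_lock *l) { l->held = 1; }
static void demo_lock_release(struct demo_lock *l) { l->held = 0; }

/* Returns 0 on success, -1 on failure; the lock is released on both paths. */
static int demo_update(struct demo_lock *l, int fail)
{
    int status = 0;

    demo_lock_acquire(l);
    if (fail) {
        status = -1;
        goto finally;   /* early exit still passes through the release below */
    }
    /* ... work done under the lock would go here ... */
finally:
    demo_lock_release(l);
    return status;
}
```

If the error branch returned directly instead of jumping to the label, the
lock would stay held and the next caller would block forever.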
--Mark
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh at oracle.com