[Ocfs2-devel] [PATCH] ocfs2: call ocfs2_abort when journal abort
Andrew Morton
akpm at linux-foundation.org
Fri Dec 18 15:50:35 PST 2015
On Fri, 18 Dec 2015 15:19:25 +0800 Ryan Ding <ryan.ding at oracle.com> wrote:
> orabug: 22293201
>
> The journal cannot recover from the abort state, so we should take the
> following actions to prevent file system corruption:
>
> 1. Change to a read-only filesystem on a local mount. We cannot afford further
> writes, so changing to the RO state is reasonable.
>
> 2. Panic on a cluster mount. Because we cannot release lock resources in this
> state, other nodes will hang when they require a lock owned by this node. So
> panic and remaster is a reasonable choice.
>
> ocfs2_abort() will do all the above work.
>
> ...
>
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -30,7 +30,6 @@
> #include <linux/kthread.h>
> #include <linux/time.h>
> #include <linux/random.h>
> -#include <linux/delay.h>
>
> #include <cluster/masklog.h>
>
> @@ -2265,7 +2264,7 @@ static int __ocfs2_wait_on_mount(struct ocfs2_super *osb, int quota)
>
> static int ocfs2_commit_thread(void *arg)
> {
> - int status;
> + int status = 0;
> struct ocfs2_super *osb = arg;
> struct ocfs2_journal *journal = osb->journal;
>
> @@ -2279,22 +2278,18 @@ static int ocfs2_commit_thread(void *arg)
> wait_event_interruptible(osb->checkpoint_event,
> atomic_read(&journal->j_num_trans)
> || kthread_should_stop());
> + if (status < 0)
> + /* As we can not terminate by myself, just enter an
> + * empty loop to wait for stop. */
> + continue;
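This is a busy-wait loop, isn't it? Once the commit has failed, j_num_trans
stays non-zero, so wait_event_interruptible() returns immediately and the
loop just spins. That's going to chew lots of CPU, and in some situations
(e.g., SMP=n, PREEMPT=n) it will lock up the kernel because kjournald will
never run.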
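
To make the concern concrete, here is a minimal userspace sketch of the patched loop. It is not kernel code: `struct commit_model`, `model_wait()` and `run_patched_loop()` are hypothetical names invented for illustration, and `-5` is just an arbitrary negative status. It only models the fact that a wait whose condition is already true returns without sleeping.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the commit thread; not kernel code. */
struct commit_model {
	int num_trans;	/* stands in for journal->j_num_trans */
	int status;	/* last (modelled) ocfs2_commit_cache() result */
	long sleeps;	/* iterations in which the thread would block */
	long spins;	/* iterations in which the wait returned at once */
};

/* Models wait_event_interruptible(): it blocks only if the condition is false. */
static void model_wait(struct commit_model *m, bool should_stop)
{
	if (m->num_trans || should_stop)
		m->spins++;	/* condition already true: no sleep */
	else
		m->sleeps++;	/* condition false: the thread would sleep */
}

/* Runs `iters` iterations of the patched loop body while no stop is requested. */
static void run_patched_loop(struct commit_model *m, long iters)
{
	for (long i = 0; i < iters; i++) {
		model_wait(m, false);
		if (m->status < 0)
			continue;	/* the patch's "empty loop" */
		/* Assumption: the commit fails and num_trans stays non-zero. */
		m->status = -5;
	}
}
```

Starting from `num_trans = 1` and a negative `status`, every iteration is counted as a spin and none as a sleep, which is the busy-wait described above. The removed msleep_interruptible(1000) is what used to break that cycle.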
> status = ocfs2_commit_cache(osb);
> - if (status < 0) {
> - static unsigned long abort_warn_time;
> -
> - /* Warn about this once per minute */
> - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> - mlog(ML_ERROR, "status = %d, journal is "
> - "already aborted.\n", status);
> - /*
> - * After ocfs2_commit_cache() fails, j_num_trans has a
> - * non-zero value. Sleep here to avoid a busy-wait
> - * loop.
> - */
> - msleep_interruptible(1000);
> - }
> + if (status < 0)
> + /* journal can not recover from abort state, there is
> + * no need to keep commit cache. So we should either
> + * change to readonly(local mount) or just panic
> + * (cluster mount). */
> + ocfs2_abort(osb->sb, "Detected aborted journal");
Coding-style issues:

It would be more conventional to add braces, since the branch now spans
several lines because of the comment:

	if (status < 0) {
		/* journal can not recover from abort state, there is
		 * no need to keep commit cache. So we should either
		 * change to readonly(local mount) or just panic
		 * (cluster mount). */
		ocfs2_abort(osb->sb, "Detected aborted journal");
	}

And to lay out the comment like this:

	/*
	 * journal can not recover from abort state, there is
	 * no need to keep commit cache. So we should either
	 * change to readonly(local mount) or just panic
	 * (cluster mount).
	 */