[Ocfs2-devel] [PATCH] ocfs2: limit printk when journal is aborted

Joseph Qi joseph.qi at huawei.com
Fri Apr 18 02:18:27 PDT 2014


On 2014/4/18 10:45, Mark Fasheh wrote:
> On Fri, Apr 18, 2014 at 09:02:33AM +0800, Joseph Qi wrote:
>> On 2014/4/18 5:01, Mark Fasheh wrote:
>>> On Thu, Apr 17, 2014 at 07:08:42PM +0800, Joseph Qi wrote:
>>>>
>>>> Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
>>>> ocfs2_commit_thread. Then it will get into a loop with mass logs. This
>>>> will meaninglessly consume a larger number of resource and may lead to
>>>> system hung at last.
>>>> So limit printk in this case.
>>>>
>>>> Signed-off-by: Joseph Qi <joseph.qi at huawei.com>
>>>> ---
>>>>  fs/ocfs2/journal.c | 12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>>>> index 44fc3e5..cfefbd1 100644
>>>> --- a/fs/ocfs2/journal.c
>>>> +++ b/fs/ocfs2/journal.c
>>>> @@ -30,6 +30,7 @@
>>>>  #include <linux/kthread.h>
>>>>  #include <linux/time.h>
>>>>  #include <linux/random.h>
>>>> +#include <linux/delay.h>
>>>>  
>>>>  #include <cluster/masklog.h>
>>>>  
>>>> @@ -2191,8 +2192,15 @@ static int ocfs2_commit_thread(void *arg)
>>>>  					 || kthread_should_stop());
>>>>  
>>>>  		status = ocfs2_commit_cache(osb);
>>>> -		if (status < 0)
>>>> -			mlog_errno(status);
>>>> +		if (status < 0) {
>>>> +			static unsigned long abort_warn_time;
>>>> +
>>>> +			/* Warn about this once per minute */
>>>> +			if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
>>>> +				mlog(ML_ERROR, "status = %d, journal is "
>>>> +						"already aborted.\n", status);
>>>> +			msleep_interruptible(1000);
>>>> +		}
>>>
>>> Why the msleep? ocfs2_commit_thread will wait on the checkpoint_event queue
>>> right after this anyway - is there a problem with it waiting on that?
>>>
>> Since jbd2 is already aborted, commit cache is meaningless.
> 
> I understand that, but I'm asking why the msleep and whether we can avoid
> that. To go back to my question:
> 
> "ocfs2_commit_thread will wait on the checkpoint_event queue right after
> this anyway - is there a problem with it waiting on that?"
> 
> Thanks,
> 	--Mark
Sorry for my obscure description.
If ocfs2_commit_cache fails because of JBD2_ABORT, j_num_trans won't be cleared.
Then the condition of checkpoint event still evaluates true, so it won't wait.
> 
> --
> Mark Fasheh
> 
> .
> 





More information about the Ocfs2-devel mailing list