[Ocfs2-users] Journal replay after crash, kernel BUG at fs/ocfs2/journal.c:1700!, 2.6.36

Fri Oct 29 04:51:33 PDT 2010

>>>> [157768.261818] (ocfs2rec,14060,0):ocfs2_replay_journal:1605
>>>> Recovering node 0 from slot 0 on device (8,32)
>>>> [157772.850182] ------------[ cut here ]------------
>>>> [157772.850211] kernel BUG at fs/ocfs2/journal.c:1700!
>>>
>>> Strange. The bug line is
>>> BUG_ON(osb->node_num == node_num);
>>> so it fires when the node being recovered has the same node number
>>> as the local node.
>
> I just tried to reproduce it and succeeded. Here's what I did:
> - Unmount the filesystem on node app02
> - Shut down the o2cb services on app02
> - Do a halt -f on app01, which still has the OCFS2 volume mounted
> - Start the o2cb services on app02
> - Mount the OCFS2 filesystem on app02 -> BUG
>
> Works every time. So one of the two variables checked in that BUG_ON
> statement must not be set correctly somewhere.
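
For reference, the condition in that BUG_ON can be modeled in userspace
roughly as below. This is only a sketch: struct fake_osb and
would_trip_bug_on() are made-up names for illustration and are not kernel
code. Only the comparison itself is taken from the quoted line.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-mount state; only node_num matters here. */
struct fake_osb {
    unsigned int node_num;  /* node number of the local mount */
};

/*
 * Returns true when the condition of the quoted BUG_ON would hold, i.e. the
 * node selected for recovery carries the same number as the local node.
 */
bool would_trip_bug_on(const struct fake_osb *osb, unsigned int node_num)
{
    return osb->node_num == node_num;
}
```

So the BUG fires if either the local node number or the node number picked
for recovery is wrong at that point.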

One final bit of information: I just retested on 2.6.35.7 and it works
fine there, so this looks like a regression.

[  819.719661] (ocfs2rec,4135,0):ocfs2_replay_journal:1605 Recovering node 0 from slot 1 on device (8,32)
[  823.013843] (ocfs2rec,4135,0):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 1
[  823.018420] (ocfs2_wq,4117,0):ocfs2_finish_quota_recovery:598 Finishing quota recovery in slot 1

Note the difference in the slot number in the recovery message, though:
slot 1 here on 2.6.35.7 versus slot 0 on 2.6.36 (recovery was running on
app02 in both cases).
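
To compare the two recovery messages mechanically, something like the
following could pull the node and slot numbers out of the log text. It is a
minimal sketch: parse_recovery_line() is a hypothetical helper name, and it
assumes the message keeps the "Recovering node N from slot M" wording seen
in the logs above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extracts the node and slot numbers from an
 * ocfs2_replay_journal log message. Returns 0 on success, -1 if the
 * message does not contain the expected text.
 */
int parse_recovery_line(const char *line, int *node, int *slot)
{
    const char *p = strstr(line, "Recovering");

    if (p == NULL)
        return -1;
    /* Whitespace in the format also matches a line break in a wrapped log. */
    if (sscanf(p, "Recovering node %d from slot %d", node, slot) != 2)
        return -1;
    return 0;
}
```

Fed the two messages from this thread, it gives node 0 and slot 0 for the
2.6.36 log, and node 0 and slot 1 for the 2.6.35.7 log.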

Regards,
Ronald.