[Ocfs2-devel] [RFC] ocfs2: Doubt about ocfs2_trylock_journal

jiangyiwen jiangyiwen at huawei.com
Mon Feb 29 19:19:14 PST 2016


ocfs2_mark_dead_nodes() uses ocfs2_trylock_journal() to test whether the
node that occupies a slot is still alive, but this check does not always
give the desired result. The problem can be described as follows:

N1              N2                        N3
crashes; previously
occupied slot 1
                begins mount; only N2 and
                N3 are in domain_map; finds
                slot 1 occupied, so calls
                ocfs2_trylock_journal()
                                          N3 is the lockres master of
                                          journal:0001 but has not yet
                                          detected that N1 is down, so
                                          it returns DLM_NOTQUEUED to N2
                ocfs2_trylock_journal()
                returns -EAGAIN, so N2
                does not recover N1
                                          N3 crashes at this moment
                N2 recovers only N3, then
                begins updating metadata
                that was also modified
                in journal:0001
N1 restarts, mounts
the volume, and
replays journal:0001;
this overwrites the
metadata that N2 has
modified and corrupts
the filesystem.
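
The decision N2 makes in the sequence above can be sketched as a toy
model in C. This is not the kernel code: struct master_view,
model_trylock_journal() and model_should_recover_slot() are hypothetical
names. The sketch assumes only that a slot is recovered when the trylock
succeeds, and that the lock master answers from its own, possibly stale,
view of which nodes are alive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical model: the lock master's private view of the cluster. */
struct master_view {
    bool believes_alive[MAX_NODES]; /* indexed by node number */
};

/*
 * Model of the trylock result seen by the mounting node. While the
 * master still believes the slot owner is alive, it treats the owner's
 * journal lock as valid and refuses the request, so the caller sees
 * -EAGAIN. Only once the master has noticed the death does it return 0.
 */
int model_trylock_journal(const struct master_view *master, int slot_owner)
{
    assert(slot_owner >= 0 && slot_owner < MAX_NODES);
    return master->believes_alive[slot_owner] ? -EAGAIN : 0;
}

/* Model of the mount-time decision: recover a slot only if trylock succeeds. */
bool model_should_recover_slot(const struct master_view *master, int slot_owner)
{
    return model_trylock_journal(master, slot_owner) == 0;
}
```

With N3 as master and N1 still marked alive in N3's view, the model
returns -EAGAIN for N1's slot and N2 skips recovery, even though N1 is
already dead. If N3 then crashes before it notices, nothing in this
decision path makes N2 look at slot 1 again.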

Does anyone have a good idea for solving this problem?

Thanks,
Yiwen Jiang.



