[Ocfs2-devel] dlm_pick_recovery_master algorithm?

Daniel Phillips phillips at google.com
Wed May 31 18:01:36 CDT 2006


Thanks Kurt, great answers!

 > You wrote:
 > One note on all of this: this is NOT how we would like to do recovery
> going forward, we just did not have a solid cluster membership service
 > in place that we could use when the mastery/recovery code was written.
 > Once we do have a stable mechanism and API (stop/start/finish) to depend
 > upon, I would like to rewrite the whole thing for lock-table-based mastery
 > and much more sensible recovery.

What is the pedigree of that stop/start/finish API?  Is it the only stable
mechanism you know of to build a more sensible recovery on?

 > As it stands, it's a brittle structure
 > that has to continually try to detect node failures inline and make
 > adjustments as recovery is ongoing, which is no fun.

Indeed. Not to mention slow, and not obviously guaranteed to terminate.

Regards,

Daniel


