OCFS2/DesignDocs/ReadOnly

OCFS2 readonly support

Mark Fasheh <mark dot fasheh at oracle dot com>

August 31, 2005

Goals / Requirements

Support two levels of readonly-ness. To a user on the system, both will disallow modifications to files and their metadata. They differ only in their restrictions "under the hood":

Soft readonly: We still join the cluster and participate in locking, node recovery, heartbeat, etc. This is analogous to ext3's ro mount option, under which ext3 will still recover its journal, orphans, etc. Supporting this, including the ability to go soft readonly on error, will require no changes in the cluster stack (file system only).
Hard readonly: No cluster services will be active, including network, dlm, etc. The file system will consider the actual device to be readonly - absolutely no writes will be submitted. As a result, no metadata or data coherency guarantees are possible. If hard readonly is entered on file system error, it will cause an abort of the current journal. As it may also be a network (or cluster) error which forces hard readonly operation, the dlm would require the ability to take down its state without further network communication. Hence, entering hard readonly mode during an active mount will require dlm and possibly heartbeat/network changes.
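
The difference between the two levels can be summarized as three yes/no properties per mode. The following is a minimal sketch, not taken from the OCFS2 source: the enum and function names are hypothetical and exist only to restate the definitions above in code form.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the OCFS2 implementation. */
enum ro_mode {
    RO_NONE, /* normal read-write operation */
    RO_SOFT, /* readonly to users, still a full cluster member */
    RO_HARD, /* readonly to users, no cluster services, no device writes */
};

/* Both readonly levels look the same to a user: no modifications. */
bool user_writes_allowed(enum ro_mode m)
{
    return m == RO_NONE;
}

/* Soft readonly still joins the cluster (locking, recovery, heartbeat). */
bool cluster_services_active(enum ro_mode m)
{
    return m != RO_HARD;
}

/*
 * Soft readonly may still write internally (e.g. journal recovery, as
 * ext3's ro mount does); hard readonly submits absolutely no writes.
 */
bool device_writes_allowed(enum ro_mode m)
{
    return m != RO_HARD;
}
```

Read this way, soft and hard readonly agree on the first property and differ on the other two.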

Give OCFS2 the ability to go to / from a given readonly mode under a number of conditions:

As a result of a readonly mount request
As a result of a remount request (this can go both ways: ro->rw and rw->ro)
As a result of a system error. This can be almost anything and come from many contexts. Initially this can just mean "we got an I/O error" or "we hit an unrecoverable metadata inconsistency". We should investigate whether we can make the sort of timing guarantees which would allow us to use hard readonly operation as an effective self-fencing method.

Restrictions

We support only one mount option, 'ro'. The file system will do a soft readonly mount unless it detects a readonly device, in which case it prints a useful message (much like ext3) and goes hard readonly.
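
The mount-time decision for an 'ro' mount could be sketched as follows. This is an illustration under assumptions: the function name, the enum, and the message text are hypothetical, and how the file system actually detects a readonly device is not specified here, so it is passed in as a flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names for illustration; not the OCFS2 implementation. */
enum ro_mode { RO_SOFT, RO_HARD };

/*
 * Decide which readonly level an 'ro' mount ends up in.
 * device_is_readonly stands in for whatever check the file system
 * performs on the underlying block device (an assumption).
 */
enum ro_mode ro_mount_mode(bool device_is_readonly)
{
    if (device_is_readonly) {
        /* Placeholder wording for the "useful message". */
        fprintf(stderr,
                "sketch: device is readonly, mounting hard readonly "
                "(no cluster services)\n");
        return RO_HARD;
    }
    return RO_SOFT;
}
```

The user never chooses between soft and hard directly; the single 'ro' option maps to one or the other based on the device.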

Users will not be allowed to remount when in hard readonly mode - it is extremely difficult to resolve the state differences between the cluster and a node which has been reading old or invalid data.
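
A remount check consistent with this restriction might look like the sketch below. The names are hypothetical, and the choice of -EROFS as the refusal code is an assumption made for illustration; the design does not name an error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the OCFS2 implementation. */
enum ro_mode { RO_NONE, RO_SOFT, RO_HARD };

/*
 * Returns 0 and stores the resulting mode in *next, or a negative
 * errno value if the remount is refused. *next is untouched on error.
 */
int remount_sketch(enum ro_mode cur, bool want_ro, enum ro_mode *next)
{
    /* No remount of any kind once the node is hard readonly. */
    if (cur == RO_HARD)
        return -EROFS; /* assumed error code */

    /* rw->ro goes to soft readonly; ro->rw returns to normal operation. */
    *next = want_ro ? RO_SOFT : RO_NONE;
    return 0;
}
```

Under this sketch, the only way out of hard readonly is to unmount and mount again.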

Generally, system errors will send the file system into soft readonly. Hard readonly is reserved for bad disk problems (journal failures, etc) or cluster events such as loss of network, heartbeat self fencing, etc.
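
One way to express this policy is a mapping from error class to target mode. The sketch below is illustrative only: the error classes are hypothetical names for the examples given above, and the rule that a node never moves from hard back to soft on a later error is an assumption, chosen to match the restriction on remounting out of hard readonly.

```c
/* Hypothetical names for illustration; not the OCFS2 implementation. */
enum ro_mode { RO_NONE = 0, RO_SOFT = 1, RO_HARD = 2 };

enum fs_error {
    ERR_METADATA_INCONSISTENCY,
    ERR_JOURNAL_FAILURE,
    ERR_NETWORK_LOSS,
    ERR_HEARTBEAT_FENCE,
    ERR_OTHER,
};

/* Hard readonly for bad-disk and cluster events, soft for the rest. */
enum ro_mode mode_for_error(enum fs_error e)
{
    switch (e) {
    case ERR_JOURNAL_FAILURE:
    case ERR_NETWORK_LOSS:
    case ERR_HEARTBEAT_FENCE:
        return RO_HARD;
    default:
        return RO_SOFT;
    }
}

/* Assumption: errors only ever escalate the mode, never relax it. */
enum ro_mode apply_error(enum ro_mode cur, enum fs_error e)
{
    enum ro_mode target = mode_for_error(e);
    return target > cur ? target : cur;
}
```

For example, a metadata inconsistency on a read-write mount yields soft readonly, while the same error on a node that is already hard readonly leaves it hard readonly.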