

New Slot Map Format

JoelBecker, December 2007

Introduction

Work is underway to use ocfs2 with userspace cluster stacks. However, the userspace cluster stacks all support node numbers greater than 32767, the largest node number the current slot map can handle. This design document specifies a new slot map format that supports larger node numbers. At the same time, it removes other limitations of the current design.

Current Limitations

There are three main limitations to the current slot map design.

New Design Boundaries

The New Slot Map Format

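#include <linux/types.h>        /* __u8, __le32 */

#define OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP 0x100

/* One 8-byte entry in the extended slot map. */
struct ocfs2_extended_slot {
        __u8    es_valid;          /* nonzero: es_node_num holds valid data */
        __u8    es_reserved1[3];   /* reserved for future extension */
        __le32  es_node_num;       /* unsigned node number, little-endian */
};

/* The slot_map system file is an array of extended slots. */
struct ocfs2_slot_map_extended {
        struct ocfs2_extended_slot se_slots[0];
};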

The new slot map format is in use if super->s_feature_incompat has the OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP bit set. If the feature bit is not set, the original slot map format is in use. In both cases the slot map is stored in the "slot_map" system file.

In the old format, the slot_map file was always exactly one cluster in size. In the new format, the file's size is the total allocation, in whole clusters, required to hold super->s_max_slots * sizeof(struct ocfs2_extended_slot) bytes. The file is simply an array of extended slots: every array position in the allocation is formatted, even though only the first super->s_max_slots entries are used. Both schemes intentionally overcommit the slot map size to make adding slots easier. i_size is set to the full allocation.
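The sizing rule can be sketched as follows. This is a minimal illustration, assuming allocation happens in whole clusters and that sizeof(struct ocfs2_extended_slot) is 8 bytes (1 + 3 + 4, no padding); the function names are hypothetical and are not taken from the kernel or ocfs2-tools.

```c
#include <stdint.h>

/* Assumed on-disk size of one extended slot:
 * es_valid (1) + es_reserved1 (3) + es_node_num (4) = 8 bytes. */
#define EXTENDED_SLOT_SIZE 8u

/* Hypothetical helper: bytes needed for max_slots entries, rounded up to
 * whole clusters. This is the allocation, and i_size is set to it. */
static uint64_t slot_map_alloc_bytes(uint32_t max_slots, uint32_t cluster_size)
{
    uint64_t needed = (uint64_t)max_slots * EXTENDED_SLOT_SIZE;
    uint64_t clusters = (needed + cluster_size - 1) / cluster_size;

    return clusters * cluster_size;
}

/* Hypothetical helper: every position in the allocation is formatted, so
 * the file holds this many entries, even though only the first max_slots
 * are used. */
static uint64_t slot_map_formatted_entries(uint32_t max_slots,
                                           uint32_t cluster_size)
{
    return slot_map_alloc_bytes(max_slots, cluster_size) / EXTENDED_SLOT_SIZE;
}
```

For example, with 4096-byte clusters and 8 slots, the 64 bytes needed round up to one 4096-byte cluster, so 512 entries are formatted and 504 of them are spare capacity for adding slots later.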

The Extended Slot Entry

An extended slot entry contains a 32-bit node number field, es_node_num, which supports node numbers up to UINT32_MAX. This field is treated as unsigned. The entry also contains an 8-bit validity field, es_valid. If es_valid is nonzero, the entry is valid and es_node_num contains valid data. If es_valid is zero, the entire entry is considered empty.

The three bytes of reserved space, es_reserved1, allow the entry to be extended later without another wholesale rewrite of the slot map.
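The entry layout can be illustrated with a byte-level encoder and decoder. This is a sketch, not the filesystem's code: struct slot_entry and both function names are hypothetical, writing 1 for a valid entry and zeroing the node number of an empty entry are choices made here (the design only requires es_valid to be nonzero or zero), and es_node_num is stored little-endian as its __le32 type implies.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one extended slot entry. */
struct slot_entry {
    int      valid;    /* nonzero: node_num holds valid data */
    uint32_t node_num; /* unsigned, up to UINT32_MAX */
};

/* Serialize into the 8-byte on-disk layout:
 * byte 0 = es_valid, bytes 1-3 = es_reserved1 (zeroed),
 * bytes 4-7 = es_node_num, least significant byte first. */
static void encode_extended_slot(const struct slot_entry *e, uint8_t out[8])
{
    memset(out, 0, 8);
    if (!e->valid)
        return;                 /* es_valid == 0: the whole entry is empty */
    out[0] = 1;
    out[4] = (uint8_t)(e->node_num);
    out[5] = (uint8_t)(e->node_num >> 8);
    out[6] = (uint8_t)(e->node_num >> 16);
    out[7] = (uint8_t)(e->node_num >> 24);
}

/* Parse an 8-byte on-disk entry. The node number is only read when
 * es_valid is nonzero; the reserved bytes are ignored. */
static void decode_extended_slot(const uint8_t in[8], struct slot_entry *e)
{
    e->valid = in[0] != 0;
    e->node_num = e->valid
        ? (uint32_t)in[4] | (uint32_t)in[5] << 8 |
          (uint32_t)in[6] << 16 | (uint32_t)in[7] << 24
        : 0;
}
```

Because the decoder ignores es_reserved1, a future revision that assigns meaning to those bytes would not change how existing entries are read.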

Filesystem Changes

The filesystem should be able to read and write both the old and the new formats. This is straightforward once the code that reads and writes the map is isolated from the rest of the filesystem, which is done in a few steps.

With these changes, reading the slot map populates an in-memory map that isn't tied to the on-disk format. All access from the rest of the filesystem references this in-memory map. At write time, the in-memory map is converted to the appropriate format.
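A reader and writer built around a format-independent in-memory map might look like the following. This is a hedged sketch, not the code on the new-slot-map branch: struct mem_slot, slot_map_read(), and slot_map_write() are hypothetical, and the old-format layout (one little-endian 16-bit node number per slot, 0xFFFF for an empty slot, largest node number 32767) is an assumption, since this document does not describe the old format.

```c
#include <stddef.h>
#include <stdint.h>

#define OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP 0x100

/* ASSUMPTION: old-format empty marker and largest node number. */
#define OLD_SLOT_EMPTY    0xFFFFu
#define OLD_MAX_NODE_NUM  32767u

/* Hypothetical format-independent in-memory slot. */
struct mem_slot {
    int      valid;
    uint32_t node_num;
};

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Populate the in-memory map from whichever on-disk format is in use. */
static void slot_map_read(struct mem_slot *slots, size_t n,
                          const uint8_t *disk, uint32_t incompat)
{
    int extended = (incompat & OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP) != 0;

    for (size_t i = 0; i < n; i++) {
        if (extended) {
            const uint8_t *e = disk + i * 8;   /* 8-byte extended slot */
            slots[i].valid = e[0] != 0;
            slots[i].node_num = slots[i].valid ? get_le32(e + 4) : 0;
        } else {
            uint16_t v = get_le16(disk + i * 2);
            slots[i].valid = v != OLD_SLOT_EMPTY;
            slots[i].node_num = slots[i].valid ? v : 0;
        }
    }
}

/* Convert the in-memory map to the on-disk format in use. Returns -1,
 * leaving disk untouched, if a node number does not fit the old format. */
static int slot_map_write(const struct mem_slot *slots, size_t n,
                          uint8_t *disk, uint32_t incompat)
{
    int extended = (incompat & OCFS2_FEATURE_INCOMPAT_EXTENDED_SLOT_MAP) != 0;

    if (!extended)
        for (size_t i = 0; i < n; i++)
            if (slots[i].valid && slots[i].node_num > OLD_MAX_NODE_NUM)
                return -1;

    for (size_t i = 0; i < n; i++) {
        if (extended) {
            uint8_t *e = disk + i * 8;
            e[0] = slots[i].valid ? 1 : 0;
            e[1] = e[2] = e[3] = 0;            /* es_reserved1 */
            put_le32(e + 4, slots[i].valid ? slots[i].node_num : 0);
        } else {
            put_le16(disk + i * 2, slots[i].valid
                     ? (uint16_t)slots[i].node_num
                     : (uint16_t)OLD_SLOT_EMPTY);
        }
    }
    return 0;
}
```

The rest of the filesystem only ever touches the struct mem_slot array; the feature bit is consulted in exactly two places, so supporting both formats does not spread through the code.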

These changes are available on the new-slot-map branch of Joel's linux-2.6 Git tree.

Tools Changes

The ocfs2 toolset needs to be able to create and read the new format. mkfs.ocfs2(8) needs to create filesystems that use the new format, tunefs.ocfs2(8) should be able to switch an existing filesystem between the two formats, and any tool that examines the slot map needs to read the new format.
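Switching from the new format back to the old one can fail, while the reverse cannot: every old node number fits in the 32-bit es_node_num field, but not every 32-bit node number fits in an old entry. A tool could check this before converting, as in the sketch below. The function name is hypothetical, and the old-format details it relies on (2-byte entries, a largest node number of 32767) are assumptions; only the one-cluster file size comes from this document.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTIONS: an old-format slot is a 16-bit entry and the largest node
 * number it can hold is 32767. The old slot_map file is one cluster. */
#define OLD_SLOT_BYTES    2u
#define OLD_MAX_NODE_NUM  32767u

/* Hypothetical check: return 1 if a map with max_slots slots could be
 * rewritten in the old format, 0 otherwise. valid[i] and node_num[i]
 * describe slot i. */
static int can_convert_to_old_format(const int *valid, const uint32_t *node_num,
                                     uint32_t max_slots, uint32_t cluster_size)
{
    /* The old map must fit in a single cluster. */
    if ((uint64_t)max_slots * OLD_SLOT_BYTES > cluster_size)
        return 0;

    /* Every occupied slot's node number must fit in an old entry;
     * empty slots are ignored. */
    for (uint32_t i = 0; i < max_slots; i++)
        if (valid[i] && node_num[i] > OLD_MAX_NODE_NUM)
            return 0;

    return 1;
}
```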

These changes are available on the new-slot-map branch of the ocfs2-tools Git tree.


2011-12-23 01:01