Extended Attributes Preliminary Design Document

Jeff Mahoney, SUSE Labs, Novell

Mark Fasheh, Oracle

Original Revision: July 26, 2006 (jeffm)

Many Updates: November/December 2007 (mfasheh)

Introduction

Extended attributes are used for storing POSIX ACLs, SELinux labels, and user-accessible metadata. They are essential for deploying file systems exported for workgroup use via Samba. The following document outlines a design for implementing extended attributes on the OCFS2 file system.

The design should be flexible enough to support many large extended attributes, but also quick enough to provide good performance when only a few small extended attributes are associated with the inode. It should also consider that extended attributes are generally accessed less frequently than the data they protect/describe, and in-inode data should take performance precedence over the extended attributes.

To meet these goals, the design uses layers of indirection to accommodate larger attributes while preserving the performance behavior of smaller ones. Large numbers of attributes should also be handled gracefully.

When space in the inode allows, the xattr header will be kept at the end of the inode block, with xattr entries preceding it on disk. When the inode is being used for in-inode data, or otherwise does not have enough space to contain the xattr header, the header is placed in its own block with as many entries as will fit before allocating additional blocks to store entries.
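The placement rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: the enum, the function name xattr_header_placement, and its parameters are hypothetical and do not come from the ocfs2 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: where the xattr header ends up for a given inode. */
enum xattr_placement {
        XATTR_IN_INODE,         /* header at the end of the inode block */
        XATTR_IN_OWN_BLOCK      /* header in a separate xattr block */
};

/*
 * Assumed inputs: whether the inode body already holds in-inode data, how
 * many bytes are free at the end of the inode block, and how many bytes the
 * header would need.
 */
enum xattr_placement xattr_header_placement(bool inode_has_inline_data,
                                            size_t free_bytes,
                                            size_t needed_bytes)
{
        assert(needed_bytes > 0);       /* a header always occupies space */

        if (inode_has_inline_data || free_bytes < needed_bytes)
                return XATTR_IN_OWN_BLOCK;
        return XATTR_IN_INODE;
}
```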

In order to maximize performance, xattr values will be kept with their description entries whenever possible. This applies to both the inode block and external xattr blocks.

Locking

The conventions on local file systems are such that write operations take the inode mutex as well as a per-inode xattr rwsem. Read operations only take the xattr rwsem.

For the initial implementation, the cluster inode meta lock will be used to protect the attribute space. Eventually, it may be desirable to implement a cluster xattr lock that can handle locking/caching/refreshing of high profile metadata like ACLs.

Data Structures

Existing Data Structures

New Flags: OCFS2_HAS_XATTR_FL = 0x0002, OCFS2_INLINE_XATTR_FL = 0x0004

OCFS2_HAS_XATTR_FL is set when the inode has extended attributes. The i_xattr_loc member contains the location of the extended attribute header. If i_dyn_features contains the OCFS2_INLINE_XATTR_FL flag, the i_xattr_loc member contains the offset from the beginning of the inode block where the ocfs2_xattr_header record is located. If the flag is unset, then the value contains a block number where the first ocfs2_xattr_block can be found.
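A minimal sketch of how a reader might interpret i_xattr_loc from the two flags. The flag values are the ones given above. The struct xattr_location type and the decode_xattr_loc function are hypothetical, and the inputs are assumed to be in CPU byte order already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OCFS2_HAS_XATTR_FL      0x0002
#define OCFS2_INLINE_XATTR_FL   0x0004

/* Hypothetical decoded form of i_xattr_loc. */
struct xattr_location {
        bool     in_inode;      /* header lives in the inode block */
        uint64_t value;         /* byte offset if in_inode, else block number */
};

/*
 * Returns false when the inode has no extended attributes. Otherwise fills
 * *out and returns true.
 */
bool decode_xattr_loc(uint32_t dyn_features, uint64_t i_xattr_loc,
                      struct xattr_location *out)
{
        assert(out != NULL);

        if (!(dyn_features & OCFS2_HAS_XATTR_FL))
                return false;

        out->in_inode = (dyn_features & OCFS2_INLINE_XATTR_FL) != 0;
        out->value = i_xattr_loc;
        return true;
}
```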

The ocfs2_extent_rec structure itself is unmodified, but we use e_cpos to store the hash of the name of the first attribute entry in the block. This modification is used for hashed directories as well and may be able to share the code used for manipulating them. This requires that Sparse Tree Updates be integrated before Extended Attributes can be used.

New Data Structures

Every extended attribute has an ocfs2_xattr_entry associated with it. It indicates what type of extended attribute it describes as well as where to find it, how large it is, etc.

The names and values will always be block local and will be placed in reverse order from the end of the block with the value immediately following the name, subject to alignment rules.

When the xe_local bit is set, the attribute is stored in the local block. When the xe_local bit is unset, the value stored in the local block will be an ocfs2_xattr_value_root record, rooting an extent tree where the attribute data is actually stored. xe_value_size contains the size of the attribute. The full 64 bits of size are unlikely to be needed any time soon, but they cost us little and provide future-proofing. Most sizes in ocfs2 are 64 bits anyway.

Most attributes will likely be placed in the block with the entry. When they grow too large, the value will be replaced with an ocfs2_xattr_value_root record and the xe_local bit will be cleared. In this case, the ocfs2_extent_list within the ocfs2_xattr_value_root will have a depth of 0, and all block pointers will be local (clusters will be allocated for extent data). A default number of extent list records has yet to be determined. In the event that the attribute is of such a size that it won't fit in the default number of extents, the ocfs2_extent_list will root a standard ocfs2 btree.
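The transition between a local value and a value rooted in an extent tree might look roughly like the following. This is a sketch under stated assumptions: struct xattr_entry_sketch is a cut-down stand-in that carries only fields named in the text, and inline_limit is a hypothetical threshold, since this document does not define when a value is "too large".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical view of an entry; only fields named in the text. */
struct xattr_entry_sketch {
        uint32_t xe_name_hash;
        bool     xe_local;       /* value bytes are stored in this block */
        uint64_t xe_value_size;  /* size of the attribute value */
};

/*
 * Record a new value size. inline_limit is an assumed threshold.
 * Returns true when the value lives outside the block, i.e. the local
 * storage holds an ocfs2_xattr_value_root instead of the value itself.
 */
bool xattr_entry_set_size(struct xattr_entry_sketch *xe, uint64_t new_size,
                          uint64_t inline_limit)
{
        assert(xe != NULL);

        xe->xe_value_size = new_size;
        xe->xe_local = new_size <= inline_limit;
        return !xe->xe_local;
}
```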

In order to avoid wasting storage on names, the name prefix will be mapped to a 7-bit value and removed from the name itself. The name's suffix will be stored in the block.

enum ocfs2_xattr_type 
{
        OCFS2_XATTR_INDEX_USER = 0,
        OCFS2_XATTR_INDEX_POSIX_ACL_ACCESS,
        OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT,
        OCFS2_XATTR_INDEX_TRUSTED,
        OCFS2_XATTR_INDEX_LUSTRE,
        OCFS2_XATTR_INDEX_SECURITY,
        OCFS2_XATTR_MAX
};
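As an illustration of the prefix mapping, the sketch below splits a full attribute name into an index and a suffix. The prefix table is an assumption: the user, trusted, security, and POSIX ACL strings follow the usual Linux namespace names, while the "lustre." string is a guess. The function name xattr_split_name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ocfs2_xattr_type {
        OCFS2_XATTR_INDEX_USER = 0,
        OCFS2_XATTR_INDEX_POSIX_ACL_ACCESS,
        OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT,
        OCFS2_XATTR_INDEX_TRUSTED,
        OCFS2_XATTR_INDEX_LUSTRE,
        OCFS2_XATTR_INDEX_SECURITY,
        OCFS2_XATTR_MAX
};

/* Assumed prefix table; the "lustre." string in particular is a guess. */
static const char *const xattr_prefixes[OCFS2_XATTR_MAX] = {
        [OCFS2_XATTR_INDEX_USER]              = "user.",
        [OCFS2_XATTR_INDEX_POSIX_ACL_ACCESS]  = "system.posix_acl_access",
        [OCFS2_XATTR_INDEX_POSIX_ACL_DEFAULT] = "system.posix_acl_default",
        [OCFS2_XATTR_INDEX_TRUSTED]           = "trusted.",
        [OCFS2_XATTR_INDEX_LUSTRE]            = "lustre.",
        [OCFS2_XATTR_INDEX_SECURITY]          = "security.",
};

/*
 * Split a full name into (index, suffix). Returns the index, or -1 when no
 * known prefix matches. On success *suffix points into full_name.
 */
int xattr_split_name(const char *full_name, const char **suffix)
{
        assert(full_name != NULL && suffix != NULL);

        for (int i = 0; i < OCFS2_XATTR_MAX; i++) {
                size_t len = strlen(xattr_prefixes[i]);

                if (strncmp(full_name, xattr_prefixes[i], len) != 0)
                        continue;
                /* Prefixes without a trailing dot must match the whole name. */
                if (xattr_prefixes[i][len - 1] != '.' && full_name[len] != '\0')
                        continue;
                *suffix = full_name + len;
                return i;
        }
        return -1;
}
```

With this table, "user.comment" would be stored as index OCFS2_XATTR_INDEX_USER plus the suffix "comment", and the ACL names would be stored with an empty suffix.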

Each entry has a 32-bit hash value associated with it. The hash value is calculated using the full (prefix.suffix) name of the xattr to avoid hash collisions when the same suffix is used in multiple attribute namespaces. It is used to rule out names that cannot match before a string comparison verifies that the name is a match. The entries themselves are stored on disk sorted by xe_name_hash.
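The hash-then-compare lookup can be sketched as follows. The hash function here is 32-bit FNV-1a, used purely as a stand-in because this document does not name the real hash. struct xattr_lookup_entry and xattr_find are hypothetical, and the entry keeps a pointer to the full name only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in hash (32-bit FNV-1a); the real hash is not specified here. */
uint32_t xattr_name_hash(const char *full_name)
{
        uint32_t h = 2166136261u;

        assert(full_name != NULL);
        for (; *full_name; full_name++) {
                h ^= (unsigned char)*full_name;
                h *= 16777619u;
        }
        return h;
}

/* Hypothetical in-memory entry used only for this illustration. */
struct xattr_lookup_entry {
        uint32_t    xe_name_hash;
        const char *full_name;
};

/*
 * Linear search over entries sorted by xe_name_hash. The string comparison
 * runs only when the hashes agree, and the scan stops once the stored hash
 * exceeds the target.
 */
const struct xattr_lookup_entry *
xattr_find(const struct xattr_lookup_entry *entries, size_t count,
           const char *full_name)
{
        uint32_t want = xattr_name_hash(full_name);

        for (size_t i = 0; i < count; i++) {
                if (entries[i].xe_name_hash > want)
                        break;
                if (entries[i].xe_name_hash == want &&
                    strcmp(entries[i].full_name, full_name) == 0)
                        return &entries[i];
        }
        return NULL;
}
```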

Although lookups within a block are a linear operation, the xattr blocks are stored in a b-tree of depth 1. The search space is automatically limited to blocks where there is a likely match by using the e_cpos value in struct ocfs2_extent_rec.
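Narrowing the search by e_cpos amounts to finding the last extent record whose e_cpos is less than or equal to the name hash. The sketch below uses a binary search over a cut-down, hypothetical record type. The names extent_rec_sketch and xattr_pick_block are illustrative, and the sketch ignores the case of equal hashes spanning two blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down, hypothetical extent record: only the fields used here. */
struct extent_rec_sketch {
        uint32_t e_cpos;        /* hash of the first entry in the block */
        uint64_t e_blkno;       /* block holding the entries */
};

/*
 * Return the index of the last record whose e_cpos is <= hash, or -1 when
 * the hash sorts before every block. Records must be sorted by e_cpos.
 */
long xattr_pick_block(const struct extent_rec_sketch *recs, size_t count,
                      uint32_t hash)
{
        size_t lo = 0, hi = count;      /* search window is [lo, hi) */

        assert(recs != NULL || count == 0);
        while (lo < hi) {
                size_t mid = lo + (hi - lo) / 2;

                if (recs[mid].e_cpos <= hash)
                        lo = mid + 1;
                else
                        hi = mid;
        }
        /* lo now counts the records with e_cpos <= hash. */
        return (long)lo - 1;
}
```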

Entries may optionally contain a 32-bit hash value to perform data integrity checks against. When the hash value is 0, it is considered unused.

Names and values will be padded to align on 64-bit boundaries.
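The padding rule, combined with the placement of names and values in reverse order from the end of the block, can be sketched as below. The helper names xattr_pad and xattr_place_pair and the allocator state are hypothetical. Only the 64-bit alignment and the name-then-value ordering come from this document.

```c
#include <assert.h>
#include <stddef.h>

#define XATTR_ALIGN 8   /* 64-bit alignment, per this document */

/* Round len up to the next multiple of XATTR_ALIGN. */
size_t xattr_pad(size_t len)
{
        return (len + XATTR_ALIGN - 1) & ~(size_t)(XATTR_ALIGN - 1);
}

/*
 * Hypothetical allocator: *free_end is the offset just past the free area
 * (initially the block size) and entries_end is where the entry array stops.
 * Returns the offset of the name; the value starts at that offset plus
 * xattr_pad(name_len). Returns (size_t)-1 when the pair does not fit.
 */
size_t xattr_place_pair(size_t *free_end, size_t entries_end,
                        size_t name_len, size_t value_len)
{
        size_t need = xattr_pad(name_len) + xattr_pad(value_len);

        assert(free_end != NULL && *free_end >= entries_end);
        if (need > *free_end - entries_end)
                return (size_t)-1;

        *free_end -= need;      /* grow downward from the end of the block */
        return *free_end;
}
```

For a 4096-byte block, a 5-byte name with a 10-byte value would occupy 8 + 16 = 24 bytes and land at offset 4072 under these assumptions.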

The ocfs2_xattr_header describes how many ocfs2_xattr_entry records are in the block.

The xh_count member contains the count of how many records are in the local block. The entries themselves start immediately after the ocfs2_extent_list, which is variable size.

#define OCFS2_XATTR_INDEXED 0x1

The ocfs2_xattr_block is where extended attribute entries are located when they are outside of the local inode block. It has the signature "XATTR01".

xb_flags determines how attributes are to be found. By default, the xb_header field in the xb_attrs union is used to find in-block extended attributes. Once the number of extended attributes grows larger than will fit in the block, we set OCFS2_XATTR_INDEXED and move them into a btree.

If OCFS2_XATTR_INDEXED is set then the xb_root field in the xb_attrs union roots a name-indexed btree. The extent records will be sorted as usual by e_cpos, but will contain the hash value of the first entry in the block.
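The flag acts as a discriminator for the xb_attrs union. The sketch below shows that dispatch only. The member names xb_flags, xb_attrs, xb_header, and xb_root come from the text, but the struct bodies are placeholders and the function name is hypothetical. The real on-disk layouts are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OCFS2_XATTR_INDEXED 0x1

/* Placeholder bodies; the real layouts are not reproduced here. */
struct xattr_header_sketch    { uint16_t xh_count; };
struct xattr_tree_root_sketch { uint64_t first_blkno; };

struct xattr_block_sketch {
        uint16_t xb_flags;
        union {
                struct xattr_header_sketch    xb_header; /* in-block entries */
                struct xattr_tree_root_sketch xb_root;   /* name-indexed btree */
        } xb_attrs;
};

enum xattr_lookup_path {
        XATTR_SCAN_BLOCK,       /* use xb_attrs.xb_header */
        XATTR_WALK_BTREE        /* use xb_attrs.xb_root */
};

/* Decide which union member is valid for this block. */
enum xattr_lookup_path
xattr_block_lookup_path(const struct xattr_block_sketch *xb)
{
        assert(xb != NULL);
        return (xb->xb_flags & OCFS2_XATTR_INDEXED) ? XATTR_WALK_BTREE
                                                    : XATTR_SCAN_BLOCK;
}
```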

TODO: this is identical in size and layout to ocfs2_xattr_value_root. We should probably combine them somehow.

The Indexed BTrees for extended attributes design doc discusses changes related to EA name indexing extensively.

VISUAL LAYOUT (WARNING: THIS IS OUT OF DATE)

A crude visual overview of how the blocks are laid out:

When the xattr header is local:

+-------------------------------------------+
|          OCFS2 INODE BLOCK                |
+-------------------------------------------+
| . . .                                     |
| OCFS2 CORE INODE                          |
| . . .                                     |------+
| __le32 i_dyn_features OCFS2_INLINE_XATTR_FL --+  |
| . . .                                         |  |
| struct ocfs2_xattr_header  <------------------+  |
|  struct ocfs2_extent_list xh_extents      | --------------+
|  struct ocfs2_xattr_entry entry0  -----+  |               |
|  struct ocfs2_xattr_entry entry1  ---+ |  |               |
|  struct ocfs2_xattr_entry entry2     | |  |               |
|  struct ocfs2_xattr_entry entry3     | |  |               |
|  struct ocfs2_xattr_entry entry4     | |  |               |
| entry4 name                          | |  |               |
| entry4 value/extent list             | |  |               |
| entry3 name                          | |  |               |
| entry3 value/extent list             | |  |               |
| entry2 name                          | |  |               |
| entry2 value/extent list             | |  |               |
| entry1 name      <-------------------+ |  |               |
| entry1 value/extent list               |  |               |
| entry0 name      <---------------------+  |               |
| entry0 value/extent list                  | --------------+-----------+
+-------------------------------------------+               |           |
                                                            |           |

When the xattr header is in its own block:

                                                            |           | 
+-------------------------------------------+               |           |
|      OCFS2 EXTENDED ATTRIBUTE BLOCK       |               |           |
+-------------------------------------------+               |           |
| struct ocfs2_xattr_block header "XATTR01" |               |           |
|  struct ocfs2_extent_list xh_extents      | -----+        |           |
|  struct ocfs2_xattr_entry entry0  -----+  |      |        |           |
|  struct ocfs2_xattr_entry entry1  ---+ |  |      |        |           |
|  struct ocfs2_xattr_entry entry2     | |  |      |        |           |
|  struct ocfs2_xattr_entry entry3     | |  |      |        |           |
|  struct ocfs2_xattr_entry entry4     | |  |      |        |           |
| . . .                                | |  |      |        |           |
| entry4 name                          | |  |      |        |           |
| entry4 value/ocfs2_extent_list       | |  | -----+--------+---------+ |
| entry3 name                          | |  |      |        |         | |
| entry3 value/ocfs2_extent_list       | |  |      |        |         | |
| entry2 name                          | |  |      |        |         | |
| entry2 value/ocfs2_extent_list       | |  |      |        |         | |
| entry1 name      <-------------------+ |  |      |        |         | |
| entry1 value/ocfs2_extent_list         |  |      |        |         | |
| entry0 name      <---------------------+  |      |        |         | |
| entry0 value/ocfs2_extent_list            |      |        |         | |
+-------------------------------------------+      |        |         | |
                                                   |        |         | |
+-------------------------------------------+      |        |         | |
|      OCFS2 EXTENDED ATTRIBUTE BLOCK       | <----+--(or)--+         | |
+-------------------------------------------+                         | |
| struct ocfs2_xattr_block header "XATTR02" |                         | |
|  struct ocfs2_extent_list xh_extents      |                         | |
|  struct ocfs2_xattr_entry entry0  -----+  |                         | |
|  struct ocfs2_xattr_entry entry1  ---+ |  |                         | |
|  struct ocfs2_xattr_entry entry2     | |  |                         | |
|  struct ocfs2_xattr_entry entry3     | |  |                         | |
|  struct ocfs2_xattr_entry entry4     | |  |                         | |
| . . .                                | |  |                         | |
| entry4 name                          | |  |                         | |
| entry4 value/ocfs2_extent_list       | |  |                         | |
| entry3 name                          | |  |                         | |
| entry3 value/ocfs2_extent_list       | |  |                         | |
| entry2 name                          | |  |                         | |
| entry2 value/ocfs2_extent_list       | |  | --------+               | |
| entry1 name      <-------------------+ |  |         |               | |
| entry1 value/ocfs2_extent_list         |  | --+-+   |               | |
| entry0 name      <---------------------+  |   | |   |               | |
| entry0 value/ocfs2_extent_list            |   | |   |               | |
+-------------------------------------------+   | |   |               | |
                                                | |   |               | |
+-------------------------------------------+   | |   |               | |
| OCFS2 EXTENDED ATTRIBUTE VALUE BLOCK      | <-+ +   |               | |
+-------------------------------------------+     |   |               | |
|                                           |     |   |               | |
|         (contents)                        |     |   |               | |
|                                           |     |   |               | |
+-------------------------------------------+     |   |               | |
                                                  |   |               | |
+-------------------------------------------+     |   |               | |
| OCFS2 EXTENDED ATTRIBUTE VALUE BLOCK      | <---+   |               | |
+-------------------------------------------+         |               | |
|                                           |         |               | |
|         (contents)                        |         |               | |
|                                           |         |               | |
+-------------------------------------------+         |               | |
                                                      |               | |
+-------------------------------------------+         |               | |
| OCFS2 EXTENT BLOCK                        | <-------+               | |
+-------------------------------------------+                         | |
| (points to attribute blocks or another    |                         | |
|  extent block)                            |                         | |
+-------------------------------------------+                         | |
                                                                      | |
+-------------------------------------------+                         | |
| OCFS2 EXTENDED ATTRIBUTE VALUE BLOCK      | <-----------------------+ |
+-------------------------------------------+                           |
|                                           |                           |
|         (contents)                        |                           |
|                                           |                           |
+-------------------------------------------+                           |
                                                                        |
+-------------------------------------------+                           |
| OCFS2 EXTENDED ATTRIBUTE VALUE BLOCK      | <-------------------------+
+-------------------------------------------+
|                                           |
|         (contents)                        |
|                                           |
+-------------------------------------------+

CHANGES

Mon Jul 24 19:20:05 EDT 2006 jeffm

Tue Jul 25 17:37:43 EDT 2006 jeffm

Tue Jul 25 18:22:46 EDT 2006 jeffm

Wed Jul 26 12:02:25 EDT 2006 jeffm

Wed Jul 26 23:30:27 EDT 2006 jeffm


2011-12-23 01:01