Project: Linux Data Integrity Project
Project Description: A framework for proactive data integrity protection in Linux.
License: GPL
Please note that the data integrity extensions were merged into the
Linux kernel in version 2.6.27. As a consequence, all further
development is done in the upstream kernel. This project site
currently functions as a repository for design documents and
presentations on DIF and DIX.
For inquiries about the Data Integrity Extensions or the Linux
implementation, please contact Martin K. Petersen
<martin.petersen@oracle.com>.
Many common cases of data corruption are caused not by bit rot on
the physical disk platter but by bugs in the I/O path between the
application and the drive.
Modern filesystems, including Oracle's own btrfs, implement
checksumming so that corrupted data can be detected. However, this
detection only occurs when the data is read back, which can be months
after the corrupted data was written, and by then chances are that
the good data has been lost forever. The Data Integrity Initiative
aims to prevent corrupted data buffers from being written to disk in
the first place.
Common corruption scenarios are:
- bad buffer writes - the write ends up in the right place on disk,
but the data written is not what the application sent
- misdirected writes - the write buffer contains good data, but it
ends up being written to the wrong location on disk
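To make the two failure modes concrete, here is a minimal sketch, assuming a hypothetical sector model in which the writer stores a checksum of the payload plus the LBA it intended to write. `zlib.crc32` is only a stand-in checksum, and the `Sector`, `make_sector` and `check` names are illustrative, not part of any real API.

```python
import zlib
from dataclasses import dataclass

SECTOR_SIZE = 512


@dataclass
class Sector:
    """Hypothetical on-disk sector: payload plus writer-supplied metadata."""
    data: bytes
    csum: int     # checksum of data, computed when the write was issued
    lba_tag: int  # LBA the writer intended to write to


def make_sector(data: bytes, intended_lba: int) -> Sector:
    # zlib.crc32 is a stand-in checksum here, not the DIF guard tag.
    return Sector(data, zlib.crc32(data), intended_lba)


def check(disk: dict[int, Sector], lba: int) -> list[str]:
    """Return the problems found when reading back sector `lba`."""
    s = disk[lba]
    problems = []
    if zlib.crc32(s.data) != s.csum:
        problems.append("bad buffer")   # payload differs from what was sent
    if s.lba_tag != lba:
        problems.append("misdirected")  # good payload, wrong location
    return problems


disk: dict[int, Sector] = {}

# Bad buffer write: right location, but the payload was damaged after
# the checksum was computed.
w = make_sector(b"\xaa" * SECTOR_SIZE, intended_lba=1)
disk[1] = Sector(b"\xab" + w.data[1:], w.csum, w.lba_tag)

# Misdirected write: an intact payload meant for LBA 2 lands on LBA 3.
disk[3] = make_sector(b"\xbb" * SECTOR_SIZE, intended_lba=2)

print(check(disk, 1))  # ['bad buffer']
print(check(disk, 3))  # ['misdirected']
```

Note that a payload checksum on its own would accept the sector at LBA 3, because its data is intact; only the recorded target location exposes the misdirected write.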
The storage industry has been aware of these problems for many years,
and many array vendors have leveraged the support for 520- and
528-byte sectors on SCSI-family (SPI/FC/SAS) drives, which allows 8 or
16 bytes of extra protection information to be stored alongside each
512 bytes of user data. However, this extra information is
proprietary and only available inside the storage array.
An addition to the SCSI specification called the Data Integrity
Field (DIF) standardizes the contents of the protection data and
allows the extra information to be sent to and received from the host
controller, as well as verified along the chain of devices.
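As a rough illustration of what the Data Integrity Field standardizes, the sketch below builds the 8-byte protection tuple that accompanies each 512-byte sector on a 520-byte formatted drive: a 16-bit guard tag (a CRC-16 of the sector data using polynomial 0x8BB7), a 16-bit application tag, and a 32-bit reference tag, which for Type 1 protection holds the low 32 bits of the target LBA. The function names are hypothetical, and the bit-at-a-time CRC is written for clarity rather than speed; it is not how a kernel or HBA would compute it.

```python
import struct

SECTOR_SIZE = 512
T10DIF_POLY = 0x8BB7


def crc16_t10dif(data: bytes) -> int:
    """Bit-at-a-time CRC-16 (poly 0x8BB7, init 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def dif_tuple(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte Type 1 protection tuple for one 512-byte sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    guard = crc16_t10dif(sector)
    ref_tag = lba & 0xFFFFFFFF  # Type 1: low 32 bits of the target LBA
    return struct.pack(">HHI", guard, app_tag, ref_tag)  # big-endian fields


def verify(sector: bytes, pi: bytes, lba: int) -> bool:
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == (lba & 0xFFFFFFFF)


data = bytes(range(256)) * 2
pi = dif_tuple(data, lba=1234)
print(len(data) + len(pi))                   # 520
print(verify(data, pi, 1234))                # True
print(verify(data, pi, 1235))                # False: misdirected write
print(verify(b"\xff" + data[1:], pi, 1234))  # False: bad buffer write
```

The guard tag catches damaged payloads and the reference tag catches writes that land at the wrong LBA, which maps directly onto the two corruption scenarios described earlier.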
Together with industry partners, Oracle has developed an
infrastructure that takes the DIF specification a step further,
allowing the protection metadata to be exposed to the operating
system as well as to applications.
The Linux data integrity framework enables applications or kernel
subsystems to attach protection metadata to I/O operations, allowing
DIF-capable devices to verify the integrity of each request before
passing it further down the stack and physically committing it to
disk.
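The attach-then-verify flow can be modeled in a few lines. This is a toy sketch under loudly stated assumptions: `IORequest`, `submit` and the layer functions are hypothetical and do not reflect the kernel's block-layer API, and `zlib.crc32` stands in for the DIF CRC-16 guard tag. Protection information is generated once near the source, and every stage checks it before handing the request further down, so a buggy stage is caught before anything is committed.

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class IORequest:
    """Hypothetical write request that carries its own protection metadata."""
    lba: int
    data: bytes
    guard: int = 0    # checksum of data
    ref_tag: int = 0  # intended LBA


class IntegrityError(Exception):
    pass


def attach_pi(req: IORequest) -> None:
    # Generated once, near the source of the data.
    req.guard = zlib.crc32(req.data)  # stand-in for the DIF CRC-16
    req.ref_tag = req.lba


def verify_pi(req: IORequest, layer: str) -> None:
    if zlib.crc32(req.data) != req.guard or req.ref_tag != req.lba:
        raise IntegrityError(f"after {layer}: protection information mismatch")


Layer = Callable[[IORequest], IORequest]


def submit(req: IORequest, layers: list[tuple[str, Layer]], disk: dict[int, bytes]) -> None:
    attach_pi(req)
    for name, layer in layers:
        req = layer(req)
        verify_pi(req, name)  # reject here instead of passing bad data down
    disk[req.lba] = req.data  # only verified data is committed


def passthrough(req: IORequest) -> IORequest:
    return req


def buggy_driver(req: IORequest) -> IORequest:
    # Simulated bug: flips one bit of the buffer after the PI was generated.
    damaged = bytes([req.data[0] ^ 1]) + req.data[1:]
    return IORequest(req.lba, damaged, req.guard, req.ref_tag)


disk: dict[int, bytes] = {}
submit(IORequest(7, b"hello" * 100), [("fs", passthrough), ("hba", passthrough)], disk)
try:
    submit(IORequest(8, b"world" * 100), [("fs", passthrough), ("driver", buggy_driver)], disk)
except IntegrityError as err:
    print(err)  # after driver: protection information mismatch
print(sorted(disk))  # [7]
```

The design point the sketch tries to capture is that the corrupted write is refused before it reaches the disk, rather than being discovered on a later read.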
The hardware feature that enables the exchange of protection metadata
between the host operating system and the host bus adapter (HBA) is
called the Data Integrity Extensions, or DIX. The DIX definition can
be found in the docs section.
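With DIX, the host keeps protection information in a buffer separate from the data, and the HBA interleaves the two into 520-byte sectors (512 bytes of data followed by 8 bytes of protection information) when talking to a DIF-capable target. The sketch below models only that layout transformation. Plain byte strings stand in for the scatter-gather lists a real HBA would use, and `interleave` and `split` are hypothetical names.

```python
SECTOR_SIZE = 512
PI_SIZE = 8


def interleave(data: bytes, pi: bytes) -> bytes:
    """Merge a data buffer and a separate PI buffer into 520-byte sectors."""
    nsect, rem = divmod(len(data), SECTOR_SIZE)
    if rem or len(pi) != nsect * PI_SIZE:
        raise ValueError("data and PI buffer sizes do not match")
    out = bytearray()
    for i in range(nsect):
        out += data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        out += pi[i * PI_SIZE:(i + 1) * PI_SIZE]
    return bytes(out)


def split(wire: bytes) -> tuple[bytes, bytes]:
    """Inverse: separate 520-byte sectors back into data and PI buffers."""
    step = SECTOR_SIZE + PI_SIZE
    if len(wire) % step:
        raise ValueError("not a whole number of 520-byte sectors")
    data, pi = bytearray(), bytearray()
    for off in range(0, len(wire), step):
        data += wire[off:off + SECTOR_SIZE]
        pi += wire[off + SECTOR_SIZE:off + step]
    return bytes(data), bytes(pi)


data = b"\x11" * (2 * SECTOR_SIZE)
pi = b"\x22" * (2 * PI_SIZE)
wire = interleave(data, pi)
print(len(wire))                  # 1040
print(split(wire) == (data, pi))  # True
```

One benefit of keeping the two buffers apart on the host is that the data buffers retain their usual layout in memory; only the HBA ever deals with the 520-byte format.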