Project: Linux Data Integrity Project


Project Description: A framework for proactive data integrity protection in Linux.
License: GPL

Please note that as of Linux 2.6.27 the data integrity extensions have been merged into the mainline kernel. As a consequence, all further development is done in the upstream kernel. This project site currently serves as a repository for design documents and presentations on DIF and DIX.

For inquiries about the Data Integrity Extensions or the Linux implementation please contact: Martin K. Petersen <martin.petersen@oracle.com>.


Many common instances of data corruption are caused not by bit rot on the physical disk platter but by bugs in the I/O path between the application and the drive.

Modern filesystems, including Oracle's own btrfs, implement checksumming so that corrupted data can be detected. However, this detection only occurs when the data is read back, which may be months after the corrupted data was written. By then, chances are that the good data has been lost forever. The Data Integrity Initiative aims to prevent corrupted data buffers from being written to disk in the first place.

Common corruption scenarios are:

  • bad buffer writes - the write ends up in the right place on disk, but the data written is not what the application sent
  • misdirected writes - the write buffer contains good data but ends up being written to the wrong location on disk

The storage industry has been aware of this for many years, and many array vendors have been leveraging the support for 520- and 528-byte sectors on SCSI-family (SPI/FC/SAS) drives, which allows extra protection information to be stored along with the user's data. However, the format of this extra information is proprietary, and it is available only inside the storage array.

An addition to the SCSI specification called the Data Integrity Field (DIF) standardizes the contents of the protection data and allows the extra information to be sent to and received from the host controller, as well as verified by each device along the chain.
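As a rough illustration of what standardized protection data looks like, the following Python sketch assumes the common T10 DIF layout for 512-byte sectors: one 8-byte tuple per sector (512 + 8 is where the 520-byte sector size comes from), made up of a 16-bit guard tag (a CRC-16 of the sector data using polynomial 0x8BB7), a 16-bit application tag, and a 32-bit reference tag that, for Type 1 protection, holds the low 32 bits of the target LBA. The function names `crc_t10dif`, `dif_generate` and `dif_verify` are hypothetical, and this is a sketch rather than the kernel's implementation. It shows how the two checks map onto the corruption scenarios: a guard tag mismatch points to a bad buffer write, and a reference tag mismatch points to a misdirected write.

```python
import struct

# Sketch of a T10 DIF (Type 1) protection tuple for 512-byte sectors.
# Function names are hypothetical; this is not the kernel implementation.
SECTOR_SIZE = 512


def crc_t10dif(data: bytes) -> int:
    """Bitwise CRC-16: polynomial 0x8BB7, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def dif_generate(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Build the 8-byte tuple (guard, app tag, ref tag), packed big-endian."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected one 512-byte sector")
    # Type 1: the reference tag is the low 32 bits of the target LBA.
    return struct.pack(">HHI", crc_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def dif_verify(sector: bytes, lba: int, pi: bytes) -> str:
    """Checks a device along the path could run before passing the write on."""
    guard, _app_tag, ref_tag = struct.unpack(">HHI", pi)
    if crc_t10dif(sector) != guard:
        return "guard error"        # data altered in flight: bad buffer write
    if (lba & 0xFFFFFFFF) != ref_tag:
        return "reference error"    # intact data, wrong location: misdirected write
    return "ok"


sector = bytes([0xA5]) * SECTOR_SIZE
pi = dif_generate(sector, lba=1000)
print(len(sector + pi))                  # 520: one protected sector
print(dif_verify(sector, 1000, pi))      # ok
print(dif_verify(sector, 2000, pi))      # reference error
bad = bytearray(sector)
bad[17] ^= 0x01                          # flip a single bit in the buffer
print(dif_verify(bytes(bad), 1000, pi))  # guard error
```

A bit-at-a-time CRC in Python is slow; production code typically uses table-driven or hardware-assisted CRC computation, but the arithmetic is the same.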

Together with industry partners, Oracle has developed an infrastructure that takes the DIF specification a step further, exposing the protection metadata to the operating system as well as to applications.

The Linux data integrity framework enables applications or kernel subsystems to attach protection metadata to I/O operations, allowing DIF-capable devices to verify the integrity of each request before passing it further down the stack and physically committing the data to disk.

The hardware feature that enables the exchange of protection metadata between the host operating system and the HBA is called the Data Integrity Extensions, or DIX. The DIX definition can be found in the docs section.
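The following Python sketch is a rough model of the data movement DIX makes possible, under stated assumptions: 512-byte sectors with one 8-byte tuple each, and protection information handed to the HBA in a buffer separate from the data buffer (as in the Linux implementation, where integrity metadata travels alongside the data rather than inside it). The HBA then interleaves the two into 520-byte sectors for the drive and splits them apart again on reads. `dix_interleave` and `dix_split` are hypothetical names; real HBAs perform this with separate scatter-gather lists in hardware, not with byte copies.

```python
# Hypothetical model of DIX buffer handling; not an HBA or driver API.
SECTOR_SIZE = 512
PI_SIZE = 8  # one 8-byte DIF tuple per 512-byte sector


def dix_interleave(data: bytes, prot: bytes) -> bytes:
    """Merge a data buffer and a separate protection buffer into the
    interleaved 520-byte-sector layout stored by a DIF-capable drive."""
    nsect, rem = divmod(len(data), SECTOR_SIZE)
    if rem or len(prot) != nsect * PI_SIZE:
        raise ValueError("buffer sizes do not describe whole sectors")
    out = bytearray()
    for i in range(nsect):
        out += data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
        out += prot[i * PI_SIZE:(i + 1) * PI_SIZE]
    return bytes(out)


def dix_split(wire: bytes) -> tuple[bytes, bytes]:
    """Inverse operation for reads: separate 520-byte sectors back into
    a data buffer and a protection buffer."""
    unit = SECTOR_SIZE + PI_SIZE
    if len(wire) % unit:
        raise ValueError("not a whole number of protected sectors")
    data, prot = bytearray(), bytearray()
    for off in range(0, len(wire), unit):
        data += wire[off:off + SECTOR_SIZE]
        prot += wire[off + SECTOR_SIZE:off + unit]
    return bytes(data), bytes(prot)


# Usage: two sectors of data plus two dummy 8-byte tuples.
data = bytes(range(256)) * 4            # 1024 bytes = 2 sectors
prot = bytes(range(16))                 # placeholder tuple contents for the demo
wire = dix_interleave(data, prot)
print(len(wire))                        # 1040
print(dix_split(wire) == (data, prot))  # True
```

Keeping the tuples in their own buffer presumably lets the operating system generate and check protection information without reformatting its page-sized data buffers into 520-byte units; the interleaving work is left to the HBA.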