There are at least two reports of memory bit flips sneaking into on-disk
data.

Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time, and it is not mandatory: it only runs on
builds with CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled.

This patch addresses the hole by running a comprehensive check on each
tree block before it is written back to disk.

The check is placed in csum_tree_block() where @verify == 0.
At that point we are generating the csum for a tree block just before
submitting the metadata bio, so we avoid all the unnecessary calls at
btrfs_mark_buffer_dirty() time but still catch enough errors.

The error output will look something like:
BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
BTRFS error (device dm-3): write time tree block corruption detected
BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
BTRFS error (device dm-3): write time tree block corruption detected
BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
BTRFS info (device dm-3): forced readonly
BTRFS warning (device dm-3): Skipping commit of aborted transaction.
BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
BTRFS info (device dm-3): delayed_refs has NO entry
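
As a reading aid (an inference from the numbers above, not something stated
in the reports): the "prev" objectid in the message is consistent with a
single flipped bit. Clearing bit 43 of 10510212874240 gives 1714119852032,
which sorts just before the "current" objectid 1714119868416. The helpers
below are hypothetical and exist only to make that arithmetic checkable.

```c
#include <stdint.h>

/* Objectids copied from the "bad key order" message above. */
static const uint64_t ex_prev = 10510212874240ULL;
static const uint64_t ex_cur = 1714119868416ULL;

/* Hypothetical helper: nonzero if a and b differ in exactly one bit. */
static int ex_single_bit_diff(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;

	return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: return v with the given bit cleared. */
static uint64_t ex_clear_bit(uint64_t v, unsigned int bit)
{
	return v & ~(1ULL << bit);
}
```

With bit 43 cleared, the previous key lies 16384 bytes below the current
one, so the pair would be in ascending order again.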
Reported-by: Leonard Lausen <leonard@xxxxxxxxx>
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
fs/btrfs/disk-io.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bc2379cb2091..d95716847870 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -313,6 +313,15 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			return -EUCLEAN;
 		}
 	} else {
+		if (btrfs_header_level(buf))
+			err = btrfs_check_node(fs_info, buf);
+		else
+			err = btrfs_check_leaf_full(fs_info, buf);
+		if (err < 0) {
+			btrfs_err(fs_info,
+				"write time tree block corruption detected");
+			return err;
+		}
 		write_extent_buffer(buf, result, 0, csum_size);
 	}
--
2.20.1