On Thu, Feb 14, 2019 at 03:17:20PM +0000, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> In the past we had data corruption when reading compressed extents that
> are shared within the same file and are consecutive; this got fixed by
> commit 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and
> shared extents") and by commit 808f80b46790f ("Btrfs: update fix for
> read corruption of compressed and shared extents"). However there was a
> case missing from those fixes: when the shared and compressed extents
> are referenced with a non-zero offset. The following shell script
> creates a reproducer for this issue:
>
> #!/bin/bash
>
> mkfs.btrfs -f /dev/sdc &> /dev/null
> mount -o compress /dev/sdc /mnt/sdc
>
> # Create a file with 3 consecutive compressed extents, each with an
> # uncompressed size of 128K and a compressed size of 4K.
> for ((i = 1; i <= 3; i++)); do
>     head -c 4096 /dev/zero
>     for ((j = 1; j <= 31; j++)); do
>         head -c 4096 /dev/zero | tr '\0' "\377"
>     done
> done > /mnt/sdc/foobar
> sync
>
> echo "Digest after file creation: $(md5sum /mnt/sdc/foobar)"
>
> # Clone the first extent into offsets 128K and 256K.
> xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar
> xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar
> sync
>
> echo "Digest after cloning: $(md5sum /mnt/sdc/foobar)"
>
> # Punch holes into the regions that are already full of zeroes.
> xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar
> xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar
> xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar
> sync
>
> echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)"
>
> echo "Dropping page cache..."
> sysctl -q vm.drop_caches=1
> echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)"
>
> umount /dev/sdc
>
> When running the script we get the following output:
>
> Digest after file creation: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
> linked 131072/131072 bytes at offset 131072
> 128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec)
> linked 131072/131072 bytes at offset 262144
> 128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec)
> Digest after cloning: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
> Digest after hole punching: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
> Dropping page cache...
> Digest after hole punching: fba694ae8664ed0c2e9ff8937e7f1484 /mnt/sdc/foobar
>
> This happens because, after reading all the pages of the extent in the
> range from 128K to 256K for example, we read the hole at offset 256K
> and then, when reading the page at offset 260K, we don't submit the
> existing bio, which is responsible for filling only the pages in the
> range 128K to 256K. Instead we add the pages from the range 260K to
> 384K to that same bio and submit it after iterating over the entire
> range. Once the bio completes, the uncompressed data fills only the
> pages in the range 128K to 256K, because there's no more data read from
> disk, leaving the pages in the range 260K to 384K unfilled. It is just
> a slightly different variant of what was solved by commit 005efedf2c7d0
> ("Btrfs: fix read corruption of compressed and shared extents").
>
> Fix this by forcing a bio submit, during readpages(), whenever we find
> a compressed extent map for a page that is different from the extent
> map of the previous page, or that points to the same compressed extent
> but with a different start offset. This is done by tracking and
> comparing the extent map's start offset instead of the extent map's
> original start offset.
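>
> In terms of the check above, the change amounts to tracking and
> comparing the extent map's start offset (em->start) rather than its
> original start offset (em->orig_start), roughly (again a simplified
> sketch, not the literal patch hunk):
>
>     /*
>      * With em->start, the page at 260K (em->start == 260K) no longer
>      * matches the hole at 256K (start == 256K), so force_bio_submit is
>      * set and the bio covering the previous range is submitted before
>      * the pages from 260K to 384K are added to a new one.
>      */
>     if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) &&
>         prev_em_start && *prev_em_start != (u64)-1 &&
>         *prev_em_start != em->start)
>             force_bio_submit = true;
>
>     if (prev_em_start)
>             *prev_em_start = em->start;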
>
> A test case for fstests follows soon.
>
> Reported-by: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
> Tested-by: Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # 4.3+
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
Added to misc-next and scheduled for 5.1, thanks. And thanks to Zygo for
the report.