Hi Chris, On 09/05/17 02:26, Chris Murphy wrote: > Read errors are fixed by overwrites. If the underlying device doesn't > report an error for the write command, it's assumed to succeed. Even > md and LVM raid's do this. I understand assuming writes succeed in general. However, for a tool which says (in its usage): "verify checksums of data and metadata" I would expect that the tool reports the *final state* of the data: * Because the explicit function of scrub is to verify that all data can be read. * In a RAID1 situation, the sysadmin could think that the there are two valid copies of all data, when in fact there may only be one. > What are the complete kernel messages for the scrub event? This should > show what problem Btrfs detects and how it fixes it, and what sectors > it's fixing each time. Attached are the kernel logs (and the `grep fixed` version) showing the same logical blocks are "fixed" twice. Below (and attached because of formatting) I show how to reproduce the uncorrected errors, producing the attached logs. Please reply-all as I'm not subscribed. ---------------------------------------------------- $ pacman -Q btrfs-progs btrfs-progs 4.10.2-1 $ uname -r 4.9.24-1-MANJARO # fallocate -l 500M good-disk # fallocate -l 500M bad-disk # losetup -f bad-disk # loop0 # losetup -f good-disk # loop1 # mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1 /dev/loop0 mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md0 started. # mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0 btrfs-progs v4.10.2 See http://btrfs.wiki.kernel.org for more information. Performing full device TRIM /dev/loop1 (500.00MiB) ... Performing full device TRIM /dev/md0 (499.73MiB) ... Label: (null) UUID: d748537b-3fa4-47cc-9934-62cf391fb638 Node size: 16384 Sector size: 4096 Filesystem size: 999.73MiB Block group profiles: Data: RAID1 64.00MiB Metadata: RAID1 49.94MiB System: RAID1 8.00MiB SSD detected: no Incompat features: extref, skinny-metadata Number of devices: 2 Devices: ID SIZE PATH 1 500.00MiB /dev/loop1 2 499.73MiB /dev/md0 # mount /dev/md0 /mnt/tmp # dd if=/dev/urandom of=/mnt/tmp/rand bs=1M dd: error writing '/mnt/tmp/rand': No space left on device 441+0 records in 440+0 records out 461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s # umount /mnt/tmp # mdadm --stop /dev/md0 mdadm: stopped /dev/md0 # mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0 mdadm: array /dev/md0 built and started. ├─vg_svelte-swap swap suspend 0792d327-566d-4306-93f4-810e247827e4 [SWAP] └─vg_svelte-rootfs btrfs root a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e /mnt/btrfs-vol/rootfs # lsblk -f | head -n4 NAME FSTYPE LABEL UUID MOUNTPOINT loop0 linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78 └─md0 linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78 └─md127 btrfs d748537b-3fa4-47cc-9934-62cf391fb638 # mount /dev/md127 /mnt/tmp # btrfs scrub start -Bd /mnt/tmp scrub device /dev/loop1 (id 1) done scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 0 errors scrub device /dev/md127 (id 2) done scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:02 total bytes scrubbed: 441.22MiB with 0 errors # # Introduce errors in /dev/md127: # mdadm --grow /dev/md0 --layout=rp400 layout for /dev/md0 set to 12803 # btrfs scrub start -Bd /mnt/tmp scrub device /dev/loop1 (id 1) done scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:02 total bytes scrubbed: 441.22MiB with 0 errors scrub device /dev/md127 (id 2) done scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 19 errors error details: read=19 corrected errors: 19, uncorrectable errors: 0, unverified errors: 589 WARNING: errors detected during scrubbing, corrected # # Stop producing additional errors, keeping the already existing badblocks: # mdadm --grow /dev/md0 --layout=clear layout for /dev/md0 set to 31 # # Run scrub for 2nd time: # btrfs scrub start -Bd /mnt/tmp scrub device /dev/loop1 (id 1) done scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 0 errors scrub device /dev/md127 (id 2) done scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 19 errors error details: read=19 corrected errors: 19, uncorrectable errors: 0, unverified errors: 589 WARNING: errors detected during scrubbing, corrected # # Clean up # umount /mnt/tmp # mdadm --stop /dev/md127 mdadm: stopped /dev/md127 # mdadm --stop /dev/md0 mdadm: stopped /dev/md0 # losetup -d /dev/loop[01] # -- Tom Hale
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127 May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127 May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127 May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127 May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
May 16 13:04:30 svelte kernel: BTRFS info (device md127): disk space caching is enabled May 16 13:04:30 svelte kernel: BTRFS info (device md127): has skinny extents May 16 13:07:57 svelte kernel: BTRFS warning (device md127): i/o error at logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 305774592, length 4096, links 1 (path: rand) May 16 13:07:58 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0 May 16 13:07:58 svelte kernel: BTRFS warning (device md127): i/o error at logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 278917120, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 196018176, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 221970432, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 247791616, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 170061824, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 418050048 on dev /dev/md127, sector 464760, root 5, inode 261, offset 432599040, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 434962432 on dev /dev/md127, sector 497792, root 5, inode 261, offset 317849600, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 486998016 on dev /dev/md127, sector 599424, root 5, inode 261, offset 369885184, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127 May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 512688128 on dev /dev/md127, sector 649600, root 5, inode 261, offset 395575296, length 4096, links 1 (path: rand) May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0 May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127 May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 305774592, length 4096, links 1 (path: rand) May 16 13:10:20 svelte kernel: btrfs_dev_stat_print_on_error: 9 callbacks suppressed May 16 13:10:20 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 20, flush 0, corrupt 0, gen 0 May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 278917120, length 4096, links 1 (path: rand) May 16 13:10:20 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 21, flush 0, corrupt 0, gen 0 May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127 May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127 May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 196018176, length 4096, links 1 (path: rand) May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 22, flush 0, corrupt 0, gen 0 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 221970432, length 4096, links 1 (path: rand) May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 23, flush 0, corrupt 0, gen 0 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 247791616, length 4096, links 1 (path: rand) May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 24, flush 0, corrupt 0, gen 0 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 170061824, length 4096, links 1 (path: rand) May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 25, flush 0, corrupt 0, gen 0 May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 418050048 on dev /dev/md127, sector 464760, root 5, inode 261, offset 432599040, length 4096, links 1 (path: rand) May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 26, flush 0, corrupt 0, gen 0 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127 May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 434962432 on dev /dev/md127, sector 497792, root 5, inode 261, offset 317849600, length 4096, links 1 (path: rand) May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 27, flush 0, corrupt 0, gen 0 May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127 May 16 13:10:22 svelte kernel: BTRFS warning (device md127): i/o error at logical 486998016 on dev /dev/md127, sector 599424, root 5, inode 261, offset 369885184, length 4096, links 1 (path: rand) May 16 13:10:22 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 28, flush 0, corrupt 0, gen 0 May 16 13:10:22 svelte kernel: BTRFS warning (device md127): i/o error at logical 512688128 on dev /dev/md127, sector 649600, root 5, inode 261, offset 395575296, length 4096, links 1 (path: rand) May 16 13:10:22 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 29, flush 0, corrupt 0, gen 0 May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127 May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
# fallocate -l 500M good-disk # fallocate -l 500M bad-disk # losetup -f bad-disk # loop0 # losetup -f good-disk # loop1 # mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1 /dev/loop0 mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md0 started. # mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0 btrfs-progs v4.10.2 See http://btrfs.wiki.kernel.org for more information. Performing full device TRIM /dev/loop1 (500.00MiB) ... Performing full device TRIM /dev/md0 (499.73MiB) ... Label: (null) UUID: d748537b-3fa4-47cc-9934-62cf391fb638 Node size: 16384 Sector size: 4096 Filesystem size: 999.73MiB Block group profiles: Data: RAID1 64.00MiB Metadata: RAID1 49.94MiB System: RAID1 8.00MiB SSD detected: no Incompat features: extref, skinny-metadata Number of devices: 2 Devices: ID SIZE PATH 1 500.00MiB /dev/loop1 2 499.73MiB /dev/md0 # mount /dev/md0 /mnt/tmp # dd if=/dev/urandom of=/mnt/tmp/rand bs=1M dd: error writing '/mnt/tmp/rand': No space left on device 441+0 records in 440+0 records out 461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s # umount /mnt/tmp # mdadm --stop /dev/md0 mdadm: stopped /dev/md0 # mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0 mdadm: array /dev/md0 built and started. ├─vg_svelte-swap swap suspend 0792d327-566d-4306-93f4-810e247827e4 [SWAP] └─vg_svelte-rootfs btrfs root a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e /mnt/btrfs-vol/rootfs # lsblk -f | head -n4 NAME FSTYPE LABEL UUID MOUNTPOINT loop0 linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78 └─md0 linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78 └─md127 btrfs d748537b-3fa4-47cc-9934-62cf391fb638 # mount /dev/md127 /mnt/tmp # btrfs scrub start -Bd /mnt/tmp scrub device /dev/loop1 (id 1) done scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 0 errors scrub device /dev/md127 (id 2) done scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:02 total bytes scrubbed: 441.22MiB with 0 errors # # Introduce errors in /dev/md127: # mdadm --grow /dev/md0 --layout=rp400 layout for /dev/md0 set to 12803 # btrfs scrub start -Bd /mnt/tmp scrub device /dev/loop1 (id 1) done scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:02 total bytes scrubbed: 441.22MiB with 0 errors scrub device /dev/md127 (id 2) done scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 19 errors error details: read=19 corrected errors: 19, uncorrectable errors: 0, unverified errors: 589 WARNING: errors detected during scrubbing, corrected # # Stop producing additional errors, keeping the already existing badblocks: # mdadm --grow /dev/md0 --layout=clear layout for /dev/md0 set to 31 # # Run scrub for 2nd time: # btrfs scrub start -Bd /mnt/tmp scrub device /dev/loop1 (id 1) done scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 0 errors scrub device /dev/md127 (id 2) done scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03 total bytes scrubbed: 441.22MiB with 19 errors error details: read=19 corrected errors: 19, uncorrectable errors: 0, unverified errors: 589 WARNING: errors detected during scrubbing, corrected # # Clean up # umount /mnt/tmp # mdadm --stop /dev/md127 mdadm: stopped /dev/md127 # mdadm --stop /dev/md0 mdadm: stopped /dev/md0 # losetup -d /dev/loop[01] # rm good-disk bad-disk
