Re: "Corrected" errors persist after scrubbing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Chris,

On 09/05/17 02:26, Chris Murphy wrote:
> Read errors are fixed by overwrites. If the underlying device doesn't
> report an error for the write command, it's assumed to succeed. Even
> md and LVM raid's do this.

I understand assuming writes succeed in general. However, for a tool
which says (in its usage):

"verify checksums of data and metadata"

I would expect that the tool reports the *final state* of the data:

* Because the explicit function of scrub is to verify that all data can
be read.

* In a RAID1 situation, the sysadmin could think that the there are two
valid copies of all data, when in fact there may only be one.

> What are the complete kernel messages for the scrub event? This should
> show what problem Btrfs detects and how it fixes it, and what sectors
> it's fixing each time.

Attached are the kernel logs (and the `grep fixed` version) showing the
same logical blocks are "fixed" twice.

Below (and attached because of formatting) I show how to reproduce the
uncorrected errors, producing the attached logs.

Please reply-all as I'm not subscribed.

----------------------------------------------------

$ pacman -Q btrfs-progs
btrfs-progs 4.10.2-1
$ uname -r
4.9.24-1-MANJARO

# fallocate -l 500M good-disk
# fallocate -l 500M bad-disk
# losetup -f bad-disk # loop0
# losetup -f good-disk # loop1
# mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1
/dev/loop0
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
# mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0
btrfs-progs v4.10.2
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM /dev/loop1 (500.00MiB) ...
Performing full device TRIM /dev/md0 (499.73MiB) ...
Label:              (null)
UUID:               d748537b-3fa4-47cc-9934-62cf391fb638
Node size:          16384
Sector size:        4096
Filesystem size:    999.73MiB
Block group profiles:
  Data:             RAID1            64.00MiB
  Metadata:         RAID1            49.94MiB
  System:           RAID1             8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1   500.00MiB  /dev/loop1
    2   499.73MiB  /dev/md0
# mount /dev/md0 /mnt/tmp
# dd if=/dev/urandom of=/mnt/tmp/rand bs=1M
dd: error writing '/mnt/tmp/rand': No space left on device
441+0 records in
440+0 records out
461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s
# umount /mnt/tmp
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
mdadm: array /dev/md0 built and started.
    ├─vg_svelte-swap   swap              suspend
0792d327-566d-4306-93f4-810e247827e4   [SWAP]
    └─vg_svelte-rootfs btrfs             root
a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e   /mnt/btrfs-vol/rootfs
# lsblk -f | head -n4
NAME                   FSTYPE            LABEL    UUID
                MOUNTPOINT
loop0                  linux_raid_member svelte:0
f7019aa8-3b2d-4e4c-9106-635d0dddca78
└─md0                  linux_raid_member svelte:0
f7019aa8-3b2d-4e4c-9106-635d0dddca78
  └─md127              btrfs
d748537b-3fa4-47cc-9934-62cf391fb638
# mount /dev/md127 /mnt/tmp
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:04:50 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:04:50 2017 and finished after
00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
# # Introduce errors in /dev/md127:
# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp


scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:07:57 2017 and finished after
00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:07:57 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified
errors: 589
WARNING: errors detected during scrubbing, corrected
# # Stop producing additional errors, keeping the already existing
badblocks:
# mdadm --grow /dev/md0 --layout=clear
layout for /dev/md0 set to 31
# # Run scrub for 2nd time:
# btrfs scrub start -Bd /mnt/tmp


scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:10:19 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:10:19 2017 and finished after
00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified
errors: 589
WARNING: errors detected during scrubbing, corrected
# # Clean up
# umount /mnt/tmp
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# losetup -d /dev/loop[01]
#

-- 

Tom Hale
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
May 16 13:04:30 svelte kernel: BTRFS info (device md127): disk space caching is enabled
May 16 13:04:30 svelte kernel: BTRFS info (device md127): has skinny extents
May 16 13:07:57 svelte kernel: BTRFS warning (device md127): i/o error at logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 305774592, length 4096, links 1 (path: rand)
May 16 13:07:58 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
May 16 13:07:58 svelte kernel: BTRFS warning (device md127): i/o error at logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 278917120, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 196018176, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 3, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 221970432, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 4, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 247791616, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 5, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 170061824, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 418050048 on dev /dev/md127, sector 464760, root 5, inode 261, offset 432599040, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 434962432 on dev /dev/md127, sector 497792, root 5, inode 261, offset 317849600, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 8, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 486998016 on dev /dev/md127, sector 599424, root 5, inode 261, offset 369885184, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 9, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:07:59 svelte kernel: BTRFS warning (device md127): i/o error at logical 512688128 on dev /dev/md127, sector 649600, root 5, inode 261, offset 395575296, length 4096, links 1 (path: rand)
May 16 13:07:59 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 10, flush 0, corrupt 0, gen 0
May 16 13:07:59 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 132972544 on dev /dev/md127, sector 220800, root 5, inode 261, offset 305774592, length 4096, links 1 (path: rand)
May 16 13:10:20 svelte kernel: btrfs_dev_stat_print_on_error: 9 callbacks suppressed
May 16 13:10:20 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 20, flush 0, corrupt 0, gen 0
May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 107016192 on dev /dev/md127, sector 170104, root 5, inode 261, offset 278917120, length 4096, links 1 (path: rand)
May 16 13:10:20 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 21, flush 0, corrupt 0, gen 0
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 132972544 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS error (device md127): fixed up error at logical 107016192 on dev /dev/md127
May 16 13:10:20 svelte kernel: BTRFS warning (device md127): i/o error at logical 340852736 on dev /dev/md127, sector 313984, root 5, inode 261, offset 196018176, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 22, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 340852736 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 366804992 on dev /dev/md127, sector 364672, root 5, inode 261, offset 221970432, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 23, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 366804992 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 392626176 on dev /dev/md127, sector 415104, root 5, inode 261, offset 247791616, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 24, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 392626176 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 314896384 on dev /dev/md127, sector 263288, root 5, inode 261, offset 170061824, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 25, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 418050048 on dev /dev/md127, sector 464760, root 5, inode 261, offset 432599040, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 26, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 314896384 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 418050048 on dev /dev/md127
May 16 13:10:21 svelte kernel: BTRFS warning (device md127): i/o error at logical 434962432 on dev /dev/md127, sector 497792, root 5, inode 261, offset 317849600, length 4096, links 1 (path: rand)
May 16 13:10:21 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 27, flush 0, corrupt 0, gen 0
May 16 13:10:21 svelte kernel: BTRFS error (device md127): fixed up error at logical 434962432 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS warning (device md127): i/o error at logical 486998016 on dev /dev/md127, sector 599424, root 5, inode 261, offset 369885184, length 4096, links 1 (path: rand)
May 16 13:10:22 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 28, flush 0, corrupt 0, gen 0
May 16 13:10:22 svelte kernel: BTRFS warning (device md127): i/o error at logical 512688128 on dev /dev/md127, sector 649600, root 5, inode 261, offset 395575296, length 4096, links 1 (path: rand)
May 16 13:10:22 svelte kernel: BTRFS error (device md127): bdev /dev/md127 errs: wr 0, rd 29, flush 0, corrupt 0, gen 0
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 486998016 on dev /dev/md127
May 16 13:10:22 svelte kernel: BTRFS error (device md127): fixed up error at logical 512688128 on dev /dev/md127
# fallocate -l 500M good-disk
# fallocate -l 500M bad-disk
# losetup -f bad-disk # loop0
# losetup -f good-disk # loop1
# mdadm --create -v /dev/md0 --level=linear --force --raid-devices=1 /dev/loop0
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
# mkfs.btrfs -m raid1 -d raid1 /dev/loop1 /dev/md0
btrfs-progs v4.10.2
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM /dev/loop1 (500.00MiB) ...
Performing full device TRIM /dev/md0 (499.73MiB) ...
Label:              (null)
UUID:               d748537b-3fa4-47cc-9934-62cf391fb638
Node size:          16384
Sector size:        4096
Filesystem size:    999.73MiB
Block group profiles:
  Data:             RAID1            64.00MiB
  Metadata:         RAID1            49.94MiB
  System:           RAID1             8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1   500.00MiB  /dev/loop1
    2   499.73MiB  /dev/md0
# mount /dev/md0 /mnt/tmp
# dd if=/dev/urandom of=/mnt/tmp/rand bs=1M
dd: error writing '/mnt/tmp/rand': No space left on device
441+0 records in
440+0 records out
461963264 bytes (462 MB, 441 MiB) copied, 5.74358 s, 80.4 MB/s
# umount /mnt/tmp
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm --build /dev/md0 --level=faulty --raid-devices=1 /dev/loop0
mdadm: array /dev/md0 built and started.
    ├─vg_svelte-swap   swap              suspend  0792d327-566d-4306-93f4-810e247827e4   [SWAP]
    └─vg_svelte-rootfs btrfs             root     a1d2a0b4-8d4b-4c1c-9c87-5a55de61918e   /mnt/btrfs-vol/rootfs
# lsblk -f | head -n4
NAME                   FSTYPE            LABEL    UUID                                   MOUNTPOINT
loop0                  linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78
└─md0                  linux_raid_member svelte:0 f7019aa8-3b2d-4e4c-9106-635d0dddca78
  └─md127              btrfs                      d748537b-3fa4-47cc-9934-62cf391fb638
# mount /dev/md127 /mnt/tmp
# btrfs scrub start -Bd /mnt/tmp
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:04:50 2017 and finished after 00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
# # Introduce errors in /dev/md127:
# mdadm --grow /dev/md0 --layout=rp400
layout for /dev/md0 set to 12803
# btrfs scrub start -Bd /mnt/tmp                                                                                                                                                              
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:02
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:07:57 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified errors: 589
WARNING: errors detected during scrubbing, corrected
# # Stop producing additional errors, keeping the already existing badblocks:
# mdadm --grow /dev/md0 --layout=clear
layout for /dev/md0 set to 31
# # Run scrub for 2nd time:
# btrfs scrub start -Bd /mnt/tmp                                                                                                                                                              
scrub device /dev/loop1 (id 1) done
        scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 0 errors
scrub device /dev/md127 (id 2) done
        scrub started at Tue May 16 13:10:19 2017 and finished after 00:00:03
        total bytes scrubbed: 441.22MiB with 19 errors
        error details: read=19
        corrected errors: 19, uncorrectable errors: 0, unverified errors: 589
WARNING: errors detected during scrubbing, corrected
# # Clean up
# umount /mnt/tmp
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# losetup -d /dev/loop[01]
# rm good-disk bad-disk


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux