Re: 12 TB btrfs file system on virtual machine broke again (third time)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 11, 2020 at 10:23 AM Christian Wimmer
<telefonchris@xxxxxxxxxx> wrote:
>
> Hi guys,
>
> here the last lines before the suspend and after it:
>
>
> 66939.439945] Buffer I/O error on dev loop1, logical block 0, async page read
> [66939.621686] print_req_error: I/O error, dev loop1, sector 83885952
> [66939.621738] print_req_error: I/O error, dev loop1, sector 83885952
> [66939.621740] Buffer I/O error on dev loop1, logical block 10485744, async page read

I don't know what /dev/loop1 is or the origin of the i/o error, but
I'm gonna guess it's a problem with the underlying device, or a bug
related to either the VM or the host or guest kernels or some
interaction of the three.

> [509652.385601] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509652.385623] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509652.385628] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509652.385631] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 16 b2 c0 00 00 00 20 00 00
> [509652.385634] print_req_error: critical target error, dev sdb, sector 1487552
> [509652.385642] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
> [509652.386117] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509652.386120] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b65150
> [509652.386132] sd 6:0:0:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509652.386135] sd 6:0:0:0: [sdb] tag#1 Sense Key : Illegal Request [current]
> [509652.386137] sd 6:0:0:0: [sdb] tag#1 Add. Sense: Invalid command operation code
> [509652.386139] sd 6:0:0:0: [sdb] tag#1 CDB: Write(16) 8a 00 00 00 00 00 01 1d 9a 18 00 00 02 00 00 00
> [509652.386141] print_req_error: critical target error, dev sdb, sector 18717208
> [509652.386158] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
> [509682.560535] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509682.560537] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b65190
> [509682.560566] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509682.560567] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509682.560569] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509682.560571] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 36 b2 c0 00 00 00 20 00 00
> [509682.560572] print_req_error: critical target error, dev sdb, sector 3584704
> [509682.560579] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
> [509682.560667] BTRFS: error (device sdb1) in btrfs_start_dirty_block_groups:3716: errno=-5 IO failure
> [509682.560669] BTRFS info (device sdb1): forced readonly
> [509712.672325] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509712.672330] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b651d0
> [509712.672373] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509712.672377] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509712.672379] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509712.672382] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 16 b2 c0 00 00 00 20 00 00
> [509712.672384] print_req_error: critical target error, dev sdb, sector 1487552
> [509712.672388] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 3, flush 0, corrupt 0, gen 0
> [509742.783978] xhci_hcd 0000:00:1d.6: Mismatch between completed Set TR Deq Ptr command & xHCI internal state.
> [509742.783982] xhci_hcd 0000:00:1d.6: ep deq seg = ffff880bd3fbab40, deq ptr = ffff880bd3b65210
> [509742.784016] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [509742.784019] sd 6:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [509742.784021] sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation code
> [509742.784024] sd 6:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 36 b2 c0 00 00 00 20 00 00
> [509742.784026] print_req_error: critical target error, dev sdb, sector 3584704
> [509742.784031] BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 4, flush 0, corrupt 0, gen 0
> [509742.784079] BTRFS: error (device sdb1) in btrfs_start_dirty_block_groups:3716: errno=-5 IO failure
> [509742.784082] BTRFS warning (device sdb1): Skipping commit of aborted transaction.
> [509742.784084] BTRFS: error (device sdb1) in cleanup_transaction:1881: errno=-5 IO failure
> [509742.784084] BTRFS info (device sdb1): delayed_refs has NO entry
> [512066.481724] usb 4-1: USB disconnect, device number 2
> [512066.484604] sd 6:0:0:0: [sdb] Synchronizing SCSI cache
> [512066.661516] usb 2-1: USB disconnect, device number 2
> [512066.661725] usblp0: removed

There are multiple hardware related errors here. Timing wise, it looks
like some kind of USB bus related error "xhci_hcd 0000:00:1d.6:
Mismatch between..." and maybe in the course of that error is what
causes the drive, /dev/sdb, to think a command it received is invalid,
hence "sd 6:0:0:0: [sdb] tag#0 Add. Sense: Invalid command operation
code" and the subsequent sector error, and the subsequent Btrfs error.

So Btrfs is getting confused because lower layers it has no control
over are doing the wrong thing; resulting in either dropped reads or
writes. Dropped writes can cause permanent corruption, depending on
what the writes are.

And then it looks like a USB device is disconnected.

Is /dev/sdb a USB stick or  drive in a USB enclosure? That's not
working out well, whatever it is.


> [512066.716156] usb 1-1: USB disconnect, device number 2
> [512066.751428] sd 6:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> [512066.811207] usb 2-2: USB disconnect, device number 3
> [512066.811350] usblp1: removed
> [512066.963131] usb 2-3: USB disconnect, device number 4
> [512066.963296] usblp2: removed
> [512067.091114] usb 4-1: new SuperSpeed USB device number 3 using xhci_hcd
> [512067.107446] usb 1-1: new full-speed USB device number 3 using uhci_hcd
> [512067.113881] usb 2-4: USB disconnect, device number 5
> [512067.114143] usblp3: removed
> [512067.126193] usb 4-1: New USB device found, idVendor=2109, idProduct=0711
> [512067.126195] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [512067.126196] usb 4-1: Product: 20231
> [512067.126196] usb 4-1: Manufacturer: Ugreen
> [512067.126197] usb 4-1: SerialNumber: 000000128DA4
> [512067.140176] scsi host9: uas
> [512067.141472] scsi 9:0:0:0: Direct-Access     ST6000DM 004-2EH11C       DN02 PQ: 0 ANSI: 6
> [512067.142615] sd 9:0:0:0:


The problem with snippets is that they're over too small of a time
period, and after problems already started, to determine the sequence
of what went wrong, and what else is involved in the confusion. The
problem with getting the entire dmesg is that it's tedious to read
through to try and find the problem.

The gist though is the setup is flawed and now you've had three
failures it's telling you something is wrong.

Any hardware errors inside a VM guest, can be either hardware or
software errors in the host. It could be the hypervisor. It could be
the host kernel. Have you checked the host kernel messages while these
errors are happening inside the VM? What about the hypervisor logs?





-- 
Chris Murphy



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux