On Mon, Jun 10, 2019 at 6:03 PM Eric Mesa <eric@xxxxxxxxxxxx> wrote:

> When I did the btrfs send / receive for SSDHome it took 2.5 days to send
> ~500GiB over a 1GBps cable to the 3GBps drive in the server. It also had the
> error:
> ERROR: failed to clone extents to ermesa/.cache/krunner/
> bookmarkrunnerfirefoxfavdbfile.sqlite: Invalid argument

While there are distinct send and receive errors possible, I'm not familiar with recognizing them. You can get a better idea what the problem is with -vv or -vvv to get a more verbose error on the side that's having the problem. My guess is this is a send error message.

>
> Let's say that snapshot A is a snapshot sent to the server without -p. It
> sends the entire 500GB for 18 hours.
>
> Then I do snapshot B. I send it with -p - takes 15 minutes or so depending on
> how much data I've added.
>
> Then I do snapshot C - and here I always get an error.

It's most useful if you show exact commands because actually it's not always obvious to everyone what the logic should be and the error handling doesn't always stop a user from doing something that doesn't make a lot of sense. We need to know the name of the rw subvolume; the command to snapshot it; the full send/receive command for that first snapshot; the command for a subsequent snapshot; and the command to incrementally send/receive it.

>
> And it always is something like:
>
> ERROR: link ermesa/.mozilla/firefox/n35gu0fb.default/bookmarkbackups/
> bookmarks-2019-06-09_679_I1bs5PtgsPwtyXvcvcRdSg==.jsonlz4 -> ermesa/.mozilla/
> firefox/n35gu0fb.default/bookmarkbackups/
> bookmarks-2019-06-08_679_I1bs5PtgsPwtyXvcvcRdSg==.jsonlz4 failed: No such file
> or directory
>
> It always involves either .cache or .mozilla - the types of files that are
> constantly changing.
>
> It doesn't matter if I do a defrag before snapshot C followed by the sync
> command. It seems that for SSDHome I can only do one full snap send and then
> one parent send.

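For reference, the A / B / C sequence described in the quoted text, and the five commands requested earlier in this reply, would usually look something like the sketch below. This is not the poster's actual setup: the subvolume path, snapshot directory, snapshot names, and destination mount point are all hypothetical placeholders, and a locally mounted destination is assumed rather than a remote server. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a full send followed by two incremental sends.
# All paths and snapshot names are hypothetical placeholders.
SRC=${SRC:-/home}                  # read-write subvolume being backed up
SNAPS=${SNAPS:-/home/.snapshots}   # directory holding the read-only snapshots
DEST=${DEST:-/mnt/backup}          # mounted Btrfs file system on the receive side
DRY_RUN=${DRY_RUN:-1}              # 1 = only print commands, 0 = also execute them

run() {
    # Always show the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        eval "$*"
    fi
}

# Snapshot A: read-only snapshot, sent in full (no -p).
run "btrfs subvolume snapshot -r $SRC $SNAPS/A"
run "btrfs send -vv $SNAPS/A | btrfs receive -vv $DEST"

# Snapshot B: incremental send, with A as the parent.
run "btrfs subvolume snapshot -r $SRC $SNAPS/B"
run "btrfs send -vv -p $SNAPS/A $SNAPS/B | btrfs receive -vv $DEST"

# Snapshot C: incremental send, with B as the parent.
# The parent has to exist, unmodified, on both the send and receive side.
run "btrfs subvolume snapshot -r $SRC $SNAPS/C"
run "btrfs send -vv -p $SNAPS/B $SNAPS/C | btrfs receive -vv $DEST"
```

Posting the real equivalents of these six lines, together with the -vv output from whichever side fails, makes it much easier to see whether the parent passed to -p is the one that actually exists on the receive side.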
I don't actually know the status of snapshot aware defragmentation. It wasn't there, then it was there, then there were problems, and I think it was pulled rather than fixed. But I don't really remember. I also don't know if there's a difference between manual defragging and autodefrag, because I don't use either one. I do use reflinks. And I have done deduplication. And I don't have any send/receive failures. I do sometimes see slow sections of send/receive.

>
> Again, so far it seems to be working fine with the other drives which seems to
> suggest to me that it's maybe not the version of my kernel or btrfs progs or
> anything else.

Do you remember the mkfs command for this file system? Or also helpful would be:

# btrfs insp dump-s -f /dev/X
## for both send and receive side file system (only one device from each Btrfs volume is needed), this will give us an idea what the mkfs options were including feature flags.

> And dmesg.log is attached

[ 6.949347] BTRFS info (device sdb1): enabling auto defrag

Could be related. And then also

[ 9.906695] usb-storage 8-1.3:1.0: USB Mass Storage device detected
[ 9.907006] scsi host7: usb-storage 8-1.3:1.0
[ 10.950446] scsi 7:0:0:0: Direct-Access B&N NOOK 0322 PQ: 0 ANSI: 2
[ 10.951110] sd 7:0:0:0: Attached scsi generic sg7 type 0
[ 10.951161] sd 7:0:0:0: Power-on or device reset occurred
[ 10.952880] sd 7:0:0:0: [sdg] Attached SCSI removable disk

snip

[ 267.794434] usb 9-1.1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 272.838054] usb 9-1.1: device descriptor read/8, error -110
[ 272.941832] usb 9-1.1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 277.958049] usb 9-1.1: device descriptor read/8, error -110
[ 278.236339] usb 9-1.1: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd

USB enclosed drives can be troublesome with any file system, precisely because of these seemingly random resets that happen.
I had the same thing in an early form of my setup, and it did cause me problems that Btrfs worked around. But I considered it untenable and fixed it with a good quality self-powered USB hub (don't rely on bus power), or perhaps more specifically one that comes with a high amp power adapter. It needs to be able to drive all the drives in their read/write usage, which for laptop drives is ~0.35A each. You really shouldn't be getting link resets like the above, even though I suspect it's unrelated to the current problem report.

--
Chris Murphy
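
The hub sizing rule of thumb above can be worked through as a quick calculation. The sketch below is only illustrative: the ~0.35 A per laptop drive figure is taken from the message, while the drive count, the adapter ratings, and the 1.5x headroom factor are assumptions chosen for the example and should be replaced with real numbers from the drives' and adapter's labels.

```shell
#!/bin/sh
# Rough current budget for a self-powered USB hub.
# Only the 0.35 A per-drive figure comes from the message above;
# the headroom factor and the example inputs are assumptions.

hub_budget() {
    drives=$1         # number of laptop drives attached to the hub
    adapter_amps=$2   # rated output of the hub's power adapter, in amps
    awk -v n="$drives" -v a="$adapter_amps" 'BEGIN {
        per_drive = 0.35   # amps per laptop drive under read/write load
        headroom  = 1.5    # assumed safety margin, not a measured value
        need = n * per_drive * headroom
        printf "need %.2f A, adapter %.2f A: %s\n",
               need, a, (need <= a ? "ok" : "insufficient")
    }'
}

hub_budget 4 2.5   # four drives on a 2.5 A adapter
hub_budget 4 2.0   # the same drives on a 2.0 A adapter
```

With the assumed 1.5x margin, four drives call for about 2.1 A, so the first adapter passes and the second does not. Without any margin the same four drives draw roughly 1.4 A in steady read/write use.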
