I noticed that every time Data gets bumped, it only gets bumped by a
couple of GB. I rarely ever store files on that disk that are larger
than 2 GB, but the last time it crashed, I was moving a file that was
4.3 GB, so maybe that's conducive to the crash happening? Maybe the
file being larger than what btrfs would allocate has something to do
with this. I will keep track of the amount of data written since the
last crash, and the file size when the crash occurred.

On Mon, Jan 11, 2016 at 2:45 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
> After remounting, the bug doesn't transpire any more; Data gets resized.
>
> It is my experience that this bug will go untriggered for weeks at a
> time until I write a lot to that disk, at which point it'll
> happen very quickly. I believe this has more to do with the amount of
> data that's been written to disk than anything else. It took about
> 48 GB to trigger the last instance, and I don't think that's very
> different from what happened before, but I didn't keep track exactly.
>
> On Mon, Jan 11, 2016 at 2:30 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
>> The bug just happened again. Attached is a log since the time I
>> mounted the FS right after the fsck.
>>
>> Note that the only things between the message I got while mounting:
>> [216798.144518] BTRFS info (device sdc1): disk space caching is enabled
>>
>> and the beginning of the crash dump:
>> [241534.760651] ------------[ cut here ]------------
>>
>> are these:
>> [218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>> [233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>>
>> I am not sure why those resets happen, though. I bought a few cables
>> and experimented with them, and the USB ports themselves are located
>> directly on the motherboard.
>> Also, they happened a considerable time before the crash dump, so
>> I'm not sure they're even related.
>> Especially given that I was copying
>> a lot of very small files, and they all copied onto the disk fine the
>> whole time between the last USB reset and the crash dump, which is
>> roughly two and a half hours. In fact, I pressed Ctrl-Z on a move
>> operation and then ran something like
>> sleep $(echo '60*60*3' | bc) ; fg
>> just past 9 am, so the mv resumed past 12 pm. So, as things add up,
>> the last USB reset happened even before the mv was resumed with fg.
>>
>> I unmounted the FS and re-mounted it to make it writeable again.
>> This showed up in dmesg:
>>
>> [241766.485365] BTRFS error (device sdc1): cleaner transaction attach
>> returned -30
>> [241770.115897] BTRFS info (device sdc1): disk space caching is enabled
>>
>> This time there was no "info" line about the free space cache file, so
>> maybe it wasn't important for the bug to occur at all.
>>
>> The new output of btrfs fi df -g is:
>> Data, single: total=2080.01GiB, used=2078.80GiB
>> System, DUP: total=0.01GiB, used=0.00GiB
>> System, single: total=0.00GiB, used=0.00GiB
>> Metadata, DUP: total=5.50GiB, used=3.73GiB
>> Metadata, single: total=0.01GiB, used=0.00GiB
>> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>>
>> I could swap this disk onto SATA and the other disk back onto USB to
>> see if the USB resets have anything to do with this. But I'm skeptical.
>> Also, maybe btrfs has some other issues related to just the disk being
>> on USB, resets or not. In that case, if the bug doesn't trigger on SATA,
>> we'll think "aha, it was the resets, buggy hardware, etc." when instead
>> it'll have been something else that just has to do with the disk being
>> on USB and operating normally.
>>
>> On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
>>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>>> <ahferroin7@xxxxxxxxx> wrote:
>>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>>
>>>>> Would like to point out that this can cause data loss.
>>>>> If I'm writing
>>>>> to disk and the disk becomes unexpectedly read-only, that data will
>>>>> be lost, because who in their right mind makes their code expect this
>>>>> and builds in a contingency (e.g. caching, backpressure, etc.)...
>>>>
>>>> If a data-critical application (mail server, database server, anything
>>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>>> not the FS. As an example, set up a small VM with an SMTP server, then
>>>> force the FS the server uses for queuing mail read-only, and see if you can
>>>> submit mail; then go read the RFCs for SMTP and see what clients are
>>>> supposed to do when they can't submit mail. A properly designed piece of
>>>> software is supposed to be resilient against common failure modes of the
>>>> resources it depends on (which includes ENOSPC and read-only filesystems for
>>>> anything that works with data on disk).
>>>>>
>>>>> There's no loss of data on the disk, because the data doesn't make it
>>>>> to disk in the first place. But it's exactly the same as if the data
>>>>> had been written to disk and then lost.
>>>>>
>>>> No, it isn't. If you absolutely need the data on disk, you should be
>>>> calling fsync or fdatasync, and then assuming, if those return an error,
>>>> that none of the data written since the last call has gotten to the disk
>>>> (some of it might have, but you need to assume it hasn't). Every piece of
>>>> software in wide usage that requires data to be on the disk does this,
>>>> because otherwise it can't guarantee that the data is on disk.
>>>
>>> I agree that a lot of stuff goes right in a perfect world. But most of
>>> the time what you're running isn't a mail server used by billions of
>>> users, but instead a bash script someone wrote once that's supposed to
>>> do something, and no one knows how it works.
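[Editor's note: the "write, then fsync, then check the result" discipline described above can be sketched in shell. This is a minimal illustration, not from the thread; the paths are throwaway temp files, and dd's conv=fsync flag (GNU coreutils) is used here because it makes dd call fsync(2) on the output file before exiting, so the exit status reflects whether the data actually reached stable storage.]

```shell
#!/bin/sh
# Sketch: write data, force it to disk, and check the outcome.
# All paths are illustrative temp files, not from the original thread.
src=$(mktemp)
dst=$(mktemp)
printf 'important payload\n' > "$src"

# conv=fsync makes dd call fsync(2) on the output file before exiting,
# so a non-zero exit status means the data may not be on disk.
if dd if="$src" of="$dst" conv=fsync status=none; then
    echo "data persisted"
else
    # On failure, assume nothing written since the last successful
    # fsync reached the disk, and keep the source copy around.
    echo "write not durable: keep the source copy" >&2
fi
```

[A read-only remount like the one in this thread would surface here as dd exiting non-zero (EROFS) rather than the data silently vanishing, which is the distinction being argued above.]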
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
