Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

I noticed that every time Data gets bumped, it only gets bumped by a
couple GB. I rarely ever store files on that disk that are larger than
2 GB, but the last time it crashed, I was moving a file that was 4.3
GB, so maybe that's conducive to the crash happening? Maybe the file
being larger than what btrfs would allocate has something to do with
this. I will keep track of the amount of data written since the last
crash, and the file size when the crash occurred.
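
One way to do that tracking would be to log the gap between "total" and
"used" for Data at intervals. Below is a rough Python sketch that parses
text in the same format as the btrfs fi df -g output quoted further down.
The function names and the SAMPLE constant are made up for illustration,
and it assumes every size is printed in GiB, as it is in that output.

```python
import re

# Matches lines such as "Data, single: total=2080.01GiB, used=2078.80GiB".
# Assumption: all sizes are reported in GiB (as with the -g flag).
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)GiB, used=(?P<used>[\d.]+)GiB$"
)

# Sample text copied from the output quoted in this thread.
SAMPLE = """\
Data, single: total=2080.01GiB, used=2078.80GiB
System, DUP: total=0.01GiB, used=0.00GiB
System, single: total=0.00GiB, used=0.00GiB
Metadata, DUP: total=5.50GiB, used=3.73GiB
Metadata, single: total=0.01GiB, used=0.00GiB
GlobalReserve, single: total=0.50GiB, used=0.00GiB
"""


def parse_fi_df(text):
    """Return {(kind, profile): (total_gib, used_gib)} for each recognised line."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match["kind"], match["profile"])
            result[key] = (float(match["total"]), float(match["used"]))
    return result


def data_headroom_gib(text):
    """Allocated-but-unused Data space in GiB (total minus used)."""
    total, used = parse_fi_df(text)[("Data", "single")]
    return total - used
```

Running data_headroom_gib() on each captured snapshot and writing the
result to a file with a timestamp would show how much slack Data had
right before a crash, which can then be compared against the size of the
file being moved.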

On Mon, Jan 11, 2016 at 2:45 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
> After remounting, the bug doesn't occur any more, and Data gets resized.
>
> It is my experience that this bug will go untriggered for weeks at a
> time until I write a lot to that disk there, at which point it'll
> happen very quickly. I believe this has more to do with the amount of
> data that's been written to disk than anything else. It has been about
> 48 GB to trigger the last instance and I don't think that's very
> different from what happened before but I didn't keep track exactly.
>
> On Mon, Jan 11, 2016 at 2:30 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
>> The bug just happened again. Attached is a log since the time I
>> mounted the FS right after the fsck.
>>
>> Note the only things between the message I got while mounting:
>> [216798.144518] BTRFS info (device sdc1): disk space caching is enabled
>>
>> and the beginning of the crash dump:
>> [241534.760651] ------------[ cut here ]------------
>>
>> is this:
>> [218266.098344] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>> [233647.332085] usb 4-1.1: reset high-speed USB device number 3 using ehci-pci
>>
>> I am not sure why those resets happen, though. I bought a few cables
>> and experimented with them, and the usb ports themselves are located
>> directly on the motherboard.
>> Also, they happened some considerable time before the crash dump. So
>> I'm not sure they're even related. Especially given that I was copying
>> a lot of very small files, and they all copied onto the disk fine all
>> the time between the last usb reset and the crash dump, which is
>> roughly two and a half hours. In fact I pressed ctrl-z on a move
>> operation and then wrote something like sleep $(echo '60*60*3' | bc) ;
>> fg and ran it just past 9 am, so the mv resumed past 12 pm, so as
>> things add up the last usb reset happened even before the mv was
>> resumed with fg.
>>
>> I unmounted the fs and re-mounted it to make it writeable again.
>> This showed up in dmesg:
>>
>> [241766.485365] BTRFS error (device sdc1): cleaner transaction attach returned -30
>> [241770.115897] BTRFS info (device sdc1): disk space caching is enabled
>>
>> This time there was no "info" line about the free space cache file. So
>> maybe it wasn't important for the bug to occur at all.
>>
>> The new output of btrfs fi df -g is:
>> Data, single: total=2080.01GiB, used=2078.80GiB
>> System, DUP: total=0.01GiB, used=0.00GiB
>> System, single: total=0.00GiB, used=0.00GiB
>> Metadata, DUP: total=5.50GiB, used=3.73GiB
>> Metadata, single: total=0.01GiB, used=0.00GiB
>> GlobalReserve, single: total=0.50GiB, used=0.00GiB
>>
>> I could swap this disk onto sata and the other disk back onto usb to
>> see if the usb resets have anything to do with this. But I'm skeptical.
>> Also maybe btrfs has some other issues related to just the disk being
>> on usb, resets or not, and this way if the bug doesn't trigger on sata
>> we'll think "aha it was the resets, buggy hardware etc" but instead
>> it'll have been something else that just has to do with the disk being
>> on usb operating normally.
>>
>> On Mon, Jan 11, 2016 at 2:11 PM, cheater00 . <cheater00@xxxxxxxxx> wrote:
>>> On Mon, Jan 11, 2016 at 2:05 PM, Austin S. Hemmelgarn
>>> <ahferroin7@xxxxxxxxx> wrote:
>>>> On 2016-01-09 16:07, cheater00 . wrote:
>>>>>
>>>>> Would like to point out that this can cause data loss. If I'm writing
>>>>> to disk and the disk becomes unexpectedly read only - that data will
>>>>> be lost, because who in their right mind makes their code expect this
>>>>> and builds a contingency (e.g. caching, backpressure, etc)...
>>>>
>>>> If a data critical application (mail server, database server, anything
>>>> similar) can't gracefully handle ENOSPC, then that application is broken,
>>>> not the FS.  As an example, set up a small VM with an SMTP server, then
>>>> force the FS the server uses for queuing mail read-only, and see if you can
>>>> submit mail, then go read the RFCs for SMTP and see what clients are
>>>> supposed to do when they can't submit mail.  A properly designed piece of
>>>> software is supposed to be resilient against common failure modes of the
>>>> resources it depends on (which includes ENOSPC and read-only filesystems for
>>>> anything that works with data on disk).
>>>>>
>>>>>
>>>>> There's no loss of data on the disk because the data doesn't make it
>>>>> to disk in the first place. But it's exactly the same as if the data
>>>>> had been written to disk, and then lost.
>>>>>
>>>> No, it isn't.  If you absolutely need the data on disk, you should be
>>>> calling fsync or fdatasync, and then assuming if those return an error that
>>>> none of the data written since the last call has gotten to the disk (some of
>>>> it might have, but you need to assume it hasn't).  Every piece of software
>>>> in wide usage that requires data to be on the disk does this, because
>>>> otherwise it can't guarantee that the data is on disk.
>>>
>>> I agree that a lot of stuff goes right in a perfect world. But most of
>>> the time what you're running isn't a mail server used by billions of
>>> users, but instead a bash script someone wrote once that's supposed to
>>> do something, and no one knows how it works.
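
For reference, the fsync advice quoted above can be sketched in a few
lines of Python. This is only an illustration under simple assumptions:
write_durably is a hypothetical helper name, it handles a single file,
and it does not fsync the containing directory. Any OSError (which covers
ENOSPC and a read-only filesystem) is treated as "assume none of the data
reached the disk".

```python
import os


def write_durably(path, data):
    """Write data (bytes) to path; return True only if fsync succeeded.

    On any OSError the caller should assume that none of the data
    reached the disk and keep its own copy for a retry.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError:
        return False
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush this file's data to the device.
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

A script that moves files could call this for the copy and delete the
source only when it returns True, so a filesystem that suddenly goes
read-only leaves the original in place.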