Re: 6TB partition, Data only 2TB - aka When you haven't hit the "usual" problem

Yeah, it's -o enospc_debug. I forgot to enable it this time. I'll
enable it, put it in fstab, and see where that goes.
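
For anyone following along, here is a rough sketch of what enabling it
could look like. The device UUID placeholder and the /mnt/data mount
point are assumptions for illustration, not details from this thread,
and has_enospc_debug is a hypothetical helper name. The helper only
checks that the option actually shows up in the mount table.

```shell
# Example fstab entry (placeholder UUID and mount point, adjust to taste):
#   UUID=<your-fs-uuid>  /mnt/data  btrfs  defaults,enospc_debug  0 0
#
# To turn it on without rebooting (run as root):
#   mount -o remount,enospc_debug /mnt/data

# has_enospc_debug MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is a btrfs mount whose option list contains
# enospc_debug. MOUNTS_FILE defaults to /proc/mounts.
has_enospc_debug() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "btrfs" {
            # Field 4 is the comma-separated mount option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "enospc_debug")
                    found = 1
        }
        END { exit !found }
    ' "$mounts_file"
}
```

After a remount, something like
`has_enospc_debug /mnt/data && echo "enospc_debug is active"` should
confirm the option took effect. Any extra output then shows up in the
kernel log the next time ENOSPC is hit.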

On Tue, Jan 12, 2016 at 12:07 AM, Hugo Mills <hugo@xxxxxxxxxxxxx> wrote:
> On Mon, Jan 11, 2016 at 03:39:43PM -0700, Chris Murphy wrote:
>> On Mon, Jan 11, 2016 at 3:30 PM, Hugo Mills <hugo@xxxxxxxxxxxxx> wrote:
>> > On Mon, Jan 11, 2016 at 03:20:36PM -0700, Chris Murphy wrote:
>> >> On Mon, Jan 11, 2016 at 3:10 PM, Hugo Mills <hugo@xxxxxxxxxxxxx> wrote:
>> >> > On Mon, Jan 11, 2016 at 02:31:41PM -0700, Chris Murphy wrote:
>> >> >> On Mon, Jan 11, 2016 at 2:03 AM, Hugo Mills <hugo@xxxxxxxxxxxxx> wrote:
>> >> >> > On Sun, Jan 10, 2016 at 05:13:28PM -0700, Chris Murphy wrote:
>> >> >> >> On Sat, Jan 9, 2016 at 2:04 PM, Hugo Mills <hugo@xxxxxxxxxxxxx> wrote:
>> >> >> >> > On Sat, Jan 09, 2016 at 09:59:29PM +0100, cheater00 . wrote:
>> >> >> >> >> OK. How do we track down that bug and get it fixed?
>> >> >> >> >
>> >> >> >> >    I have no idea. I'm not a btrfs dev, I'm afraid.
>> >> >> >> >
>> >> >> >> >    It's been around for a number of years. None of the devs has, I
>> >> >> >> > think, had the time to look at it. When Josef was still (publicly)
>> >> >> >> > active, he had it second on his list of bugs to look at for many
>> >> >> >> > months -- but it always got trumped by some new bug that could cause
>> >> >> >> > data loss.
>> >> >> >>
>> >> >> >>
>> >> >> >> Interesting. I did not know of this bug. It's pretty rare.
>> >> >> >
>> >> >> >    Not really. It shows up maybe on average once a week on IRC. It
>> >> >> > gets reported much less on the mailing list.
>> >> >>
>> >> >> Is there a pattern? Does it only happen at a 2TiB threshold?
>> >> >
>> >> >    No, and no.
>> >> >
>> >> >    There is, as far as I can tell from some years of seeing reports of
>> >> > this bug, no correlation with RAID level, hardware, OS, kernel
>> >> > version, FS size, usage of the FS at failure, or allocation level of
>> >> > either data or metadata at failure.
>> >> >
>> >> >    I haven't tried correlating with the phase of the moon or the
>> >> > losses on Lloyd's Register yet.
>> >>
>> >> Huh. So it's goofy cakes.
>> >>
>> >> This is specifically where btrfs_free_extent produces errno -28 no
>> >> space left, and then the fs goes read-only?
>> >
>> >    The symptoms I'm using for a diagnosis of this bug are that the FS
>> > runs out of (usually data) space when there's still unallocated space
>> > remaining that it could use for another block group.
>> >
>> >    Forced RO isn't usually a symptom, although the FS can get into a
>> > state where you can't modify it (as distinct from being explicitly
>> > read-only).
>> >
>> >    Block-group level operations, like balance, device delete, device
>> > add sometimes seem to have some kind of (usually small) effect on the
>> > point at which the error occurs. If you hit the problem and run a
>> > balance, you might end up making things worse by a couple of
>> > gigabytes, or making things better by the same amount, or having no
>> > effect at all.
>>
>> Are there any compile time options not normally set that would help find it?
>> # CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
>> # CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
>> # CONFIG_BTRFS_DEBUG is not set
>> # CONFIG_BTRFS_ASSERT is not set
>>
>> Once it starts to happen, it sounds like it's straightforward to
>> reproduce in a short amount of time. I'm kinda surprised I've never
>> run into this.
>
>    It does sometimes have a repeating nature: I'm reasonably sure
> we've seen a few people get it repeatedly on different filesystems.
> This might point at a particular workload needed to trigger it. (Or
> just bad luck / statistical likelihood). Some people have never hit
> it.
>
>    There is (or at least, was) an ENOSPC debugging option. I think
> that's a mount option. That's probably the most useful one, but the
> range of usefulness of existing debug output may be very small. :)
>
>    (Sorry for the vague nature of this reply -- it's been a very long
> day).
>
>    Hugo.
>
> --
> Hugo Mills             | "What are we going to do tonight?"
> hugo@... carfax.org.uk | "The same thing we do every night, Pinky. Try to
> http://carfax.org.uk/  | take over the world!"
> PGP: E2AB1DE4          |