Re: Two persistent problems

Josef Bacik wrote on 14.11.2014 at 23:00:
> On 11/14/2014 04:51 PM, Hugo Mills wrote:
>>     Chris, Josef, anyone else who's interested,
>>
>>     On IRC, I've been seeing reports of two persistent unsolved
>> problems. Neither is showing up very often, but both have turned up
>> often enough to indicate that there's something specific going on
>> worthy of investigation.
>>
>>     One of them is definitely a btrfs problem. The other may be btrfs,
>> or something in the block layer, or just broken hardware; it's hard to
>> tell from where I sit.
>>
>> Problem 1: ENOSPC on balance
>>
>>     This has been going on since about March this year. I can recall
>> 8-10 cases with reasonable certainty, possibly a few more. When
>> running a balance, the operation fails with ENOSPC when there's plenty
>> of space remaining unallocated. This happens on full balance, filtered
>> balance, and device delete. Other than the ENOSPC on balance, the FS
>> seems to work OK. It seems to be more prevalent on filesystems
>> converted from ext*. The first few reports of this didn't make
>> it to bugzilla, but a few of them since then have gone in.
>>
>> Problem 2: Unexplained zeroes
>>
>>     Failure to mount. Transid failure, "expected xyz, have 0". Chris
>> looked at an early one of these (for Ke, on IRC) back in September
>> (the 27th -- sadly, the public IRC logs aren't there for it, but I can
>> supply a copy of the private log). He rapidly came to the conclusion
>> that it was something bad going on with TRIM, replacing some blocks
>> with zeroes. Since then, I've seen a bunch of these coming past on
>> IRC. It seems to be a 3.17 thing. I can successfully predict the
>> presence of an SSD and -odiscard from the "have 0". I've successfully
>> persuaded several people to put this into bugzilla and capture
>> btrfs-images.  btrfs recover doesn't generally seem to be helpful in
>> recovering data.
>>
>>
>>     I think Josef had problem 1 in his sights, but I don't know if
>> additional images or reports are helpful at this point. For problem 2,
>> there's obviously something bad going on, but there's not much else to
>> go on -- and the inability to recover data isn't good.
>>
>>     For each of these, what more information should I be trying to
>> collect from any future reporters?
>>
>>
>
> So for #2, I've been looking at that for the last two weeks.  I'm always
> paranoid we're screwing up one of our data integrity sort of things,
> either not waiting on IO to complete properly or something like that.
> I've built a dm target to be as evil as possible and have been running
> it trying to make bad things happen.  I got slightly sidetracked
> since my stress test exposed a bug in the tree log stuff and csums
> which I just fixed.  Now that I've fixed that I'm going back to try
> and make the "expected blah, have 0" type errors happen.
>
> As for the ENOSPC I keep meaning to look into it and I keep getting
> distracted with other more horrible things.  Ideally I'd like to
> reproduce it myself, so more info on that front would be good, like do
> all reports use RAID/compression/some other odd set of features?
> Thanks for taking care of this stuff, Hugo. #2 is the worst one and I'd
> like to be absolutely sure it's not our bug; once I'm happy it isn't,
> I'll look at the balance thing.
>
> Josef

For #2, I reported a strangely damaged BTRFS a week or so ago, which
may have a similar background. Dmesg gives:

parent transid verify failed on 586239082496 wanted 13329746340512024838 found 588
BTRFS: open_ctree failed

The thing is that btrfsck crashes when trying to check this. As nobody
seemed to be interested, I reformatted this disk today.
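
For anyone collecting reports like these, here is a rough sketch of how
the transid lines could be pulled out of a saved dmesg and sorted into
buckets. It only understands the message form quoted above. The function
name, the bucket labels, and the 2^48 cut-off for an "implausible"
generation are my own assumptions for illustration, not anything taken
from the kernel or btrfs-progs.

```python
import re

# Matches the message form quoted in this mail. \s+ also tolerates a line
# break, in case a mail client wrapped the log line.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+)\s+wanted (\d+)\s+found (\d+)"
)

# Assumption: no real filesystem reaches a generation this large, so a
# "wanted" value above it is treated as garbage rather than a real transid.
IMPLAUSIBLE_GENERATION = 1 << 48


def classify_transid_failures(log_text):
    """Return one dict per transid failure found in log_text."""
    results = []
    for match in TRANSID_RE.finditer(log_text):
        bytenr, wanted, found = (int(group) for group in match.groups())
        if found == 0:
            # The block read back with generation 0 (the "have 0" pattern).
            kind = "found-zero"
        elif wanted >= IMPLAUSIBLE_GENERATION:
            # The expected generation itself looks corrupted.
            kind = "garbage-wanted"
        else:
            kind = "other"
        results.append(
            {"bytenr": bytenr, "wanted": wanted, "found": found, "kind": kind}
        )
    return results


if __name__ == "__main__":
    sample = (
        "parent transid verify failed on 586239082496 "
        "wanted 13329746340512024838\nfound 588\n"
        "BTRFS: open_ctree failed\n"
    )
    for entry in classify_transid_failures(sample):
        print(entry)
```

With the log above, the single failure lands in the "garbage-wanted"
bucket rather than "found-zero", so it may well be a different failure
from the TRIM one, even if the symptom looks similar.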
