Re: Strange performance degradation when COW writes happen at fixed offsets

To add to this: I also tried the nodatasum (only) and nodatacow
options. I read somewhere that nodatacow doesn't really mean COW is
disabled; a write to an extent that is shared by a reflink or snapshot
still has to be COWed. The test results are unchanged either way: the
same CPU spikes and the same times.
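
For reference, the runs looked roughly like this (the device and mount
point stand in for my actual ones; as I understand it, nodatacow
implies nodatasum, so the second run covers both):

  mount -o nodatasum /dev/sdX /mnt/test
  mount -o nodatacow /dev/sdX /mnt/test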

On Fri, Feb 24, 2012 at 2:38 PM, Nik Markovic <nmarkovi.navteq@xxxxxxxxx> wrote:
> On Fri, Feb 24, 2012 at 12:38 AM, Duncan <1i5t5.duncan@xxxxxxx> wrote:
>> Nik Markovic posted on Thu, 23 Feb 2012 20:31:02 -0600 as excerpted:
>>
>>> I noticed a few errors in the script that I used. I corrected it and it
>>> seems that degradation is occurring even at fully random writes:
>>
>> I don't have an ssd, but is it possible that you're simply seeing erase-
>> block related degradation due to multi-write-block sized erase-blocks?
>>
>> It seems to me that when originally written to the btrfs-on-ssd, the file
>> will likely be written block-sequentially enough that the file as a whole
>> takes up relatively few erase-blocks.  As you COW-write individual
>> blocks, they'll be written elsewhere, perhaps all the changed blocks to a
>> new erase-block, perhaps each to a different erase block.
>
> This is a very interesting insight. I wasn't even aware of the
> erase-block issue, so I did some reading up on it...
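>
> If I now understand the mechanics, a rough worked example (the
> erase-block size is an assumption; I don't know my SSD's actual
> figure): with 4 KiB filesystem blocks and 512 KiB erase blocks, each
> erase block holds 128 file blocks, and a freshly written 1 GiB file
> spans about 2048 erase blocks. After a generation of scattered 4 KiB
> COW writes, every rewritten block can land in a different erase
> block, so N random writes can leave the file touching up to N
> additional erase blocks.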
>
>>
>> As you increase the successive COW generation count, the file's
>> filesystem blocks will be spread through more and more erase-blocks
>> (basically fragmentation, but of the SSD-critical kind), affecting
>> modification and removal time but not read time.
>
> OK, so write time increasing due to fragmentation now makes sense
> (though I don't see why small writes would be affected; my concern
> isn't writes anyway). But why would cp --reflink time increase so
> much? Yes, new extents would be created, but btrfs doesn't write
> into the data blocks, does it? I figured its metadata would be kept
> in one place. As far as I can tell, the only things btrfs has to do
> on cp --reflink=always are (see the sketch below):
> 1. Take the collection of extents owned by the source.
> 2. Make the new copy use that same collection of extents.
> 3. Write the collection of extent references to the "directory".
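>
> To make sure we're talking about the same operation, here is a
> minimal sketch of what I believe cp --reflink=always boils down to
> on btrfs: a single clone ioctl on the destination fd. The paths are
> placeholders, the ioctl value is copied from btrfs's ioctl.h, and
> error handling is trimmed:
>
> #include <fcntl.h>       /* open, O_* flags */
> #include <stdio.h>       /* perror */
> #include <sys/ioctl.h>   /* ioctl */
> #include <linux/ioctl.h> /* _IOW */
>
> /* From btrfs's ioctl.h: BTRFS_IOCTL_MAGIC is 0x94, clone is nr 9. */
> #ifndef BTRFS_IOC_CLONE
> #define BTRFS_IOC_CLONE _IOW(0x94, 9, int)
> #endif
>
> int main(void)
> {
>     int src = open("/mnt/test/baseline.img", O_RDONLY);
>     int dst = open("/mnt/test/copy.img",
>                    O_WRONLY | O_CREAT | O_TRUNC, 0644);
>     if (src < 0 || dst < 0) { perror("open"); return 1; }
>
>     /* Steps 1-3 above happen inside the kernel here: dst ends up
>      * referencing the same extents as src; no file data is copied. */
>     if (ioctl(dst, BTRFS_IOC_CLONE, src) < 0) {
>         perror("BTRFS_IOC_CLONE");
>         return 1;
>     }
>     return 0;
> }
>
> If a single clone ioctl like this also slows down as the file ages,
> the cost must be in the kernel-side extent bookkeeping.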
>
> Now this process seems to be CPU intensive. When I remove or make a
> reflink copy, one core spikes to 100%, which tells me there's a
> performance issue here rather than an SSD issue. Also, only one CPU
> thread is used for the operation. I figure I could improve this with
> some setting. Maybe the thread_pool mount option? Are there any
> updates in later kernels that I should pick up?
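>
> In the meantime I'll try to see where that CPU time goes, e.g. by
> running "perf top" while the reflink copy is in flight (assuming
> perf is usable on this kernel); if the samples land in btrfs extent
> bookkeeping, that would confirm the copy is CPU-bound on metadata
> rather than waiting on the SSD.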
>
>>
>> IIRC I saw a note about this on the wiki, in regard to the nodatacow
>> mount-option.  Let's see if I can find it again.  Hmm... yes...
>>
>> http://btrfs.ipv5.de/index.php?title=Getting_started#Mount_Options
>>
>> In particular this (for nodatacow; read the rest, as there are
>> additional implications):
>>
>> >>>>>
>> Performance gain is usually < 5% unless the workload is random writes to
>> large database files, where the difference can become very large.
>> <<<<<
>>
>
> Unless I am wrong, that would disable COW completely, and with it
> reflink copies. Reflinks are a crucial component and the sole reason
> I picked btrfs for the system I am writing for my company. The
> autodefrag option addresses repeated writes, but writing is not the
> problem; cp --reflink should be near-instant. That was the reason we
> chose btrfs over ZFS, which seemed to be the only feasible
> alternative: ZFS snapshots complicate the design, and its
> deduplicated copy time is about the same as (or not much better
> than) a raw copy.
>
>> In addition to nodatacow, see the note on the autodefrag option.
>>
>> IOW, with repeated generations of random writes to COW copies,
>> you're apparently triggering a worst-case COW fragmentation
>> situation. It shouldn't affect read time much on an SSD, but it
>> certainly will affect copy and erase time, as the data and metadata
>> (which, as you'll recall, is duplicated by default on btrfs) get
>> written to more and more blocks that need to be updated at
>> copy/erase time.
>>
>>
>> That /might/ be the problem triggering the freezes you noted that set off
>> the original investigation as well, if the SSD firmware is running out of
>> erase blocks and having to pause access while it rearranges data to allow
>> operations to continue.  Since your original issue on "rotating rust"
>> drives was fragmentation, rewriting would seem to be something you do
>> quite a lot of, triggering different but similar-cause issues on SSDs as
>> well.
>>
>> FWIW, with that sort of database-style workload, large files
>> constantly rewritten at random offsets, something like xfs might be
>> more appropriate than btrfs. See the recent xfs presentations (were
>> they at ScaleX or LinuxConf.au? both happened about the same time)
>> covered a couple of weeks ago in LWN's weekly edition.
>>
>
> As I mentioned above, COW is the crucial component of our system, so
> XFS won't do. Our system does not do random writes; in fact, it is
> mostly read-heavy. It only does occasional "rotation" of large files,
> the way a version control system would (large files are modified and
> then used as a new baseline).
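>
> On the erase-block theory, I can also try a manual batched discard
> before the next test run, e.g. "fstrim /mnt/test" (assuming this
> kernel and the SSD support FITRIM), to rule out the firmware simply
> running short of pre-erased blocks.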
>
>> --
>> Duncan - List replies preferred.   No HTML msgs.
>> "Every nonfree program has a lord, a master --
>> and if you use the program, he is your master."  Richard Stallman
>>
>
> Thanks for all your help on this issue. I hope someone can point out
> more tweaks I could make, or features/fixes added after 3.2 RC5 that
> I could pick up.