To add... I also tried the nodatasum (only) and nodatacow options. I found
somewhere that nodatacow doesn't really mean that COW is disabled. Test data is
still the same - CPU spikes and times are the same. (Rough sketches of the
remount sequence and of the reflink/rewrite cycle I keep timing are at the end
of this mail.)

On Fri, Feb 24, 2012 at 2:38 PM, Nik Markovic <nmarkovi.navteq@xxxxxxxxx> wrote:
> On Fri, Feb 24, 2012 at 12:38 AM, Duncan <1i5t5.duncan@xxxxxxx> wrote:
>> Nik Markovic posted on Thu, 23 Feb 2012 20:31:02 -0600 as excerpted:
>>
>>> I noticed a few errors in the script that I used. I corrected it and it
>>> seems that degradation is occurring even at fully random writes:
>>
>> I don't have an ssd, but is it possible that you're simply seeing erase-
>> block related degradation due to multi-write-block sized erase-blocks?
>>
>> It seems to me that when originally written to the btrfs-on-ssd, the file
>> will likely be written block-sequentially enough that the file as a whole
>> takes up relatively few erase-blocks. As you COW-write individual
>> blocks, they'll be written elsewhere, perhaps all the changed blocks to a
>> new erase-block, perhaps each to a different erase block.
>
> This is a very interesting insight. I wasn't even aware of the
> erase-block issue, so I did some reading up on it...
>
>> As you increase the successive COW generation count, the file's file-
>> system/write blocks will be spread through more and more erase-blocks
>> (basically fragmentation, but of the SSD-critical type), thus affecting
>> modification and removal time but not read time.
>
> OK, so time to write would increase due to fragmentation and writing;
> that now makes sense (though I don't see why small writes would affect
> this, but my concerns are not writes anyway). But why would cp
> --reflink time increase so much? Yes, new extents would be created,
> but btrfs doesn't write into data blocks, does it? I figured its
> metadata would be kept in one place. I figure the only things BTRFS
> would do on cp --reflink=always are:
> 1. Take the collection of extents owned by the source.
> 2. Make the new copy use the same collection of extents.
> 3. Write the collection of extents to the "directory".
>
> Now this process seems to be CPU intensive. When I remove or make a
> reflink copy, one core spikes up to 100%, which tells me that there's a
> performance issue there, not an SSD issue. Also, only one CPU thread
> is being used for this. I figured that I could improve this with some
> setting. Maybe the thread_pool mount option? Are there any updates in
> later kernels that I should possibly pick up?
>
>> IIRC I saw a note about this on the wiki, in regard to the nodatacow
>> mount-option. Let's see if I can find it again. Hmm... yes...
>>
>> http://btrfs.ipv5.de/index.php?title=Getting_started#Mount_Options
>>
>> In particular this (for nodatacow; read the rest as there are additional
>> implications):
>>
>> >>>>>
>> Performance gain is usually < 5% unless the workload is random writes to
>> large database files, where the difference can become very large.
>> <<<<<
>>
>
> Unless I am wrong, this would disable COW completely, and reflink copies
> along with it. Reflinks are a crucial component and the sole
> reason I picked BTRFS for the system that I am writing for my company.
> The autodefrag option addresses multiple writes. Writing is not the
> problem, but cp --reflink should be near-instant. That was the reason
> we chose BTRFS over ZFS, which seemed to be the only feasible
> alternative. ZFS snapshots complicate the design, and deduplicated copy
> time is the same as (or not much better than) raw copy.
>
>> In addition to nodatacow, see the note on the autodefrag option.
>>
>> IOW, with the repeated generations of random-writes to cow-copies, you're
>> apparently triggering a cow-worst-case fragmentation situation. It
>> shouldn't affect read-time much on SSD, but it certainly will affect copy
>> and erase time, as the data and metadata (which as you'll recall is 2X by
>> default on btrfs) gets written to more and more blocks that need to be
>> updated at copy/erase time.
>>
>> That /might/ be the problem triggering the freezes you noted that set off
>> the original investigation as well, if the SSD firmware is running out of
>> erase blocks and having to pause access while it rearranges data to allow
>> operations to continue. Since your original issue on "rotating rust"
>> drives was fragmentation, rewriting would seem to be something you do
>> quite a lot of, triggering different but similar-cause issues on SSDs as
>> well.
>>
>> FWIW, with that sort of database-style workload, large files constantly
>> random-change rewritten, something like xfs might be more appropriate
>> than btrfs. See the recent xfs presentations (were they at SCALE or
>> linux.conf.au? both happened about the same time and were covered in the
>> same LWN weekly edition) as covered a couple of weeks ago on LWN for more.
>>
>
> As I mentioned above, COW is a crucial component of our system, so
> XFS won't do. Our system does not do random writes. In fact, it is
> mainly heavy on read operations. The system does occasional "rotation
> of rust" on large files in the way a version control system would
> (large files are modified and then used as a new baseline).
>
>> --
>> Duncan - List replies preferred.   No HTML msgs.
>> "Every nonfree program has a lord, a master --
>> and if you use the program, he is your master."  Richard Stallman
>
> Thanks for all your help on this issue. I hope that someone can point
> out some more tweaks or added features/fixes after 3.2 RC5 that I may
> apply.
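
P.S. The nodatasum/nodatacow runs mentioned at the top were along these lines
(just a sketch; the device, mount point and file names here are placeholders,
not my actual setup, and the timing loop itself is sketched further below):

  # first run: checksums off only
  umount /mnt/btrfs-test
  mount -o nodatasum /dev/sdX1 /mnt/btrfs-test
  # ... rerun the reflink/rewrite timing loop ...

  # second run: data COW nominally off
  umount /mnt/btrfs-test
  mount -o nodatacow /dev/sdX1 /mnt/btrfs-test
  # ... rerun the reflink/rewrite timing loop ...

Even with nodatacow, blocks shared with a reflink copy still have to be COWed
on the first rewrite, which matches what I'm seeing: the CPU spikes and the
times don't change.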

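P.P.S. The reflink/rewrite cycle I keep referring to is roughly the following
(a bash sketch only; the real script uses different paths, file sizes and
iteration counts):

  # 128 MiB baseline file on the btrfs test mount
  dd if=/dev/urandom of=/mnt/btrfs-test/base bs=1M count=128

  for gen in $(seq 1 50); do
      # scatter some random 4K rewrites into the baseline (each one is COWed)
      for i in $(seq 1 100); do
          dd if=/dev/urandom of=/mnt/btrfs-test/base bs=4K count=1 \
             seek=$((RANDOM % 32768)) conv=notrunc 2>/dev/null
      done
      # time a reflink copy and a removal at this COW generation
      /usr/bin/time -f "gen $gen reflink: %e s" \
          cp --reflink=always /mnt/btrfs-test/base /mnt/btrfs-test/copy.$gen
      /usr/bin/time -f "gen $gen rm: %e s" \
          rm -f /mnt/btrfs-test/copy.$((gen - 1))
  done

The degradation I'm describing is that both the cp --reflink and the rm times
(and the single-core CPU spike) grow as the generation count goes up.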