On 2018-01-03 09:55, robbieko wrote:
> Hi Qu,
>
> Do you have a patch to reduce meta rsv?

Not exactly, only for qgroup:

  [PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode qgroup rsv

But that patch could provide enough clues to implement a smaller meta rsv.

My current safe guess would be "(BTRFS_MAX_TREE_LEVEL + 2) * nodesize" for
each outstanding extent, and then to step the reservation with the number
of outstanding extents: not increasing/decreasing the meta rsv every time
the outstanding extent count changes, but only when it crosses certain
thresholds.
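A minimal userspace sketch of that stepping idea (the names, the 16KiB
nodesize and the batch size of 16 extents are illustrative assumptions,
not actual btrfs code):

/*
 * Sketch: per-inode reservation stepped by outstanding extents.
 * Hypothetical names; only illustrates the batching behaviour.
 */
#include <stdio.h>

#define NODESIZE	16384	/* assume the default 16KiB nodesize */
#define MAX_TREE_LEVEL	8
#define RSV_PER_EXTENT	((MAX_TREE_LEVEL + 2) * NODESIZE)
#define EXTENT_STEP	16	/* adjust the rsv only every 16 extents */

struct inode_rsv {
	unsigned long outstanding_extents;
	unsigned long reserved_bytes;
};

/* Round an extent count up to the next step boundary. */
static unsigned long stepped(unsigned long extents)
{
	return (extents + EXTENT_STEP - 1) / EXTENT_STEP * EXTENT_STEP;
}

/*
 * Called whenever outstanding extents change; the reservation is
 * touched only when the count crosses a step boundary.
 */
static void update_rsv(struct inode_rsv *rsv, long delta)
{
	unsigned long old_step = stepped(rsv->outstanding_extents);

	rsv->outstanding_extents += delta;

	if (stepped(rsv->outstanding_extents) != old_step) {
		rsv->reserved_bytes =
			stepped(rsv->outstanding_extents) * RSV_PER_EXTENT;
		printf("rsv stepped to %lu bytes at %lu extents\n",
		       rsv->reserved_bytes, rsv->outstanding_extents);
	}
}

int main(void)
{
	struct inode_rsv rsv = { 0, 0 };

	/* 40 one-extent writes: the reservation changes only 3 times. */
	for (int i = 0; i < 40; i++)
		update_rsv(&rsv, 1);
	return 0;
}

The point is that a burst of small writes only renegotiates the shared
reservation when the extent count crosses a step boundary, instead of on
every single write.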
Thanks,
Qu

> Hi Peter Grandi,
>
> 1. All files have been initialized with dd beforehand, so no metadata
>    needs to change.
> 2. My test volume has a total size of 190G, with 128G used and 60G
>    available, yet it can hold only 60MB of dirty pages.
>    According to the meta rsv rules, 1GB of free space allows only about
>    1MB of dirty pages.
> 3. The problem is the same with cow enabled.
>
> It is a serious performance issue.
>
> Thanks.
> robbieko
>
> On 2018-01-02 21:08, pg@xxxxxxxxxxxxxxxxxxxxx wrote:
>>> When testing Btrfs with fio 4k random write,
>>
>> That's an exceptionally narrowly defined workload. It is also narrower
>> than that, because it must be without 'fsync' after each write, or
>> else there would be no accumulation of dirty blocks in memory at all.
>>
>>> I found that volume with smaller free space available has
>>> lower performance.
>>
>> That's an inappropriate use of "performance": the speed may be lower,
>> but the performance is another matter.
>>
>>> It seems that the smaller the free space of volume is, the
>>> smaller amount of dirty page filesystem could have.
>>
>> Is this a problem? Consider: all filesystems do less well when there
>> is less free space (a smaller chance of finding spatially compact
>> allocations), and it is usually good to minimize the amount of dirty
>> pages anyhow (even if there are reasons to delay writing them out).
>>
>>> [ ... ] btrfs will reserve metadata for every write. The amount to
>>> reserve is calculated as follows: nodesize * BTRFS_MAX_LEVEL(8) * 2,
>>> i.e. it reserves 256KB of metadata. The maximum amount of metadata
>>> reservation depends on the size of metadata currently in use and the
>>> free space within the volume (free chunk size / 16). When metadata
>>> reaches the limit, btrfs will need to flush the data to release the
>>> reservation.
>>
>> I don't understand here: under POSIX semantics filesystems are not
>> really allowed to avoid flushing *metadata* to disk for most
>> operations, that is, metadata operations have an implied 'fsync'. In
>> your case of the "4k random write" with "cow disabled", the only
>> metadata that should get updated is the last-modified timestamp,
>> unless the user/application has been so amazingly stupid as to not
>> preallocate the file, and then they deserve whatever they get.
>>
>>> 1. Is there any logic behind the value (free chunk size / 16)?
>>
>>>     /*
>>>      * If we have dup, raid1 or raid10 then only half of the free
>>>      * space is actually useable.  For raid56, the space info used
>>>      * doesn't include the parity drive, so we don't have to
>>>      * change the math
>>>      */
>>>     if (profile & (BTRFS_BLOCK_GROUP_DUP |
>>>                    BTRFS_BLOCK_GROUP_RAID1 |
>>>                    BTRFS_BLOCK_GROUP_RAID10))
>>>         avail >>= 1;
>>
>> As written there is a plausible logic, but it is quite crude.
>>
>>>     /*
>>>      * If we aren't flushing all things, let us overcommit up to
>>>      * 1/2th of the space. If we can flush, don't let us overcommit
>>>      * too much, let it overcommit up to 1/8 of the space.
>>>      */
>>>     if (flush == BTRFS_RESERVE_FLUSH_ALL)
>>>         avail >>= 3;
>>>     else
>>>         avail >>= 1;
>>
>> Presumably overcommitting brings some benefits on other workloads.
>>
>> In particular, other parts of Btrfs don't behave awesomely well when
>> free space runs out.
>>
>>> 2. Is there any way to improve this problem?
>>
>> Again, is it a problem? More interestingly, if it is a problem, is a
>> solution available that does not impact other workloads? It is simply
>> impossible to optimize a filesystem perfectly for every workload.
>>
>> I'll try to summarize your report as I understand it:
>>
>> * If:
>>   - The workload is "4k random write" (without 'fsync').
>>   - On a "cow disabled" file.
>>   - The file is not preallocated.
>>   - There is not much free space available.
>> * Then allocation overcommitting results in a higher frequency of
>>   unrequested metadata flushes, and those metadata flushes slow down
>>   a specific benchmark.
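(As a back-of-envelope cross-check of the figures in this thread,
assuming a 16KiB nodesize and a metadata profile such as DUP, for which
the quoted code halves avail:

  per-extent rsv   = nodesize * BTRFS_MAX_LEVEL * 2
                   = 16KiB * 8 * 2 = 256KiB
  overcommit limit = free space / 16
                     (avail >>= 1 for the profile, then avail >>= 3 for
                      BTRFS_RESERVE_FLUSH_ALL: 1/2 * 1/8 = 1/16)
  60G free : 60GiB / 16 = 3.75GiB; 3.75GiB / 256KiB = 15360 extents;
             15360 * 4KiB = 60MiB of dirty pages
  1G free  : (1GiB / 16) / 256KiB = 256 extents; 256 * 4KiB = 1MiB

This matches both the 60MB of dirty pages observed on the volume with
60G available and the 1GB-to-1MB rule of thumb quoted above.)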
