Re: Btrfs reserve metadata problem

On 2018-01-03 09:55, robbieko wrote:
> Hi Qu,
> 
> Do you have a patch to reduce meta rsv ?

Not exactly, only for qgroup.

[PATCH v2 10/10] btrfs: qgroup: Use independent and accurate per inode
qgroup rsv

But that could provide enough clues to implement a smaller meta rsv.

My current safe guess would be "(BTRFS_MAX_TREE_LEVEL + 2) * nodesize"
for each outstanding extent, stepped by the number of outstanding
extents.
(Don't adjust the meta rsv on every change in outstanding extents;
only increase/decrease it once the number of outstanding extents
crosses a certain threshold.)
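
For illustration, a minimal sketch of that stepping idea (the names
RSV_STEP and meta_rsv_for_extents are hypothetical, not actual btrfs
code, and the max tree level is hard-coded as 8):

  #include <stdint.h>

  #define RSV_STEP 16ULL  /* re-reserve only every 16 outstanding extents */

  /* Reserve (max tree level + 2) tree blocks per outstanding extent,
   * but round the extent count up to a step boundary so that small
   * fluctuations do not change the reservation at all. */
  static uint64_t meta_rsv_for_extents(uint64_t outstanding, uint32_t nodesize)
  {
          uint64_t stepped = (outstanding + RSV_STEP - 1) / RSV_STEP * RSV_STEP;

          return stepped * (8 + 2) * nodesize;
  }

The caller would compare the stepped value against what is already
reserved and only hit the reserve/release paths when the step boundary
actually changes.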

Thanks,
Qu

> 
> 
> Hi Peter Grandi,
> 
> 1. All files have been pre-initialized with dd, so no metadata needs to change.
> 2. My test volume is 190G in total, with 128G used and 60G available, yet
> only 60 MB of dirty pages are allowed.
>     According to the meta rsv rules, 1 GB of free space permits only about
> 1 MB of dirty pages (see the worked numbers after this list).
> 3. The problem is the same with cow enabled.
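
For item 2, here is roughly where that ratio comes from, assuming a 16K
nodesize and the reservation rules quoted further down in this thread:

  per-4K-write reservation = nodesize * BTRFS_MAX_LEVEL * 2
                           = 16K * 8 * 2       = 256K
  reservation cap          = free space / 16
                           = 1 GB / 16         = 64 MB
  outstanding extents      = 64 MB / 256 KB    = 256
  dirty data allowed       = 256 * 4 KB        = 1 MB

So every 1 GB of free space admits only about 1 MB of un-flushed 4K
random writes, which matches the 60G free / 60 MB dirty figures above.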
> 
> It is a serious performance issue.
> 
> Thanks.
> robbieko
> 
> 
> On 2018-01-02 21:08, pg@xxxxxxxxxxxxxxxxxxxxx wrote:
>>> When testing Btrfs with fio 4k random write,
>>
>> That's an exceptionally narrowly defined workload. Also it is
>> narrower than that, because it must be without 'fsync' after
>> each write, or else there would be no accumulation of dirty
>> blocks in memory at all.
>>
>>> I found that volume with smaller free space available has
>>> lower performance.
>>
>> That's an inappropriate use of "performance"... The speed may be
>> lower; the performance is another matter.
>>
>>> It seems that the smaller the free space of volume is, the
>>> smaller amount of dirty page filesystem could have.
>>
>> Is this a problem? Consider: all filesystems do less well when
>> there is less free space (smaller chance of finding spatially
>> compact allocations), and it is usually good to minimize the
>> amount of dirty pages anyhow (even if there are reasons to delay
>> writing them out).
>>
>>> [ ... ] btrfs will reserve metadata for every write.  The
>>> amount to reserve is calculated as follows: nodesize *
>>> BTRFS_MAX_LEVEL(8) * 2, i.e., it reserves 256KB of metadata.
>>> The maximum amount of metadata reservation depends on the size of
>>> metadata currently in use and the free space within the volume
>>> (free chunk size / 16). When metadata reaches the limit, btrfs
>>> will need to flush the data to release the reservation.
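
A minimal C sketch of that accounting, for illustration only (the
function names and the bare free/16 cap are simplifications, not the
actual btrfs implementation):

  #include <stdint.h>

  /* Per-write metadata reservation as described above:
   * nodesize * BTRFS_MAX_LEVEL (8) * 2, i.e. 256K for a 16K nodesize. */
  static uint64_t meta_rsv_per_write(uint32_t nodesize)
  {
          return (uint64_t)nodesize * 8 * 2;
  }

  /* Writes keep adding reservations until the cap (roughly free chunk
   * space / 16 in this scenario) would be exceeded; at that point dirty
   * data has to be flushed so the matching reservations can be released. */
  static int must_flush(uint64_t reserved, uint64_t free_chunk_space,
                        uint32_t nodesize)
  {
          return reserved + meta_rsv_per_write(nodesize) >
                 free_chunk_space / 16;
  }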
>>
>> I don't understand here: under POSIX semantics filesystems are
>> not really allowed to avoid flushing *metadata* to disk for most
>> operations; that is, metadata operations have an implied 'fsync'.
>> In your case of the "4k random write" with "cow disabled", the only
>> metadata that should get updated is the last-modified timestamp,
>> unless the user/application has been so amazingly stupid as to not
>> preallocate the file, and then they deserve whatever they get.
>>
>>> 1. Is there any logic behind the value (free chunk size /16)
>>
>>>   /*
>>>    * If we have dup, raid1 or raid10 then only half of the free
>>>    * space is actually useable. For raid56, the space info used
>>>    * doesn't include the parity drive, so we don't have to
>>>    * change the math
>>>    */
>>>   if (profile & (BTRFS_BLOCK_GROUP_DUP |
>>>           BTRFS_BLOCK_GROUP_RAID1 |
>>>           BTRFS_BLOCK_GROUP_RAID10))
>>>    avail >>= 1;
>>
>> As written there is a plausible logic, but it is quite crude.
>>
>>>   /*
>>>    * If we aren't flushing all things, let us overcommit up to
>>>    * 1/2th of the space. If we can flush, don't let us overcommit
>>>    * too much, let it overcommit up to 1/8 of the space.
>>>    */
>>>   if (flush == BTRFS_RESERVE_FLUSH_ALL)
>>>    avail >>= 3;
>>>   else
>>>    avail >>= 1;
>>
>> Presumably overcommitting brings some benefits on other workloads.
>>
>> In particular, other parts of Btrfs don't behave awesomely well
>> when free space runs out.
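
Putting the two quoted fragments together, a simplified sketch of how
the "free chunk size / 16" figure falls out for a DUP-metadata
filesystem under BTRFS_RESERVE_FLUSH_ALL (the real kernel code also
accounts for space already used, reserved and pinned):

  #include <stdint.h>

  /* How much metadata may be overcommitted against the free chunk space.
   * Profile and flush mode are reduced to two booleans for illustration. */
  static uint64_t overcommit_allowance(uint64_t free_chunk_space,
                                       int dup_raid1_or_raid10,
                                       int flush_all)
  {
          uint64_t avail = free_chunk_space;

          if (dup_raid1_or_raid10)
                  avail >>= 1;    /* only half the raw space is usable */

          if (flush_all)
                  avail >>= 3;    /* overcommit up to 1/8 of that */
          else
                  avail >>= 1;    /* otherwise up to 1/2 */

          return avail;           /* DUP + FLUSH_ALL => free space / 16 */
  }

With 60G of free chunk space that allows about 3.75G of metadata
overcommit; at 256KB reserved per outstanding 4K write that is ~15k
outstanding extents, i.e. roughly 60MB of dirty data, matching the
figures at the top of the thread.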
>>
>>> 2. Is there any way to improve this problem?
>>
>> Again, is it a problem? More interestingly, if it is a problem,
>> is a solution available that does not impact other workloads?
>> It is simply impossible to optimize a filesystem perfectly for
>> every workload.
>>
>> I'll try to summarize your report as I understand it:
>>
>> * If:
>>   - The workload is "4k random write" (without 'fsync').
>>   - On a "cow disabled" file.
>>   - The file is not preallocated.
>>   - There is not much free space available.
>> * Then allocation overcommitting results in a higher frequency
>>   of unrequested metadata flushes, and those metadata flushes
>>   slow down a specific benchmark.
