On 3.05.19 г. 13:45 ч., Filipe Manana wrote: > On Fri, May 3, 2019 at 11:18 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote: >> >> >> >> On 2019/5/3 下午5:21, Filipe Manana wrote: >>> On Fri, May 3, 2019 at 2:46 AM Qu Wenruo <wqu@xxxxxxxx> wrote: >>> >>> What a great subject. The "reflink:" part is unnecessary, since the >>> rest of the subject already mentions it, that makes it a bit shorter. >>> >>>> >>>> [BUG] >>>> The following command can lead to unexpected data COW: >>>> >>>> #!/bin/bash >>>> >>>> dev=/dev/test/test >>>> mnt=/mnt/btrfs >>>> >>>> mkfs.btrfs -f $dev -b 1G > /dev/null >>>> mount $dev $mnt -o nospace_cache >>>> >>>> xfs_io -f -c "falloc 8k 24k" -c "pwrite 12k 8k" $mnt/file1 >>>> xfs_io -c "reflink $mnt/file1 8k 0 4k" $mnt/file1 >>>> umount $dev >>>> >>>> The result extent will be >>>> >>>> item 7 key (257 EXTENT_DATA 4096) itemoff 15760 itemsize 53 >>>> generation 6 type 2 (prealloc) >>>> prealloc data disk byte 13631488 nr 28672 >>>> item 8 key (257 EXTENT_DATA 12288) itemoff 15707 itemsize 53 >>>> generation 6 type 1 (regular) >>>> extent data disk byte 13660160 nr 12288 <<< COW >>>> item 9 key (257 EXTENT_DATA 24576) itemoff 15654 itemsize 53 >>>> generation 6 type 2 (prealloc) >>>> prealloc data disk byte 13631488 nr 28672 >>>> >>>> Currently we always reserve space even for NOCOW buffered write, thus >>> >>> I would add 'data' between 'reserve' and 'space', to be clear. >>> >>>> under most case it shouldn't cause anything wrong even we fall back to >>>> COW. >>> >>> even we ... -> even if we fallback to COW when running delalloc / >>> starting writeback. >>> >>>> >>>> However when we're out of data space, we fall back to skip data space if >>>> we can do NOCOW write. >>> >>> we fall back to skip data space ... -> we fallback to NOCOW write >>> without reserving data space. >>> >>>> >>>> If such behavior happens under that case, we could hit the following >>>> problems: >>> >>>> - data space bytes_may_use underflow >>>> This will cause kernel warning. >>> >>> Ok. >>> >>>> >>>> - ENOSPC at delalloc time >>> >>> at delalloc time - that is an ambiguous term you use through the change log. >> >> In fact, I have a lot of uncertain terminology through kernel. >> >> Things like flush get referred multiple times in different context (e.g. >> filemap flush, flushoncommit, super block flush). >> >> If we have a terminology list, we can't be more happy to follow. > > So, some is kernel wide while others are btrfs specific. > > A buffered write creates dealloc - copies data to pages, marks the > pages as dirty and tags the range in the extent io tree as dellaloc. > Running delalloc, flushes writeback (starts IO for the dirty pages and > tags them as under writeback) and does other necessary things (like > reserving extents). > When writeback finishes, we add a task to a work queue to run > btrfs_finish_ordered_io - after that happens we say that the ordered > extent completed. > > It can get ambiguous very often. That's why I have created the following document which (tries) to explain this: https://github.com/btrfs/btrfs-dev-docs/blob/master/delalloc.txt It's not perfect but it's better than nothing, feel free to contribute improvements. < snip>
