On 1 Nov 2018, at 6:21, ethanlien wrote: > Nikolay Borisov 於 2018-11-01 18:01 寫到: >> On 1.11.18 г. 11:56 ч., ethanlien wrote: >>> Nikolay Borisov 於 2018-11-01 16:59 寫到: >>>> On 1.11.18 г. 8:49 ч., Ethan Lien wrote: >>>>> Snapshot is expected to be fast. But if there are writers steadily >>>>> create dirty pages in our subvolume, the snapshot may take a very >>>>> long >>>>> time to complete. To fix the problem, we use tagged writepage for >>>>> snapshot >>>>> flusher as we do in the generic write_cache_pages(), so we can >>>>> ommit >>>>> pages >>>>> dirtied after the snapshot command. >>>> >>>> So the gist of this commit really is that you detect when >>>> filemap_flush >>>> has been called from snapshot context and tag all pages at *that* >>>> time >>>> as PAGECACHE_TAG_TOWRITE and then write them, ignoring any pages >>>> that >>>> might have been dirtied beforehand. Your description kind of dances >>>> around this idea without really saying it explicitly. Those >>>> semantics >>>> make sense, however i'd like to be stated more explicitly in the >>>> change >>>> log. >>>> >>>> However, this is done at the expense of consistency, so we have >>>> faster >>>> snapshots but depending the file which are stored in them they >>>> might be >>>> broken (i.e a database) since they will be missing pages. >>>> >>> >>> tag_pages_for_writeback() will tag all pages with >>> PAGECACHE_TAG_DIRTY. >>> If a dirty >>> page get missed here, it means someone has initiated the flush >>> before >>> us, so we >>> will wait that dirty page to complete in create_snapshot() -> >>> btrfs_wait_ordered_extents(). >> >> And what happens in the scenario where you have someone dirtying >> pages, >> you issue the snapshot ioctl, mark all currently dirtied pages as >> TOWRITE and then you have more delalloc pages being dirtied following >> initial call to tag_pages_for_writeback , you will miss those, no ? >> > > We don't freeze the filesystem when doing snapshot, so there is no > guarantee > the page dirtied right after we send the ioctl, will be included in > snapshot. > If a page is dirtied while we scan the dirty pages and its offset is > before our > current index, we miss it in our current snapshot implementation too. This looks like a pretty good compromise to me between btrfs v0's don't flush at all on snapshot and today's full sync on snapshot. The full sync is always going to be a little racey wrt concurrent writes. -chris
