On 2020/4/5 4:26 PM, Goffredo Baroncelli wrote:
>
> Hi all,
>
> This is an RFC; I wrote this patch because I find the idea interesting,
> even though it adds more complexity to the chunk allocator.
>
> The core idea is to store the metadata on the ssd and to leave the data
> on the rotational disks. BTRFS looks at the rotational flag to tell the
> two kinds of disk apart.
>
> This new mode is enabled by passing the option ssd_metadata at mount
> time. This allocation policy is the "preferred" one; if it does not
> permit a chunk allocation, the "classic" one is used.

One thing to improve here: in fact we can use existing members to store
the device-related info:
- btrfs_dev_item::seek_speed
- btrfs_dev_item::bandwidth (I tend to rename it to IOPS)

In fact, what you're trying to do is to provide a policy to allocate
chunks based on each device's performance characteristics.

I believe it would be super awesome, but to get it upstream I guess we
would prefer a more flexible framework, so it may be pretty slow to
merge.

But still, thanks for your awesome idea.

Thanks,
Qu

>
> Some examples: (/dev/sd[abc] are ssd, and /dev/sd[ef] are rotational)
>
> Non-striped profile: metadata->raid1, data->raid1
> The data is stored on /dev/sd[ef], the metadata on /dev/sd[abc]. When
> /dev/sd[ef] are full, the data chunks are allocated on /dev/sd[abc] as
> well.
>
> Striped profile: metadata->raid6, data->raid6
> raid6 requires 3 disks at minimum, so /dev/sd[ef] are not enough for a
> data profile raid6. To allow a data chunk allocation, the data profile
> raid6 will be stored on all the disks /dev/sd[abcdef]. The metadata
> profile raid6, instead, will be allocated on /dev/sd[abc], because
> these are enough to host this chunk.
>
> Changelog:
> v1: - first version
> v2: - rebased to v5.6.2
>     - corrected the comparison of the rotational disks (>= instead of >)
>     - added the flag rotational to struct btrfs_device_info to simplify
>       the comparison function (btrfs_cmp_device_info*())
> v3: - corrected the collision between BTRFS_MOUNT_DISCARD_ASYNC and
>       BTRFS_MOUNT_SSD_METADATA.
>
> Below I collected some data to highlight the performance improvement.
>
> Test setup:
> As a test I performed a "dist-upgrade" of a Debian system from stretch
> to buster. The test used an image of a Debian stretch [1] with the
> needed packages already under /var/cache/apt/archives/ (so no
> networking was involved). For each test I formatted the filesystem from
> scratch, un-tarred the image and then ran "apt-get dist-upgrade" [2].
> For each disk(s)/filesystem combination I measured the time of apt
> dist-upgrade with and without the flag "force-unsafe-io", which reduces
> the use of sync(2) and flush(2). The ssd was 20GB, the hdd 230GB.
>
> I considered the following scenarios:
> - btrfs over ssd
> - btrfs over ssd + hdd with my patch enabled
> - btrfs over bcache over hdd+ssd
> - btrfs over hdd (very, very slow....)
> - ext4 over ssd
> - ext4 over hdd
>
> The test machine was an "AMD A6-6400K" with 4GB of RAM, of which about
> 3GB was used as cache/buffers.
>
> Data analysis:
>
> Of course btrfs is slower than ext4 when a lot of sync/flush calls are
> involved; using apt on a rotational disk was painfully slow. IMHO this
> should be addressed by using the btrfs snapshot capabilities, but that
> is another (not easy) story.
>
> Unsurprisingly, bcache performs better than my patch, but this is an
> expected result because it can also cache the data chunks (reads can go
> directly to the ssd). bcache is about +60% slower than the ssd-only
> case when there are a lot of sync/flush calls, and only +20% slower in
> the other case.
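Before the raw numbers, a minimal userspace sketch of the "preferred,
then classic" ordering described at the top of the cover letter may help
make the policy concrete. This is not the actual patch: struct dev_info,
sort_for_metadata() and the sample sizes are invented for illustration;
the real code sorts struct btrfs_device_info with the
btrfs_cmp_device_info*() helpers and the new rotational flag mentioned
in the changelog.

/*
 * Toy model of the ssd_metadata allocation preference: for a metadata
 * chunk, put non-rotational devices first; if there are not enough of
 * them for the profile, fall back to the "classic" ordering so the
 * allocation still succeeds.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct dev_info {
	const char *name;
	bool rotational;		/* the block device rotational flag */
	unsigned long long avail;	/* unallocated bytes */
};

/* "classic" order: prefer the device with more unallocated space */
static int cmp_classic(const void *a, const void *b)
{
	const struct dev_info *da = a, *db = b;

	if (da->avail > db->avail)
		return -1;
	if (da->avail < db->avail)
		return 1;
	return 0;
}

/* "preferred" order for metadata: non-rotational devices first */
static int cmp_metadata(const void *a, const void *b)
{
	const struct dev_info *da = a, *db = b;

	if (da->rotational != db->rotational)
		return da->rotational ? 1 : -1;
	return cmp_classic(a, b);
}

static void sort_for_metadata(struct dev_info *devs, int ndevs, int devs_min)
{
	int nr_ssd = 0;

	for (int i = 0; i < ndevs; i++)
		if (!devs[i].rotational)
			nr_ssd++;

	/* not enough ssds for this profile: use the classic policy */
	qsort(devs, ndevs, sizeof(*devs),
	      nr_ssd >= devs_min ? cmp_metadata : cmp_classic);
}

int main(void)
{
	/* the layout of the example above: sd[abc] ssd, sd[ef] rotational */
	struct dev_info devs[] = {
		{ "sde", true,  230ULL << 30 },
		{ "sdf", true,  230ULL << 30 },
		{ "sda", false,  20ULL << 30 },
		{ "sdb", false,  20ULL << 30 },
		{ "sdc", false,  20ULL << 30 },
	};

	sort_for_metadata(devs, 5, 2);	/* raid1 metadata needs 2 devices */
	for (int i = 0; i < 5; i++)
		printf("metadata candidate %d: /dev/%s\n", i + 1, devs[i].name);
	return 0;
}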
> Regarding the test with force-unsafe-io (fewer sync/flush calls), my
> patch reduces the slowdown with respect to the ssd-only case from +256%
> (hdd only) to +113% (ssd+hdd), which I consider a good result
> considering how small the patch is.
>
>
> Raw data:
> The data below is the "real" time (as returned by the time command)
> consumed by apt.
>
>
> Test description       real (mmm:ss)   Delta %
> --------------------   -------------   -------
> btrfs hdd w/sync          142:38        +533%
> btrfs ssd+hdd w/sync       81:04        +260%
> ext4 hdd w/sync            52:39        +134%
> btrfs bcache w/sync        35:59         +60%
> btrfs ssd w/sync           22:31     reference
> ext4 ssd w/sync            12:19         -45%
>
>
> Test description       real (mmm:ss)   Delta %
> --------------------   -------------   -------
> btrfs hdd                  56:2         +256%
> ext4 hdd                   51:32        +228%
> btrfs ssd+hdd              33:30        +113%
> btrfs bcache               18:57         +20%
> btrfs ssd                  15:44     reference
> ext4 ssd                   11:49         -25%
>
>
> [1] I created the image using "debootstrap stretch", then I installed a
> set of packages with the commands:
>
>   # debootstrap stretch test/
>   # chroot test/
>   # mount -t proc proc proc
>   # mount -t sysfs sys sys
>   # apt --option=Dpkg::Options::=--force-confold \
>         --option=Dpkg::options::=--force-unsafe-io \
>         install mate-desktop-environment* xserver-xorg vim \
>         task-kde-desktop task-gnome-desktop
>
> Then I updated the release from stretch to buster by changing the file
> /etc/apt/sources.list, and downloaded the packages for the dist-upgrade:
>
>   # apt-get update
>   # apt-get --download-only dist-upgrade
>
> Then I created a tar of this image.
> Before the dist-upgrade the space used was about 7GB with 2281 packages
> installed. After the dist-upgrade, the space used was 9GB with 2870
> packages. The upgrade installed/updated about 2251 packages.
>
>
> [2] The command was a bit more complex, to avoid an interactive session:
>
>   # mkfs.btrfs -m single -d single /dev/sdX
>   # mount /dev/sdX test/
>   # cd test
>   # time tar xzf ../image.tgz
>   # chroot .
>   # mount -t proc proc proc
>   # mount -t sysfs sys sys
>   # export DEBIAN_FRONTEND=noninteractive
>   # time apt-get -y --option=Dpkg::Options::=--force-confold \
>         --option=Dpkg::options::=--force-unsafe-io dist-upgrade
>
>
> BR
> G.Baroncelli
>
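Qu's point about btrfs_dev_item::seek_speed and btrfs_dev_item::bandwidth
hints at a more general ranking than a single rotational bit. The snippet
below is only a hypothetical illustration of what such a policy could look
like; the scoring formula, struct dev_perf and the sample values are
invented here, and only the two field names come from Qu's comment above.

#include <stdio.h>
#include <stdlib.h>

/* Invented container for the per-device hints Qu mentions. */
struct dev_perf {
	const char *name;
	unsigned char seek_speed;	/* higher == faster seeks */
	unsigned char bandwidth;	/* higher == more IOPS/throughput */
};

/* Metadata I/O is seek-bound, so weight seek_speed more heavily. */
static int metadata_score(const struct dev_perf *d)
{
	return 3 * d->seek_speed + d->bandwidth;
}

/* Sort descending by score: the best devices get metadata first. */
static int cmp_metadata_score(const void *a, const void *b)
{
	const struct dev_perf *da = a, *db = b;

	return metadata_score(db) - metadata_score(da);
}

int main(void)
{
	struct dev_perf devs[] = {
		{ "sda (ssd)", 90, 80 },
		{ "sde (hdd)", 10, 40 },
		{ "sdf (hdd)", 10, 40 },
	};

	qsort(devs, 3, sizeof(*devs), cmp_metadata_score);
	for (int i = 0; i < 3; i++)
		printf("metadata preference %d: %s\n", i + 1, devs[i].name);
	return 0;
}

A score-based ranking like this could express finer distinctions (for
example NVMe vs SATA ssd) without adding a new mount option per case,
which seems to be the kind of flexibility Qu is asking for.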
