Hi,

You should check the patchset about VFS hot tracking:
https://lwn.net/Articles/550495/

On Thu, May 16, 2013 at 3:12 PM, Kai Krakow <hurikhan77+btrfs@xxxxxxxxx> wrote:
> Hi!
>
> I think such a solution as part of the filesystem could do much better than something outside of it (like bcache). But I'm not sure: what makes data hot? I think the most benefit comes from detecting random read access and marking only that data as hot; writes should also go to the SSD first and then be spooled to the hard disks in the background. Bcache already does a lot of this.
>
> Since this is within the filesystem, users could even mark files as always "hot" with some attribute or ioctl. A boot-readahead or preload implementation could use this to automatically mark files used during booting as hot, or to preload files when I start an application.
>
> On the other hand, hot relocation should reduce writes to the SSD as much as possible, for example: do not defragment files during autodefrag (it makes no sense there), and write data in bursts of erase-block size, etc.
>
> Also important: what if the SSD dies due to wear? Will it gracefully fall back to the hard disk? What does "relocation" mean exactly? Hot data should only be copied to the SSD as a cache, not moved there. It should be possible for btrfs to simply drop a failing SSD from the filesystem without data loss; otherwise one would have to use two SSDs in RAID-1 to get safe cache storage.
>
> Altogether I think a spinning-media btrfs RAID can outperform a single SSD, so hot relocation should probably be used to reduce head movement, because that is where an SSD really excels. Everything that involves heavy head movement should go to the SSD first and then be written back to the hard disk. And I think there is a lot of potential to optimize, because a COW filesystem like btrfs naturally causes a lot of head movement.
>
> What do you think?
>
> BTW: I have not tried either one yet because I'm still deciding which way to go. Your patches are the more welcome option because I would not need to migrate my storage to bcache-provided block devices. OTOH, the bcache implementation looks a lot more mature at this point (with regard to performance and safety) because it already provides many of the features mentioned above, most importantly graceful handling of failing SSDs.
>
> Regarding a btrfs RAID outperforming an SSD: during boot, my spinning-media 3-device btrfs RAID reads boot files at up to 600 MB/s (from an LZ-compressed fs), and boot takes about 7 seconds until the display manager starts (which then takes another 30 seconds, but that's another story), even though the system is pretty crowded with services I wouldn't actually need if I optimized for boot performance. But I think systemd's readahead implementation has a lot of influence on this fast booting: it defragments and relocates boot files on btrfs during boot so the hard disks can read all of this sequentially. I think it also compresses boot files if compression is enabled, because booting is I/O bound, not CPU bound. Benchmarks showed that my btrfs RAID can technically read up to 450 MB/s, so I think the 600 MB/s refers to decompressed data. A single SSD could not do that. For the same reason I created a small script to defragment and compress the files used by the preload daemon. Without benchmarking it, this felt like another small performance boost.
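Kai's idea above of pinning files as always "hot" could, from user space, look roughly like the sketch below. The "user.hot" extended attribute is purely hypothetical; it is not part of this patchset or of the VFS hot-tracking work, and the code only illustrates the kind of interface being suggested.

/*
 * Minimal user-space sketch: pin a file as "always hot" through an
 * extended attribute.  The attribute name "user.hot" is hypothetical
 * and only illustrates the suggested interface; the patchset does not
 * define it.
 */
#include <stdio.h>
#include <sys/xattr.h>

static int pin_hot(const char *path)
{
    /* The value does not matter here; the mere presence of the
     * attribute would act as the "always hot" marker. */
    if (setxattr(path, "user.hot", "1", 1, 0) < 0) {
        perror("setxattr");
        return -1;
    }
    return 0;
}

int main(int argc, char **argv)
{
    return (argc > 1 && pin_hot(argv[1]) == 0) ? 0 : 1;
}

A relocation implementation could then treat such files as permanently above the temperature threshold, which is exactly the boot-readahead/preload use case described above.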
> So I'm eager to see what comes next with some sort of SSD cache, because the only problem left seems to be heavy head movement slowing down the system.
>
> Zhi Yong Wu <zwu.kernel@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> What do you think of its design approach -- does it go in the correct direction? Do you have any comments or a better design idea for BTRFS hot relocation support? Any comments are appreciated, thanks.
>>
>> On Mon, May 6, 2013 at 4:53 PM, <zwu.kernel@xxxxxxxxx> wrote:
>>> From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>>>
>>> This patchset is sent out as an RFC mainly to see whether it goes in the correct development direction.
>>>
>>> The patchset introduces hot relocation support for BTRFS. In a hybrid storage environment, when data on the HDD gets hot, BTRFS hot relocation can move it to the SSD automatically; also, if the SSD usage ratio exceeds its upper threshold, data that has gone cold is looked up and relocated to the HDD first to free space on the SSD, and then the data that has become hot is relocated to the SSD automatically.
>>>
>>> BTRFS hot relocation mainly reserves block space on the SSD first, loads the hot data from the HDD into the page cache, allocates block space on the SSD, and finally writes the data to the SSD.
>>>
>>> If you'd like to play with it, please pull the patchset from my git tree on github:
>>> https://github.com/wuzhy/kernel.git hot_reloc
>>>
>>> For how to use it, please refer to the example below:
>>>
>>> root@debian-i386:~# echo 0 > /sys/block/vdc/queue/rotational
>>> ^^^ The above command will make /dev/vdc be treated as an SSD
>>> root@debian-i386:~# echo 999999 > /proc/sys/fs/hot-age-interval
>>> root@debian-i386:~# echo 10 > /proc/sys/fs/hot-update-interval
>>> root@debian-i386:~# echo 10 > /proc/sys/fs/hot-reloc-interval
>>> root@debian-i386:~# mkfs.btrfs -d single -m single -h /dev/vdb /dev/vdc -f
>>>
>>> WARNING! - Btrfs v0.20-rc1-254-gb0136aa-dirty IS EXPERIMENTAL
>>> WARNING! - see http://btrfs.wiki.kernel.org before using
>>>
>>> [ 140.279011] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 1 transid 16 /dev/vdb
>>> [ 140.283650] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
>>> [ 140.517089] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
>>> [ 140.550759] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 3 /dev/vdb
>>> [ 140.552473] device fsid c563a6dc-f192-41a9-9fe1-5a3aa01f5e4c devid 2 transid 16 /dev/vdc
>>> adding device /dev/vdc id 2
>>> [ 140.636215] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 2 transid 3 /dev/vdc
>>> fs created label (null) on /dev/vdb
>>>         nodesize 4096 leafsize 4096 sectorsize 4096 size 14.65GB
>>> Btrfs v0.20-rc1-254-gb0136aa-dirty
>>> root@debian-i386:~# mount -o hot_move /dev/vdb /data2
>>> [ 144.855471] device fsid 197d47a7-b9cd-46a8-9360-eb087b119424 devid 1 transid 6 /dev/vdb
>>> [ 144.870444] btrfs: disk space caching is enabled
>>> [ 144.904214] VFS: Turning on hot data tracking
>>> root@debian-i386:~# dd if=/dev/zero of=/data2/test1 bs=1M count=2048
>>> 2048+0 records in
>>> 2048+0 records out
>>> 2147483648 bytes (2.1 GB) copied, 23.4948 s, 91.4 MB/s
>>> root@debian-i386:~# df -h
>>> Filesystem            Size  Used Avail Use% Mounted on
>>> /dev/vda1              16G   13G  2.2G  86% /
>>> tmpfs                 4.8G     0  4.8G   0% /lib/init/rw
>>> udev                   10M  176K  9.9M   2% /dev
>>> tmpfs                 4.8G     0  4.8G   0% /dev/shm
>>> /dev/vdb               15G  2.0G   13G  14% /data2
>>> root@debian-i386:~# btrfs fi df /data2
>>> Data: total=3.01GB, used=2.00GB
>>> System: total=4.00MB, used=4.00KB
>>> Metadata: total=8.00MB, used=2.19MB
>>> Data_SSD: total=8.00MB, used=0.00
>>> root@debian-i386:~# echo 108 > /proc/sys/fs/hot-reloc-threshold
>>> ^^^ The above command starts HOT RELOCATION, because the data temperature is currently 109
>>> root@debian-i386:~# df -h
>>> Filesystem            Size  Used Avail Use% Mounted on
>>> /dev/vda1              16G   13G  2.2G  86% /
>>> tmpfs                 4.8G     0  4.8G   0% /lib/init/rw
>>> udev                   10M  176K  9.9M   2% /dev
>>> tmpfs                 4.8G     0  4.8G   0% /dev/shm
>>> /dev/vdb               15G  2.1G   13G  14% /data2
>>> root@debian-i386:~# btrfs fi df /data2
>>> Data: total=3.01GB, used=6.25MB
>>> System: total=4.00MB, used=4.00KB
>>> Metadata: total=8.00MB, used=2.26MB
>>> Data_SSD: total=2.01GB, used=2.00GB
>>> root@debian-i386:~#
>>>
>>> Zhi Yong Wu (5):
>>>   vfs: add one list_head field
>>>   btrfs: add one new block group
>>>   btrfs: add one hot relocation kthread
>>>   procfs: add three proc interfaces
>>>   btrfs: add hot relocation support
>>>
>>>  fs/btrfs/Makefile            |   3 +-
>>>  fs/btrfs/ctree.h             |  26 +-
>>>  fs/btrfs/extent-tree.c       | 107 +++++-
>>>  fs/btrfs/extent_io.c         |  31 +-
>>>  fs/btrfs/extent_io.h         |   4 +
>>>  fs/btrfs/file.c              |  36 +-
>>>  fs/btrfs/hot_relocate.c      | 802 +++++++++++++++++++++++++++++++++++++++++++
>>>  fs/btrfs/hot_relocate.h      |  48 +++
>>>  fs/btrfs/inode-map.c         |  13 +-
>>>  fs/btrfs/inode.c             |  92 ++++-
>>>  fs/btrfs/ioctl.c             |  23 +-
>>>  fs/btrfs/relocation.c        |  14 +-
>>>  fs/btrfs/super.c             |  30 +-
>>>  fs/btrfs/volumes.c           |  28 +-
>>>  fs/hot_tracking.c            |   1 +
>>>  include/linux/btrfs.h        |   4 +
>>>  include/linux/hot_tracking.h |   1 +
>>>  kernel/sysctl.c              |  22 ++
>>>  18 files changed, 1234 insertions(+), 51 deletions(-)
>>>  create mode 100644 fs/btrfs/hot_relocate.c
>>>  create mode 100644 fs/btrfs/hot_relocate.h
>>>
>>> --
>>> 1.7.11.7
--
Regards,

Zhi Yong Wu
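The cover letter above describes the relocation flow as: reserve block space on the SSD first, read the hot data from the HDD into the page cache, allocate SSD blocks, and finally write the data out. Below is a self-contained toy model of that flow; the structures and helper names are invented purely for illustration and are not the patchset's actual code, which operates on btrfs block groups and the kernel page cache.

/*
 * Toy model of the relocation flow from the cover letter:
 *   1. reserve space on the SSD up front,
 *   2. read the hot data from the HDD into a memory buffer
 *      (standing in for the page cache),
 *   3. allocate/write the data on the SSD,
 *   4. drop the reservation if anything fails.
 * All names here are made up for illustration only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_dev {
    char   data[1 << 16];   /* backing store of the "device" */
    size_t reserved;        /* bytes reserved for pending relocations */
    size_t used;            /* bytes actually written */
};

static int reserve_space(struct toy_dev *ssd, size_t len)
{
    if (ssd->reserved + len > sizeof(ssd->data))
        return -1;              /* SSD full: cold data would have to be evicted first */
    ssd->reserved += len;
    return 0;
}

static int relocate_hot(struct toy_dev *hdd, struct toy_dev *ssd,
                        size_t off, size_t len)
{
    char *cache;

    if (reserve_space(ssd, len))        /* step 1: reserve SSD space */
        return -1;

    cache = malloc(len);                /* step 2: "page cache" buffer */
    if (!cache) {
        ssd->reserved -= len;           /* step 4: release on failure */
        return -1;
    }
    memcpy(cache, hdd->data + off, len);        /* read from the HDD */

    memcpy(ssd->data + ssd->used, cache, len);  /* step 3: write to the SSD */
    ssd->used += len;

    free(cache);
    return 0;
}

int main(void)
{
    struct toy_dev hdd = { .data = "hot data" };
    struct toy_dev ssd = { 0 };

    if (relocate_hot(&hdd, &ssd, 0, 8) == 0)
        printf("relocated to SSD: %.8s\n", ssd.data);
    return 0;
}

In the real patchset, the SSD side of this is the new Data_SSD block group visible in the "btrfs fi df" output above.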
