Re: [RFC][PATCH V3] btrfs: ssd_metadata: storing metadata on SSD

On Mon, Apr 06, 2020 at 07:33:16PM +0200, Goffredo Baroncelli wrote:
> On 4/6/20 7:21 PM, Zygo Blaxell wrote:
> > On Mon, Apr 06, 2020 at 06:43:04PM +0200, Goffredo Baroncelli wrote:
> > > On 4/6/20 4:24 AM, Zygo Blaxell wrote:
> > > > > > Of course btrfs is slower than ext4 when a lot of sync/flush calls are involved. Using
> > > > > > apt on a rotational disk was a dramatic experience. And IMHO this should be replaced
> > > > > > by using the btrfs snapshot capabilities. But this is another (not easy) story.
> > > > flushoncommit and eatmydata work reasonably well...once you patch out the
> > > > noise warnings from fs-writeback.
> > > > 
> > > 
> > > You wrote flushoncommit, but did you mean "noflushoncommit" ?
> > 
> > No.  "noflushoncommit" means applications have to call fsync() all the
> > time, or their files get trashed on a crash.  I meant flushoncommit
> > and eatmydata.
> 
> Is it a tristate value (default, flushoncommit, noflushoncommit), or
> is flushoncommit the default?

noflushoncommit is the default.  flushoncommit is sort of terrible--it
used to have deadlock bugs up to 4.15, and spams the kernel log with
warnings since 4.15.
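
As a rough sketch of the setup being discussed (not a tested recipe): the helper below checks a mount option string for flushoncommit, and the commented usage remounts / with it and runs apt under eatmydata. The name has_flushoncommit is made up for illustration, and the check assumes btrfs lists the option in the mount table only when it is enabled.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a comma-separated mount option string
# contains "flushoncommit" as a whole option ("noflushoncommit" does not match).
has_flushoncommit() {
	case ",$1," in
	*,flushoncommit,*) return 0 ;;
	*) return 1 ;;
	esac
}

# Possible usage (needs root; left commented out so nothing is remounted
# by accident):
#   opts=$(findmnt -n -o OPTIONS /)
#   has_flushoncommit "$opts" || mount -o remount,flushoncommit /
#   eatmydata apt-get dist-upgrade
```

Wrapping the option string in commas keeps the match to whole options, so the default noflushoncommit (if it were ever printed) would not be mistaken for flushoncommit.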

> > While dpkg runs, it must never call fsync, or it breaks the write
> > ordering provided by flushoncommit (or you have to zero-log on boot).
> > btrfs effectively does a point-in-time snapshot at every commit interval.
> > dpkg's ordering of write operations and renames does the rest.
> > 
> > dpkg runs much faster, so the window for interruption is smaller, and
> > if it is interrupted, then the result is more or less the same as if
> > you had run with fsync() on noflushoncommit.  The difference is that
> > the filesystem might roll back to an earlier state after a crash, which
> > could be a problem e.g. if your maintainer scripts are manipulating data
> > on multiple filesystems.
> > 
> > 
> > > Regarding eatmydata, I used it too. However, I was never happy with it. Below is my script:
> > > ----------------------------------
> > > ghigo@venice:/etc/apt/apt.conf.d$ cat 10btrfs.conf
> > > 
> > > DPkg::Pre-Invoke {"bash /var/btrfs/btrfs-apt.sh snapshot";};
> > > DPkg::Post-Invoke {"bash /var/btrfs/btrfs-apt.sh clean";};
> > > Dpkg::options {"--force-unsafe-io";};
> > > ---------------------------------
> > > ghigo@venice:/etc/apt/apt.conf.d$ cat /var/btrfs/btrfs-apt.sh
> > > 
> > > btrfsroot=/var/btrfs/debian
> > > btrfsrollback=/var/btrfs/debian-rollback
> > > 
> > > 
> > > do_snapshot() {
> > > 	if [ -d "$btrfsrollback" ]; then
> > > 		btrfs subvolume delete "$btrfsrollback"
> > > 	fi
> > > 
> > > 	# wait up to ~2 seconds for the old snapshot to disappear
> > > 	i=20
> > > 	while [ "$i" -gt 0 ] && [ -d "$btrfsrollback" ]; do
> > > 		i=$(( i - 1 ))
> > > 		sleep 0.1
> > > 	done
> > > 	if [ -d "$btrfsrollback" ]; then
> > > 		exit 100
> > > 	fi
> > > 
> > > 	btrfs subvolume snapshot "$btrfsroot" "$btrfsrollback"
> > > 	
> > > }
> > > 
> > > do_removerollback() {
> > > 	if [ -d "$btrfsrollback" ]; then
> > > 		btrfs subvolume delete "$btrfsrollback"
> > > 	fi
> > > }
> > > 
> > > if [ "$1" = "snapshot" ]; then
> > > 	do_snapshot
> > > elif [ "$1" = "clean" ]; then
> > > 	do_removerollback
> > > else
> > > 	echo "usage: $0  snapshot|clean"
> > > fi
> > > --------------------------------------------------------------
> > > 
> > > Suggestions are welcome on how to automatically detect where the
> > > btrfs root (subvolume=/) is mounted and the name of my root subvolume
> > > (debian in my case), so that I can avoid hardcoding them in my script.
> > 
> > You can figure out where "/" is within a btrfs filesystem by recursively
> > looking up parent subvol IDs with TREE_SEARCH_V2 until you get to 5
> > FS_ROOT (sort of like the way pwd works on traditional Unix); however,
> > root can be a bind mount, so "path from fs_root to /" is not guaranteed
> > to end at a subvol root.
> 
> Maybe a use case for a new ioctl? :-) Snapshot a subvolume without
> mounting the root subvolume....

That would make access control mechanisms like chroot...challenging.
;)  But I hear we have a delete-by-id ioctl now, so might as well have
snap-by-id too.

> > Also, sometimes people put /var on its own subvol, so you'd need to
> > find "the set of all subvols relevant to dpkg" and that's definitely
> > not trivial in the general case.
> 
> I know that a general rule is not easy. Anyway, I would also put /boot
> and /home in dedicated subvolumes.
> If the "rollback" is done at boot, /boot should be an invariant...
> However, I think that there are a lot of corner cases even here (what happens
> if the boot kernel doesn't have its modules in the root subvolume?)
> 
> It is not an easy job. It must be performed at the distribution level...
> 
> > 
> > It's not as easy to figure out if there's an existing fs_root mount
> > point (partly because namespacing mangles every path in /proc/mounts
> > and mountinfo), but if you know the btrfs device (and can access it
> > from your namespace) you can just mount it somewhere and then you do
> > know where it is.
> 
> I agree: work back from / to the "root device", then mount the
> root subvolume in a known place, where it is possible to snapshot
> the root subvolume.
> 
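
A minimal sketch of that approach, assuming util-linux findmnt is available and that / is a btrfs mount visible from the current namespace. The helper names and the mount point argument are hypothetical. With a bind mount, the bracketed path that findmnt reports may not be a subvolume root, so treat it as a hint only.

```shell
#!/bin/sh
# findmnt prints SOURCE for a btrfs subvolume mount as "device[/path]",
# e.g. "/dev/sda2[/debian]". These hypothetical helpers split that value.

# Device part: everything before the first "[".
source_device() {
	printf '%s\n' "${1%%\[*}"
}

# Path inside the filesystem: the text between "[" and "]", or "/" if absent.
source_subvol() {
	case "$1" in
	*\[*\])
		s=${1#*\[}
		printf '%s\n' "${s%\]}"
		;;
	*)
		printf '/\n'
		;;
	esac
}

# Mount the top-level subvolume (id 5) of the filesystem backing "/" at $1.
# Needs root; only defined here, never called automatically.
mount_toplevel() {
	mnt=$1
	src=$(findmnt -n -o SOURCE /) || return 1
	dev=$(source_device "$src")
	mkdir -p "$mnt" && mount -o subvolid=5 "$dev" "$mnt"
}
```

After something like `mount_toplevel /var/btrfs`, the value from source_subvol (for example /debian) gives the path to snapshot below that mount point.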
> > 
> > > BR
> > > G.Baroncelli
> > > -- 
> > > gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> > > Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> > > 


