A large portion of desktop users are not developers and are both
rustlang illiterate and programming illiterate; they would not know
whether this tool, that tool, or any tool is safe or unsafe, or has
race conditions, nor would they know the meaning of immutable or mutex.
Think of this scenario: an average user, Joe Bloggs, buys a new computer
without MS Windows. With the software savings, Joe purchases more
disks. He then chooses openSUSE Leap for his first foray into Linux.
All he cares about is his music files, photos, and videos being safe.
Joe runs a cafe down the street and shows the music, photos, and videos
on various screens at his cafe for the atmosphere.
Times are tough and he's running out of space, so he doesn't want the
accumulated media files duplicated all around the place, wasting
storage.
If the official wikis list broken 3rd party tools, it makes the
whole adoption process less easy, less friendly, more cryptic, and more
chaotic, and gives the impression that btrfs (and Linux as a whole) is a
mess and not ready. He would not know how to go through the code of each
deduplication tool to figure out whether one is better than another, as
Zygo Blaxell, who can read code, did. Even if he wanted to, he doesn't
have the time to do it. He says good-bye to openSUSE and buys Windows.
So I do agree with waxhead. It would be preferable if there were an
official btrfs deduplication command from btrfs-progs instead of relying
on 3rd parties. The Joe Bloggs in the example above can read web-page
instructions saying "run this command... and then this command..."; but
he will have neither the knowledge, the comprehension, nor the time to
go through code.
Thanks David Sterba for removing the items and updating the wiki!
On 19/6/20 6:43 am, Zygo Blaxell wrote:
The point about lack of maintenance with changing Rust dependencies is
fair, but "data loss" is a strong and unsupported statement. Can you
explain how data loss could occur in even a badly (assume not maliciously)
broken version of btrfs-dedupe?
As far as I can tell, the btrfs-dedupe code uses only non-data-mutating
btrfs kernel interfaces for manipulating extents (fiemap, defrag,
and file_extent_same/deduperange). None of these should cause data
loss (excluding kernel bugs).
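
To illustrate why the dedupe ioctl is not a data-mutating interface, here
is a minimal Python sketch of a FIDEDUPERANGE call. It is not code from
btrfs-dedupe; the helper names (build_request, dedupe_range) are
hypothetical, and the struct layouts are transcribed from <linux/fs.h>.
The kernel compares the two ranges itself and only shares extents when
the bytes are identical, so a wrong request should report a mismatch
rather than change file contents.

```python
import fcntl
import struct

# From <linux/fs.h>: _IOWR(0x94, 54, struct file_dedupe_range)
FIDEDUPERANGE = 0xC0189436
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

# struct file_dedupe_range:
#   src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info:
#   dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request that names exactly one destination range."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(header + info)


def dedupe_range(src_fd, src_offset, length, dest_fd, dest_offset):
    """Ask the kernel to dedupe one range; return (bytes_deduped, status)."""
    buf = build_request(src_offset, length, dest_fd, dest_offset)
    # The kernel writes bytes_deduped and status back into the buffer.
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status
```

A status of FILE_DEDUPE_RANGE_DIFFERS means the kernel found the ranges
unequal and left both files untouched.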
btrfs-dedupe can be trivially tricked into opening files that it did
not intend to (it has no protection against symlink injection and other
TOCTOU attacks), but it doesn't seem to be able to alter the content
of files once it opens them.
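
As a rough illustration of the kind of protection that is missing, the
Python sketch below refuses to open a symlink and checks the file type
on the open descriptor rather than on the path. The function name
open_regular_file is hypothetical and this is not a patch for
btrfs-dedupe. It only guards the final path component; symlinks in
parent directories are still followed.

```python
import os
import stat


def open_regular_file(path):
    """Open path read-only, refusing symlinks and non-regular files."""
    # O_NOFOLLOW makes open() fail if the last path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        # Inspect the descriptor, not the path, so the object that was
        # checked is the same object that will be used afterwards.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise OSError("not a regular file: %r" % path)
    except BaseException:
        os.close(fd)
        raise
    return fd
```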
File descriptors pointing to user files are opened O_RDWR, but they are
kept in the scope of the dedupe function and their life-cycle is properly
managed in Rust, so btrfs-dedupe won't mutate files by writing to the
wrong fd (e.g. accidentally close stderr and reopen it to a user file)
unless someone adds some seriously buggy code (see "assume not malicious"
above).
The unsafe C ioctl interfaces are unlikely to change in data-losing ways,
or they'll break all existing userspace tools that use them. They are
also well encapsulated in the rust-btrfs module.
The errors reported on github seem to be problems with incompatible
changes in the runtime libraries btrfs-dedupe depends on, and also some
reports of what look like pre-existing bugs in the fiemap code that are
blamed on new kernel versions without evidence. Data-losing breaking
changes in any of the ioctls btrfs-dedupe uses are extremely unlikely.
Those issues may cause btrfs-dedupe to do useless unnecessary work,
or fail to do useful necessary work, but could not cause data loss by
any mechanism I can find.
Contrast with bedup: bedup uses data-mutating kernel interfaces
(clone_range) for dedupe that have no effective protection against
concurrent data modification. There is ineffective protection implemented
in bedup (looking in /proc/*/fd for concurrent users of the files) which
may or may not be broken in kernel 5.0, but it's ineffective either way.
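
To make the weakness concrete, here is a hypothetical Python sketch of a
/proc/*/fd scan of the general kind described above. It is not bedup's
actual code, and the function name pids_with_file_open is made up for
illustration. The result is only a snapshot: another process can open
the file immediately after the scan returns, so an empty result does not
prove the file will stay unmodified while it is being deduped.

```python
import os


def pids_with_file_open(path):
    """Return the PIDs that have path open, as visible in /proc right now."""
    target = os.stat(path)
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for name in names:
            try:
                st = os.stat(os.path.join(fd_dir, name))
            except OSError:
                continue  # descriptor was closed while we were looking
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                pids.add(int(entry))
    return pids
```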
The case for data loss in bedup is trivial. The branch with a patch to
fix it is now 7 years old, so it's fair to say bedup is unmaintained too
(github forks notwithstanding, they didn't fix these issues).