On 2017-09-01 09:54, Qu Wenruo wrote:
On 2017-09-01 20:47, Austin S. Hemmelgarn wrote:
On 2017-09-01 08:19, Qu Wenruo wrote:
On 2017-09-01 20:05, Austin S. Hemmelgarn wrote:
On 2017-09-01 07:49, Qu Wenruo wrote:
On 2017-09-01 19:28, Austin S. Hemmelgarn wrote:
On 2017-08-31 20:13, Qu Wenruo wrote:
On 2017-09-01 01:27, Goffredo Baroncelli wrote:
Hi All,
I found a bug in mkfs.btrfs when the '-r' option is used: it seems
that the full disk is not visible.
Aside from the new bug you found, -r has several existing bugs.
Is this actually a bug though? Every other filesystem creation
tool that I know of that offers functionality like this generates
the filesystem just large enough to contain the data you want in
it, so I would argue that making this use the whole device would
actually break consistency with other tools, not to mention
remove functionality that is useful. Even aside from the system
image generation use case I mentioned, there are other practical
applications (seed 'device' generation comes to mind).
Well, then it's a documentation bug.
And I'm not sure the chunk size is correct or optimal.
Even for btrfs-convert, which will make data chunks very scattered,
we still try to make a large chunk to cover scattered data extents.
For a one-shot or read-only filesystem though, a maximally sized
chunk is probably suboptimal.
Not exactly.
The current kernel (and btrfs-progs, which tries to follow the kernel
chunk allocator's behavior) will not make a chunk larger than 10% of
the writable space.
So for a small filesystem, chunks won't be maximally sized.
Are you sure about this? I've got a couple of sub-10GB BTRFS volumes
that definitely have more than one 1GB data chunk.
Yes, check the following code:
    /* we don't want a chunk larger than 10% of writeable space */
    max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
                         max_chunk_size);
This is in the __btrfs_alloc_chunk() function in fs/btrfs/volumes.c.
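For anyone following along: div_factor() scales in tenths, which is why a
factor of 1 works out to 10%. From memory (so please check the tree you're
on), the helper in fs/btrfs/math.h looks roughly like this:

    /* scales num by factor/10, so factor == 1 means 10% */
    static inline u64 div_factor(u64 num, int factor)
    {
            if (factor == 10)
                    return num;
            num *= factor;
            return div_u64(num, 10);
    }

So div_factor(fs_devices->total_rw_bytes, 1) is total_rw_bytes / 10, i.e.
the 10% cap applied to max_chunk_size.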
Huh, I may have the remnants of an old bug present on those filesystems
then; I'll have to look further into this.
Suppose you use this to generate a base image for a system in the
form of a seed device. This actually ends up being a pretty easy
way to get factory reset functionality. It's also a case where you
want the base image to take up as little space as possible, so that
as much storage space as possible is left usable for the end user. In that
case, if your base image doesn't need an exact multiple of 1GB for
data chunks, then using 1GB data chunks is not the best choice for
at least the final data chunk (because the rest of that 1GB gets
wasted). A similar argument applies to metadata.
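For instance (numbers purely illustrative): if the image holds 1.2GB of
data, rounding up to 1GB data chunks allocates 2GB, and on a read-only
seed device the leftover ~0.8GB can never be reclaimed.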
Yes, your example makes sense (despite the 10% limit I mentioned above).
The problem is, no one really knows how the image will be used.
Maybe it will be used as a normal btrfs (with fi resize), or for the
purpose you describe.
We can't save users from making poor choices. If we could, we
wouldn't have anywhere near as many e-mails on the list from people
who are trying to recover data from their broken filesystems because
they have no backups.
The only case I can find where '-r' is a win is when you need the
filesystem to be as small as possible with no free space. The moment
you need free space, it's actually faster to just create the
filesystem, resize it to the desired size, and then copy in your data
(I've actually benchmarked this, and while it's not _much_ difference
in time spent, there is a measurable difference, with my guess being
that the allocation code is doing more work in userspace than in the
kernel). At a minimum, I think it's probably worth documenting this
fact.
I still remember that, some time ago, other people told me the main
advantage of -r is that we don't need root privilege to mount.
That's a good point I hadn't thought of. I'm used to working on single
user systems (where I can just trap out to root to do stuff like that
without any issue), or multi-user systems where I'm the admin (where I
can also trap out to root to do that kind of thing with limited issues).
Getting a working FUSE module for BTRFS could help with this too
though (and actually would probably not be hugely difficult, considering
that we have most of the FS specific code in userspace already as part
of btrfs-progs), but that's kind of beyond this discussion.
Anyway, documentation is important, but we need to first know the
correct or designed behavior of -r.
Agreed.
At least the mkfs.ext4 -d option doesn't limit the size.
In my test, a 1G image populated with mkfs.ext4 -d still shows about
900M+ of available space.
That may just be them choosing to use whatever size the device has.
They have limited incentive to do anything else, because genext2fs
exists and covers the minimal filesystem generation side of things.
For the normal btrfs case, although it may not cause much of a problem,
it will not be the optimal layout and may need an extra manual balance.
Actually, until the first write to the filesystem, it will still be an
optimal layout. Once you start writing to any BTRFS filesystem that
has an optimal layout though, it immediately becomes non-optimal, and
there's not really anything we can do about that unless we allow
chunks that are already allocated to be resized on the fly (which is a
bad idea for multiple reasons).
At least to me, that's not the case for chunks created by the -r option.
BTW, a seed device is RO anyway, so how much or how little spare space
we have is not a problem at all.
That really depends on how you look at it. Aside from the above
example, there's the rather specific question of why you would not
want to avoid wasting space. The filesystem is read-only, which
means that any 'free space' on that filesystem is completely
unusable, can't be reclaimed for anything else, and in general is
just a waste.
Still the same problem as above.
What if the seed device is detached and then used as a normal btrfs?
So to me, even if -r follows other tools, we should still follow the
normal extent allocator behavior to create data/metadata chunks, and then
set the device size to the end of its dev extents.
I don't entirely agree, but I think I've made my point well enough
above.
Yes, you did make your point clear, and I agree that the use cases you
mentioned exist and that wasted space exists too.
But since we don't really know what the image will be used for, I prefer
to keep everything using the kernel (or btrfs-progs) chunk allocator to
make the behavior consistent.
So my point is more about consistent behavior between btrfs-progs and the
kernel, and less maintenance.
(That is to say, my goal for mkfs.btrfs -r is just to do mkfs, mount, and
cp without privilege.)
Perhaps we could add some tool then to take a BTRFS filesystem and
restructure it to have an optimal layout? On first examination, the
resize command actually sounds like a reasonable place to do this,
possibly by adding a 'min' keyword (similar to 'max') that can also adjust
chunk sizes to get the smallest possible filesystem. The biggest
thing I'm worried about here is that there are numerous use cases for
optimal filesystems of minimal size, and changing the behavior of the
-r option will remove the only currently available way to get such
filesystems.
Yes, if we try to cover all possible cases, we're doomed.
So I'll just make it as simple as possible for now.
If someone really wants to do that, the resize subcommand seems to be a
good place to start.
In the short term, perhaps we could add an option to mkfs.btrfs to use
the current '-r' behavior.
Long term, I probably will look into adding something to do this to the
resize command (because I can think of a few other cases where it's
useful to have something like this), but it will probably be next year
before I have a large enough block of free time to sit down and work on
this.