On Mon, Jul 06, 2015 at 06:22:52PM +0200, Johannes Pfrang wrote:
> Cross-posting my unix.stackexchange.com question[1] to the btrfs list
> (slightly modified):
>
> [1]
> https://unix.stackexchange.com/questions/214009/btrfs-distribute-files-equally-across-multiple-devices
>
> ---------------------------------------------------------------------------------
>
> I have a btrfs volume across two devices that has metadata RAID 1 and
> data RAID 0. AFAIK, in the event one drive would fail, practically all
> files above the 64KB default stripe size would be corrupted. As this
> partition isn't performance critical, but should be space-efficient,
> I've thought about re-balancing the filesystem to distribute files
> equally across disks, but something like that doesn't seem to exist. The
> ultimate goal would be to be able to still read some of the files in the
> event of a drive failure.
>
> AFAIK, using "single"/linear data allocation just fills up drives one by
> one (at least that's what the wiki says).

Not quite. In single mode, the FS will allocate linear chunks of
space 1 GiB in size, and use those to write into (fitting many files
into each chunk, potentially). The chunks are allocated as needed, and
will go on the device with the most unallocated space.

So, with equal-sized devices, the first 1 GiB will go on the first
device, the second 1 GiB on the second device, and so on.

With unequal devices, you'll put data on the largest device, until
its free space reaches the size of the next largest, and then the
chunks will be alternated between those two, until the free space on
each of the two largest reaches the size of the third-largest, and so
on.

(e.g. for devices sized 6 TB, 4 TB, 3 TB, the first 2 TB will go
exclusively on the first device; the next 2 TB will go on the first
two devices, alternating in 1 GB chunks; the rest goes across all
three devices, again, alternating in 1 GB chunks.)
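
As a rough illustration of the rule described above (each new chunk goes
to the device with the most unallocated space), here is a small Python
simulation. It is only a sketch, not the btrfs allocator: the function
name allocate_chunks is made up for this example, sizes are counted in
whole chunks, and ties are broken in favour of the lowest device index,
which is an assumption.

```python
def allocate_chunks(device_sizes, n_chunks, chunk_size=1):
    """Simulate 'single' profile chunk placement (illustrative only).

    device_sizes: unallocated space per device, in arbitrary units.
    Returns the device index chosen for each successive chunk.
    """
    free = list(device_sizes)
    placement = []
    for _ in range(n_chunks):
        # Pick the device with the most unallocated space.
        # Assumption: on a tie, the lowest device index wins.
        dev = max(range(len(free)), key=lambda i: (free[i], -i))
        if free[dev] < chunk_size:
            break  # no device has room for another chunk
        free[dev] -= chunk_size
        placement.append(dev)
    return placement


if __name__ == "__main__":
    # Scaled-down version of the 6 TB / 4 TB / 3 TB example,
    # with one unit standing in for one chunk.
    print(allocate_chunks([6, 4, 3], 13))
```

With sizes 6, 4 and 3, the largest device takes chunks alone until its
free space matches the second, then those two alternate until they match
the third, and after that all three take turns.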

This is all very well for an append-only filesystem, but if you're
changing the files on the FS at all, there's no guarantee as to where
the changed extents will end up -- not even on the same device, let
alone close to the rest of the file on the platter.

I did work out, some time ago, a prototype chunk allocator (the 1
GiB-scale allocations) that would allow enough flexibility to control
where the next chunk to be allocated would go. However, that still
leaves the extent allocator to deal with, which is the second, and
much harder, part of the problem.

Basically, don't assume any kind of structure to the location of
your data on the devices you have, and keep good, tested, regular
backups of anything you can't stand to lose and can't replace. There
are no guarantees that would let you assume easily that any one file
is on a single device, or that anything would survive the loss of a
device.

I'm sure this is an FAQ entry somewhere... It's come up enough
times.

Hugo.

> The simplest implementation would probably be something like: Always
> write files to the disk with the least amount of space used. I think
> this may be a valid software-raid use-case, as it combines RAID 0 (w/o
> some of the performance gains[2]) with recoverability of about half of
> the data/files (balanced by filled space or amount of files) in the
> event of a drive-failure[3] by using filesystem information a
> hardware-raid doesn't have. In the end this is more or less JBOD with
> balanced disk usage + filesystem intelligence.
>
> Is there something like that already in btrfs or could this be something
> the btrfs-devs would consider?
>
>
> [2] Still can read/write multiple files from/to different disks, so less
> performance only for "single-file-reads/writes"
> [3] using two disks, otherwise (totalDisks-failedDisks)/totalDisks

--
Hugo Mills             | "How deep will this sub go?"
hugo@... carfax.org.uk | "Oh, she'll go all the way to the bottom if we don't
http://carfax.org.uk/  | stop her."
PGP: E2AB1DE4          |                                                  U571
