On 2016-04-27 19:19, Chris Murphy wrote:
On Wed, Apr 27, 2016 at 5:22 AM, Austin S. Hemmelgarn
<ahferroin7@xxxxxxxxx> wrote:
On 2016-04-26 20:58, Chris Murphy wrote:
On Tue, Apr 26, 2016 at 5:44 AM, Juan Alberto Cirez
<jacirez@xxxxxxxxxxxxx> wrote:
With GlusterFS as a distributed volume, the files are already spread
among the servers causing file I/O to be spread fairly evenly among
them as well, thus probably providing the benefit one might expect
with striping (RAID10).
Yes, the raid1 of Btrfs is just so you don't have to rebuild volumes
if you lose a drive. But since raid1 is not n-way copies, and only
means two copies, you don't really want the file systems getting that
big or you increase the chances of a double failure.
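
To put a rough number on the double-failure point, here is a minimal sketch. It assumes each drive fails independently with the same probability p_fail during the window in which the volume is degraded, and that any second failure costs data (the pessimistic case for two-copy raid1). The function name and the 1% figure are made up for illustration, not measured values.

```python
def second_failure_probability(n_drives: int, p_fail: float) -> float:
    """Chance that at least one remaining drive fails while the volume is degraded.

    Assumes independent, identical per-drive failure probability p_fail
    over the degraded window (an illustrative simplification).
    """
    if n_drives < 2:
        raise ValueError("raid1 needs at least two drives")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability between 0 and 1")
    remaining = n_drives - 1  # one drive has already failed
    return 1.0 - (1.0 - p_fail) ** remaining


if __name__ == "__main__":
    # Hypothetical 1% per-drive failure chance during the degraded window.
    for n in (2, 4, 8, 16):
        print(n, second_failure_probability(n, 0.01))
```

With only two copies of everything, the risk grows with every drive added to the same file system, which is the argument for keeping each brick small.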
I've always thought it'd be neat, in a Btrfs + GlusterFS setup, if it were
possible for Btrfs to inform GlusterFS of "missing/corrupt" files,
and then for Btrfs to drop references to those files, instead of
either rebuilding or remaining degraded. And then let GlusterFS deal
with replication of those files to maintain redundancy. i.e. the Btrfs
volumes would be single profile for data, and raid1 for metadata. When
there's n-way raid1, each drive can have a copy of the file system,
and it'd tolerate in effect n-1 drive failures and the file system
could at least still inform Gluster (or Ceph) of the missing data, the
file system still remains valid, only briefly degraded, and can still
be expanded when new drives become available.
FWIW, I _think_ this can be done with the scrubbing code in GlusterFS. It's
designed to repair data mismatches, but I'm not sure how it handles missing
copies of data. However, in the current state, there's no way without
external scripts to handle re-shaping of the storage bricks if part of them
fails.
Yeah, I haven't tried doing a scrub, parsing dmesg for busted file
paths, and feeding those paths into rm to see what happens. Will they
get deleted without additional errors? If so, good; a second scrub
should then be clean. And then btrfs dev delete missing to get rid of the
broken device *and* cause missing metadata to be replicated again, and now
in theory the fs should be back to normal. But it'd have to be tested
with a umount followed by a mount to see if -o degraded is still
required.
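
The "parse dmesg for busted file paths" step could be sketched roughly as below. This is untested against a real degraded volume. It assumes scrub reports damaged files in the kernel log as BTRFS lines ending in "(path: ...)", and that the file system is mounted at its top-level subvolume so those paths are relative to the mount point. The sample log lines, the mount point /mnt/brick1, and the function name are invented for illustration. It only lists candidates and deletes nothing.

```python
import re
from pathlib import PurePosixPath

# Matches BTRFS kernel log lines that end in "(path: <relative path>)".
PATH_RE = re.compile(r"BTRFS.*\(path: (?P<path>.+)\)\s*$")


def damaged_paths(log_text: str, mount_point: str) -> list:
    """Return the unique damaged file paths named in log_text, in log order."""
    found = []
    for line in log_text.splitlines():
        match = PATH_RE.search(line)
        if match is None:
            continue  # not a per-file error line
        full = PurePosixPath(mount_point) / match.group("path")
        if full not in found:  # scrub logs one line per bad block
            found.append(full)
    return found


# Illustrative log excerpt, not captured from a real system.
SAMPLE_LOG = """\
[ 1234.5] BTRFS warning (device sdb): checksum error at logical 20971520 on dev /dev/sdb, sector 40960, root 5, inode 257, offset 0, length 4096, links 1 (path: data/a.bin)
[ 1234.6] BTRFS warning (device sdb): checksum error at logical 20975616 on dev /dev/sdb, sector 40968, root 5, inode 257, offset 4096, length 4096, links 1 (path: data/a.bin)
[ 1234.7] BTRFS error (device sdb): bdev /dev/sdb errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 1235.0] BTRFS warning (device sdb): checksum error at logical 30000000 on dev /dev/sdb, sector 50000, root 5, inode 300, offset 0, length 4096, links 1 (path: logs/b (old).txt)
"""

if __name__ == "__main__":
    # Dry run: print what would be handed to rm.
    for path in damaged_paths(SAMPLE_LOG, "/mnt/brick1"):
        print(path)
```

On a real system the input would be the output of dmesg rather than SAMPLE_LOG, and the printed list would be reviewed before anything is removed. The second scrub and the remount without -o degraded would still need to be checked by hand.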
I'm not entirely certain. I had been planning on adding a check for
this to my usual testing before the system I use for it went offline,
and I just haven't had the time to get it working again. If I find
the time in the near future, I may just test it on my laptop in a VM.