Re: XFS on top RAID10 with odd drives count and 2 near copies

[ ... ]

>> [ ... ] XFS write stripe alignment should be for a 7 disk
>> mdraid10 near layout array.

> [ ... ] stripe alignment matters ONLY AND SOLELY IF
> READ-MODIFY-WRITE is involved, and RAID10 never requires
> read-modify-write.

>> Do note that stripe width is specific to writes.  It has
>> nothing to do with reads, from the filesystem perspective
>> anyway. For internal array operations it will.

> Again, stripe alignment only matters for writes ONLY AND SOLELY
> IF READ-MODIFY-WRITE is involved. This never happens for RAID0,
> RAID1 or RAID10, because there is no parity to update; chunks
> within a stripe are wholly independent of each other.

There is a subtlety here. At times I am excessively precise in
my wording :-) but in this case the precision matters.

XFS takes the stripe geometry as 'su'/'sunit', which is the MD
chunk size, and as 'sw'/'swidth', which describes the logical
stripe size; 'ext3' and 'ext4' take similar parameters.
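
As a rough illustration of how the two spellings relate: 'su' is
given in bytes and 'sw' as a multiple of 'su', while 'sunit' and
'swidth' are both given in 512-byte units. The sketch below
converts one form into the other. The 512 KiB chunk size and the
stripe of 7 chunks are illustrative inputs only, not a
recommendation for the array discussed in this thread, and the
function name is made up for the example.

```python
SECTOR_BYTES = 512  # mkfs.xfs expresses sunit/swidth in 512-byte units


def xfs_geometry(chunk_bytes, stripe_chunks):
    """Return both spellings of XFS geometry for a given MD chunk size.

    stripe_chunks (chunks per logical stripe) is supplied by the caller;
    this sketch does not try to decide what it should be for any layout.
    """
    if chunk_bytes % SECTOR_BYTES != 0:
        raise ValueError("chunk size must be a multiple of 512 bytes")
    sunit = chunk_bytes // SECTOR_BYTES
    return {
        "su": chunk_bytes,                # stripe unit, in bytes
        "sw": stripe_chunks,              # stripe width, as a multiple of su
        "sunit": sunit,                   # stripe unit, in 512-byte units
        "swidth": sunit * stripe_chunks,  # stripe width, in 512-byte units
    }


# Illustrative values only: 512 KiB chunk, 7 chunks per stripe.
geom = xfs_geometry(512 * 1024, 7)
print(geom)
```

Either pair describes the same geometry, so only one form needs
to be given to 'mkfs.xfs'.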

The reason is that while the *stripe* alignment (and size) doesn't
matter if there is no risk of RMW in the underlying storage
system, with XFS, as with 'ext3' and 'ext4', the *chunk* alignment
(and size) matters in all cases where there is parallelism in the
underlying storage system; it matters for both reads and writes,
and on all RAID layouts.
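
To make the chunk-level parallelism concrete, here is a
simplified model of where consecutive logical chunks land on a
7-drive array with 2 near copies. It assumes the usual
description of the MD 'near' layout (copies of a chunk are laid
out consecutively across the devices, wrapping to the next row);
it is a sketch of that description, not code taken from the MD
driver, and the function name is hypothetical.

```python
def near_layout_devices(chunk_index, n_devices, copies=2):
    """Return (device, row) for each copy of a logical chunk.

    Simplified model: copies occupy consecutive device slots, and slots
    wrap around to the next row after the last device.
    """
    placements = []
    for c in range(copies):
        slot = chunk_index * copies + c
        placements.append((slot % n_devices, slot // n_devices))
    return placements


# Seven consecutive logical chunks on 7 devices, 2 near copies each.
for chunk in range(7):
    print(chunk, near_layout_devices(chunk, 7))
```

In this model consecutive chunks sit on different drives, so two
chunk-aligned objects placed in neighbouring chunks can be read
or written in parallel.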

Chunk alignment matters because the filesystem will try to
allocate _metadata_ to be chunk aligned, so that reading/writing
metadata can take advantage of the parallelism of the array.
From 'man mke2fs':

  "stride=stride-size
    Configure the filesystem for a RAID array with stride-size
    filesystem blocks. This is the number of blocks read or
    written to disk before moving to the next disk, which is
    sometimes referred to as the chunk size.
    This mostly affects placement of filesystem meta-data like
    bitmaps at mke2fs time to avoid placing them on a single
    disk, which can hurt performance. It may also be used by
    the block allocator."

But note that this is a different discussion from one about
*stripes*, and IO from applications above the filesystem. This is
the filesystem as an application itself optimizing its own data
given a hint about device geometry.
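
Since 'stride' in the 'man mke2fs' text quoted above is counted
in filesystem blocks, it can be derived from the MD chunk size
as sketched below. The 512 KiB chunk, the 4 KiB block size and
the device name '/dev/md0' are assumptions chosen for the
example.

```python
def ext_stride(chunk_bytes, fs_block_bytes=4096):
    """Return the mke2fs 'stride' (chunk size in filesystem blocks)."""
    if chunk_bytes % fs_block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_bytes // fs_block_bytes


# Assumed values: 512 KiB MD chunk, 4 KiB filesystem blocks.
stride = ext_stride(512 * 1024)
# '/dev/md0' is a placeholder device name.
print(f"mke2fs -t ext4 -E stride={stride} /dev/md0")
```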