On 16/02/2019 06:39, Dave Chinner wrote:
[..]
>> We've supported this since mid 2018 and commit ba23cba9b3bd ("fs:
>> allow per-device dax status checking for filesystems"). That is,
>> we can have DAX on the XFS RT device independently of the data device.
>>
>> That is, you set up pmem in three segments: two small identical
>> segments that get mirrored with RAID1 as the data device, and
>> the remainder as a DAX-capable block device set up as the
>> XFS realtime device. Set the RTINHERIT bit on the root directory at
>> mkfs time ("-d rtinherit=1") and then all the data goes to the DAX
>> capable realtime device, and all the metadata goes to the software
>> raided pmem block devices that aren't DAX capable.
>>
>> Problem already solved, yes?
>
> Sorry, this was meant to be a reply to Dan's email commenting about
> some people needing mirrored metadata, not the parent that was
> talking about whole device RAID...
>
> i.e. mirrored metadata w/ FS-DAX for data should already be a solved
> problem...
Trying to answer you both.
But deferring the data redundancy to the application sounds like a no-go
to me, sorry. We don't do that for "traditional" block storage (SCSI,
NVMe, you name it). Some applications might already be able to handle it,
but definitely not all. I don't see your average DBMS, like MariaDB or
Postgres, already duplicating data across interleave sets of NVDIMMs.
And if you carve out a bit of your pmem space into its own namespace for
the metadata (did I understand you correctly here?), you still have the
problem that, if I understand it correctly, all data written to the DIMMs
is interleaved across the DIMMs of an interleave set.
So if one DIMM in your interleave set goes bad, you're lost anyway.
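To illustrate the interleaving point, here is a minimal sketch assuming a
hypothetical 4-way interleave set with a 4 KiB interleave granularity (the
real set width and granularity depend on the platform and its region
configuration). Consecutive chunks of the address space rotate across the
DIMMs, so any extent spanning at least as many chunks as there are DIMMs
has a piece on every DIMM, and losing a single DIMM corrupts it.

```python
from __future__ import annotations

# Hypothetical interleave set: values are assumptions, not platform defaults.
NUM_DIMMS = 4        # assumed 4-way interleave
GRANULARITY = 4096   # assumed 4 KiB interleave granularity


def dimm_for_offset(offset: int, num_dimms: int = NUM_DIMMS,
                    granularity: int = GRANULARITY) -> int:
    """Return the index of the DIMM backing a byte offset in the set."""
    # Chunks of `granularity` bytes are assigned round-robin to the DIMMs.
    return (offset // granularity) % num_dimms


def dimms_touched(start: int, length: int, num_dimms: int = NUM_DIMMS,
                  granularity: int = GRANULARITY) -> set[int]:
    """Return the set of DIMMs backing the byte range [start, start + length)."""
    if length <= 0:
        return set()
    first = start // granularity
    last = (start + length - 1) // granularity
    # Once the range covers num_dimms chunks, every DIMM is involved.
    if last - first + 1 >= num_dimms:
        return set(range(num_dimms))
    return {chunk % num_dimms for chunk in range(first, last + 1)}


if __name__ == "__main__":
    # A 64 KiB extent touches every DIMM in the set.
    print(sorted(dimms_touched(0, 64 * 1024)))  # [0, 1, 2, 3]
    # A 2 KiB block inside one chunk lives on a single DIMM.
    print(sorted(dimms_touched(8192, 2048)))    # [2]
```

In this model, a failure of any one DIMM hits every extent larger than
NUM_DIMMS * GRANULARITY bytes, regardless of which namespace it lives in.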
Byte,
Johannes
--
Johannes Thumshirn SUSE Labs Filesystems
jthumshirn@xxxxxxx +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850