On Mon, 2019-08-05 at 09:43 +1000, Dave Chinner wrote:
> On Fri, Aug 02, 2019 at 05:00:45PM -0500, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> >
> > This helps filesystems to perform tasks on the bio while
> > submitting for I/O. Since btrfs requires the position
> > we are working on, pass pos to iomap_dio_submit_bio().
> >
> > The correct place for submit_io() is not page_ops. Would it be
> > better to rename the structure to something like iomap_io_ops
> > or put it directly under struct iomap?
> >
> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@xxxxxxxx>
> > ---
> > fs/iomap/direct-io.c | 16 +++++++++++-----
> > include/linux/iomap.h | 1 +
> > 2 files changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 5279029c7a3c..a802e66bf11f 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
> >  EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
> >
> >  static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> > -		struct bio *bio)
> > +		struct bio *bio, loff_t pos)
> >  {
> >  	atomic_inc(&dio->ref);
> >
> > @@ -67,7 +67,13 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> >  		bio_set_polled(bio, dio->iocb);
> >
> >  	dio->submit.last_queue = bdev_get_queue(iomap->bdev);
> > -	dio->submit.cookie = submit_bio(bio);
> > +	if (iomap->page_ops && iomap->page_ops->submit_io) {
> > +		iomap->page_ops->submit_io(bio, file_inode(dio->iocb->ki_filp),
> > +				pos);
> > +		dio->submit.cookie = BLK_QC_T_NONE;
> > +	} else {
> > +		dio->submit.cookie = submit_bio(bio);
> > +	}
>
> I don't really like this at all. Apart from the fact it doesn't work
> with block device polling (RWF_HIPRI), the iomap architecture is
That can be added, no? The polling state should be relayed when we clone the bio.
> supposed to resolve the file offset -> block device + LBA mapping
> completely up front and so all that remains to be done is build and
> submit the bio(s) to the block device.
>
> What I see here is a hack to work around the fact that btrfs has
> implemented both file data transformations and device mapping layer
> functionality as a filesystem layer between file data bio building
> and device bio submission. And as the btrfs file data mapping
> (->iomap_begin) is completely unaware that there is further block
> mapping to be done before block device bio submission, any generic
> code that btrfs uses requires special IO submission hooks rather
> than just calling submit_bio().
>
> I'm not 100% sure what the solution here is, but the one thing we
> must resist is turning the iomap code into a mess of custom hooks
> that only one filesystem uses. We've been taught this lesson time
> and time again - the iomap infrastructure exists because stuff like
> bufferheads and the old direct IO code ended up so full of special
> case code that it ossified and became unmodifiable and
> unmaintainable.
>
> We do not want to go down that path again.
>
> IMO, the iomap IO model needs to be restructured to support post-IO
> and pre-IO data verification/calculation/transformation operations
> so all the work that needs to be done at the inode/offset context
> level can be done in the iomap path before bio submission/after
> bio completion. This will allow infrastructure like fscrypt, data
> compression, data checksums, etc. to be supported generically, not
> just by individual filesystems that provide a ->submit_io hook.
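
One way to picture the pre-IO/post-IO model described in the paragraph above is the userspace sketch below. It is only an illustration under stated assumptions: every mock_* name is hypothetical, nothing here is an existing iomap interface, and a byte array stands in for the block device. The point is that the generic read/write paths own the transformation calls, so any filesystem could plug in a transform without a private submission hook.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-mapping transformation ops; NOT an existing iomap API. */
struct mock_xform_ops {
	/* Runs in inode/offset context before the data is submitted. */
	void (*pre_io)(unsigned char *buf, size_t len, long long pos);
	/* Runs in inode/offset context after the data has been read. */
	void (*post_io)(unsigned char *buf, size_t len, long long pos);
};

/* A byte array standing in for the block device. */
unsigned char mock_disk[64];

/* Returns 1 if [pos, pos + len) lies inside mock_disk. */
int mock_range_ok(size_t len, long long pos)
{
	return pos >= 0 && len <= sizeof(mock_disk) &&
	       (size_t)pos <= sizeof(mock_disk) - len;
}

/* Generic write path: transform a private copy, then "submit" it. */
int mock_generic_write(const struct mock_xform_ops *ops,
		       const unsigned char *data, size_t len, long long pos)
{
	unsigned char tmp[sizeof(mock_disk)];

	assert(data != NULL);
	if (!mock_range_ok(len, pos))
		return -1;
	memcpy(tmp, data, len);
	if (ops && ops->pre_io)
		ops->pre_io(tmp, len, pos);
	memcpy(mock_disk + pos, tmp, len);
	return 0;
}

/* Generic read path: "complete" the read, then undo the transform. */
int mock_generic_read(const struct mock_xform_ops *ops,
		      unsigned char *out, size_t len, long long pos)
{
	assert(out != NULL);
	if (!mock_range_ok(len, pos))
		return -1;
	memcpy(out, mock_disk + pos, len);
	if (ops && ops->post_io)
		ops->post_io(out, len, pos);
	return 0;
}

/* Toy position-dependent XOR, standing in for encryption or compression. */
void mock_xor(unsigned char *buf, size_t len, long long pos)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= (unsigned char)(0x5a + pos + i);
}
```

A caller would set both pre_io and post_io to mock_xor (XOR is its own inverse) and go through mock_generic_write() and mock_generic_read(); the submission path itself never needs to know which transform is in use.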
>
> As for the btrfs needing to slice and dice bios for multiple
> devices? That should be done via a block device ->make_request
> function, not a custom hook in the iomap code.
btrfs handles, replicates, and stores metadata and data differently.
We would still need an entry point in the iomap code to handle the
I/O submission.
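
For reference, the entry point the patch adds boils down to the dispatch below. This is a userspace sketch with made-up stand-ins: all mock_* types, the MOCK_QC_NONE value, and the cookie value 7 are assumptions for illustration, not the kernel definitions. It also makes the polling limitation visible: the hook path hands back no cookie.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for kernel objects; none of these are real definitions. */
struct mock_bio {
	int submitted_by_fs;	/* set when the filesystem hook took the bio */
	int submitted_generic;	/* set when the generic path took the bio */
	long long pos;		/* file position passed to the hook */
};

struct mock_inode {
	int ino;
};

struct mock_page_ops {
	void (*submit_io)(struct mock_bio *bio, struct mock_inode *inode,
			  long long pos);
};

struct mock_iomap {
	const struct mock_page_ops *page_ops;
};

#define MOCK_QC_NONE (-1)	/* plays the role of BLK_QC_T_NONE */

/* Plays the role of submit_bio(): returns a cookie usable for polling. */
int mock_submit_bio(struct mock_bio *bio)
{
	bio->submitted_generic = 1;
	return 7;		/* arbitrary cookie for the sketch */
}

/* Mirrors the branch the patch adds to iomap_dio_submit_bio(). */
int mock_dio_submit_bio(const struct mock_iomap *iomap,
			struct mock_inode *inode,
			struct mock_bio *bio, long long pos)
{
	assert(iomap != NULL && bio != NULL);
	if (iomap->page_ops && iomap->page_ops->submit_io) {
		iomap->page_ops->submit_io(bio, inode, pos);
		return MOCK_QC_NONE;	/* no cookie, so nothing to poll on */
	}
	return mock_submit_bio(bio);
}

/* Hypothetical filesystem hook: records that it saw the bio and position. */
void mock_fs_submit_io(struct mock_bio *bio, struct mock_inode *inode,
		       long long pos)
{
	(void)inode;
	bio->submitted_by_fs = 1;
	bio->pos = pos;
}
```

A mapping without page_ops, or without submit_io set, falls through to the generic path and gets the cookie back, so filesystems that do not need the hook see no change in behaviour.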
>
> That's why I don't like this hook - I think hiding data operations
> and/or custom bio manipulations in opaque filesystem callouts is
> completely the wrong approach to be taking. We need to do these
> things in a generic manner so that all filesystems (and block
> devices!) that use the iomap infrastructure can take advantage of
> them, not just one of them.
>
> Quite frankly, I don't care if it takes more time and work up front,
> I'm tired of expedient hacks to merge code quickly repeatedly biting
> us on the arse and wasting far more time sorting out than we would
> have spent getting it right in the first place.
Sure. I am open to ideas. What are you proposing?
--
Goldwyn