On Thu, May 28, 2020 at 7:38 PM Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
>
> On 17:45 28/05, Filipe Manana wrote:
> > On Thu, May 28, 2020 at 5:34 PM Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
> > > > And who locked the extent range before?
> > >
> > > This is usually locked by a previous buffered write or read.
> >
> > A previous buffered write/read that has already finished or is still
> > in progress?
> >
> > Because if it has finished we're not supposed to have the file range
> > locked anymore.
>
> In progress. Mixing buffered I/O with direct writes.
>
> >
> > >
> > > >
> > > > That seems alarming to me, specially if it's a direct IO write failing
> > > > to invalidate the page cache, since a subsequent buffered read could
> > > > get stale data (what's in the page cache), and not what the direct IO
> > > > write wrote.
> > > >
> > > > Can you elaborate more on all those details?
> > >
> > > The origin of the message is when iomap_dio_rw() tries to invalidate the
> > > inode pages, but fails and calls dio_warn_stale_pagecache().
> > >
> > > In the vanilla code, generic_file_direct_write() aborts direct writes
> > > and returns 0 so that it may fallback to buffered I/O. Perhaps this
> > > should be changed in iomap_dio_rw() as well. I will write a patch to
> > > accomodate that.
> >
> > On vanilla we have no problems of mixing buffered and direct
> > operations as long as they are done sequentially at least.
> > And even if done concurrently we take several measures to ensure that
> > are no surprises (locking ranges, waiting for any ordered extents in
> > progress, etc).
>
> Yes, it is because of the code in generic_file_direct_write(). Anyways,
> I did some tests with the following patch, and it seems to work. I will
> send a formal patch to so that it gets incorporated in iomap sequence as
> well.
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index e4addfc58107..215315be6233 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -483,9 +483,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> */
> ret = invalidate_inode_pages2_range(mapping,
> pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
> - if (ret)
> - dio_warn_stale_pagecache(iocb->ki_filp);
> - ret = 0;
> + /*
> + * If a page can not be invalidated, return 0 to fall back
> + * to buffered write.
> + */
> + if (ret) {
> + if (ret == -EBUSY)
> + ret = 0;
> + goto out_free_dio;
> + }
>
> if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
> !inode->i_sb->s_dio_done_wq) {
>
>
Thanks. As I just replied on another thread for that patch, we
actually have a regression.
There's more than the annoying warning in dmesg, it also sets -EIO on
the inode's mapping and makes future fsyncs return that error despite
the fact that no actual errors or corruptions happened:
https://patchwork.kernel.org/patch/11576677/
>
> --
> Goldwyn
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”