Re: [PATCH V2] Btrfs: Full direct I/O and AIO read implementation.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 15, 2010 at 02:18:15PM -0500, jim owens wrote:
> Chris Mason wrote:
> >Thanks to both Jim and Josef for their time on this one.  More comments
> >below.
> >
> >On Sat, Feb 13, 2010 at 08:30:07PM -0500, jim owens wrote:
> >>Josef Bacik wrote:
> >>>On Wed, Feb 10, 2010 at 01:53:50PM -0500, jim owens wrote:
> >>>>+	/* traditional 512-byte device sector alignment is the
> >>The problem that I did not explain in my comment is that other fs's
> >>get the blocksize from 1 device bdi, which usually will be 512.  But
> >>the btrfs bdi covers all volumes and gives the btrf logical 4096 size.
> >>If we do what is done in direct-io.c then we only allow access on 4k
> >>(or in the future larger logical) boundaries.
> >
> >We can record the smallest alignment while scanning the devices and just
> >use that.  In general 4K is good enough, although some other database
> >company might be more worried about smaller numbers.
> 
> So you are saying at mount time we will set a new field in a
> btrfs data structure that we can use here instead of the hard-code.
> With today's disks this will usually be 512.

Yes please.

[ use i_mutex for reads? ]

> >
> >But, we're making the performance of the common case (parallel reads)
> >dramatically slower to close a small race.  Reads and writes are allowed
> >to race, and the results are allowed to be undefined.  The only
> >requirement is that we not return garbage (zeros, or good data only).
> 
> I agree with both why we don't want the mutex and that we don't care
> about the read returning old or new data.
> 
> But in spite of what you said above, I found lock_extent() was
> not enough and I needed the i_mutex.
> 
> The problem I had was that if btrfs_wait_ordered_range() does not
> run on the new data, that data can still be on disk but the btree
> checksum is wrong. I need the checksum forced into the btree so
> I don't fail the comparison.
> 
> Even an overwrite when nodatacow is set will cause the same problem.
> 
> I can try to code something like btrfs_page_mkwrite() does and see
> if that works, though I really question if that will be a dramatic
> improvement in parallel read because now we have to take the lock
> and search the ordered tree.

But, we already need the code that btrfs_page_mkwrite uses.  It should
be enough to wait for the ordered extents and have the extent range
locked.

> To me this will be a net loss of
> performance on files that are not being modified as the i_mutex
> does not slow down parallel readers except when the area they
> are reading needs flushed. 

The cost of i_mutex on parallel readers + high speed devices can be
surprising.

> And if the file is being modified in
> the spot we are reading, without i_mutex we may need to flush
> it multiple times, endlessly.  And unlike btrfs_page_mkwrite()
> this code is not doing just 1 page.

If the file is being modified by buffered writes, they get what they
deserve.  That's not what O_DIRECT is for.

> 
> Or am I getting something wrong?
> 
> >I think the temp buffer setup is the best place to start for
> >O_DIRECT + compressed.
> 
> Not sure what you mean by that.

I meant I think your way is right ;)

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux