Re: [PATCH] Btrfs: deal with existing encompassing extent map in btrfs_get_extent()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 10, 2016 at 12:09:06PM -0800, Omar Sandoval wrote:
> On Thu, Nov 10, 2016 at 12:01:20PM -0800, Liu Bo wrote:
> > On Wed, Nov 09, 2016 at 03:26:50PM -0800, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@xxxxxx>
> > > 
> > > My QEMU VM was seeing inexplicable I/O errors that I tracked down to
> > > errors coming from the qcow2 virtual drive in the host system. The qcow2
> > > file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
> > > Every once in awhile, pread() or pwrite() would return EEXIST, which
> > > makes no sense. This turned out to be a bug in btrfs_get_extent().
> > > 
> > > Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map
> > > insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
> > > two threads race on adding the same extent map to an inode's extent map
> > > tree. However, if the added em is merged with an adjacent em in the
> > > extent tree, then we'll end up with an existing extent that is not
> > > identical to but instead encompasses the extent we tried to add. When we
> > > call merge_extent_mapping() to find the nonoverlapping part of the new
> > > em, the arithmetic overflows because there is no such thing. We then end
> > > up trying to add a bogus em to the em_tree, which results in a EEXIST
> > > that can bubble all the way up to userspace.
> > 
> > I don't get how this could happen(even after reading Commit
> > 8dff9c853410), btrfs_get_extent in direct_IO is protected by
> > lock_extent_direct, the assumption is that a racy thread should be
> > blocked by lock_extent_direct and when it gets the lock, it finds the
> > just-inserted em when going into btrfs_get_extent if its offset is
> > within [em->start, extent_map_end(em)].
> > 
> > I think we may also need to figure out why the above doesn't work as
> > expected besides fixing another special case.
> > 
> > Thanks,
> > 
> > -liubo
> 
> lock_extent_direct() only protects the range you're doing I/O into, not
> the entire extent. If two threads are doing two non-overlapping reads in
> the same extent, then you can get this race.

More concretely, assume the extent tree on disk has:

+-------------------------+-------------------------------+
|start=0,len=8192,bytenr=0|start=8192,len=8192,bytenr=8192|
+-------------------------+-------------------------------+

And the extent map tree in memory has a single em cached for the second
extent {start=8192, len=8192, bytenr=8192}. Then, two threads try do do
direct I/O reads:

Thread 1                               | Thread 2
---------------------------------------+-------------------------------
pread(offset=0, nbyte=4096)            | pread(offset=4096, nbyte=4096)
lock_extent_direct(start=0, end=4095)  | lock_extent_direct(start=4096, end=8191)
btrfs_get_extent(start=0, len=4096)    | btrfs_get_extent(start=4096, len4096)
  lookup_extent_mapping() = NULL       |   lookup_extent_mapping() = NULL
  reads extent from B-tree             |   reads extent from B-tree
                                       |   write_lock(&em_tree->lock)
				       |   add_extent_mapping(start=0, len=8192, bytenr=0)
				       |     try_merge_map()
				       |     em_tree now has {start=0, len=16384, bytenr=0}
				       |   write_unlock(&em_tree->lock)
write_lock(&em_tree->lock)             |
add_extent_mapping(start=0, len=8192,  |
                   bytenr=0) = -EEXIST |
search_extent_mapping() = {start=0,    |
                           len=16384,  |
			   bytenr=0}   |
merge_extent_mapping() does bogus math |
and overflows, returns EEXIST          |


-- 
Omar
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux