Thanks. Looks like it is a btrfs bug - so a big "hello" to linux-btrfs :-)
The symptom is that running iozone on btrfs on top of md/raid10 can produce:
[ 919.893454] md/raid10:md0: make_request bug: can't convert block across chunks or bigger than 256k 6653500160 256
[ 919.893465] btrfs: bdev /dev/mapper/vg0-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
i.e. RAID10 has a 256K chunk size, but is receiving 256K requests that straddle
two chunks - the last half of one chunk and the first half of the next.
That isn't allowed, and raid10_mergeable_bvec (called by bio_add_page) should
prevent it.
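
A minimal C sketch of the kind of check raid10 makes, assuming the two numbers
at the end of the md message are the start sector (512-byte units) and the
request size in KiB. The helper name fits_in_chunk is hypothetical, not md's
code. With a 256K chunk (512 sectors) the logged request starts
6653500160 mod 512 = 256 sectors (128K) into its chunk, so 256 + 512 > 512 and
it spills into the next chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not md's actual code): does a request of `len`
 * sectors starting at `sector` stay inside a single chunk?
 * chunk_sects must be a power of two so the offset can be found by masking.
 */
static bool fits_in_chunk(uint64_t sector, uint32_t len, uint32_t chunk_sects)
{
    assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);
    uint64_t offset = sector & (chunk_sects - 1); /* offset within its chunk */
    return offset + len <= chunk_sects;
}
```

For example, fits_in_chunk(6653500160ULL, 512, 512) is false, while the same
256K request starting on the chunk boundary at 6653499904 fits.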
However, btrfs_map_bio() sets ->bi_sector to a new value without verifying
that the resulting bio is still acceptable - which it isn't.
The core problem is that you cannot build a bio for one location, then use it
freely at another location.
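
To illustrate the point with a toy model (struct toy_bio and
remap_unchecked are hypothetical stand-ins, not the kernel's struct bio or
btrfs_map_bio()): a 256K request built at a chunk-aligned location, here
assumed to be sector 0, is acceptable there, but moving it to the sector from
the log without re-checking makes it cross a chunk boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio: only start and length matter here. */
struct toy_bio {
    uint64_t sector; /* start, in 512-byte sectors */
    uint32_t len;    /* length, in sectors */
};

/* Same chunk test as md applies: the request must not cross a boundary. */
static bool fits_in_chunk(const struct toy_bio *b, uint32_t chunk_sects)
{
    assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);
    return (b->sector & (chunk_sects - 1)) + b->len <= chunk_sects;
}

/* Hypothetical: retarget a bio after it was built, with no re-check. */
static void remap_unchecked(struct toy_bio *b, uint64_t new_sector)
{
    b->sector = new_sector;
}
```

The bio was valid when it was assembled; only the later change of location
makes it invalid, which is why the check has to consider where the bio will
finally be sent.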
md/raid1 handles this by checking each addition to a bio against all the
possible locations it might read from or write to. Maybe btrfs could do the
same.
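
A sketch of that idea under simplified assumptions: each possible target is
just a start sector and a chunk size, and a bio may only grow if the larger
bio would still fit at every target. The names struct target and
can_add_to_all are hypothetical; this is not md/raid1's actual code, which
consults the underlying devices' merge functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one place the bio might be sent. */
struct target {
    uint64_t sector;      /* where the bio would start on that device */
    uint32_t chunk_sects; /* that device's chunk size, power of two */
};

static bool fits_in_chunk(uint64_t sector, uint32_t len, uint32_t chunk_sects)
{
    assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);
    return (sector & (chunk_sects - 1)) + len <= chunk_sects;
}

/*
 * Allow growing a bio from cur_len to cur_len + add_len sectors only if
 * the result is acceptable at every possible location.
 */
static bool can_add_to_all(const struct target *t, size_t n,
                           uint32_t cur_len, uint32_t add_len)
{
    for (size_t i = 0; i < n; i++)
        if (!fits_in_chunk(t[i].sector, cur_len + add_len, t[i].chunk_sects))
            return false;
    return true;
}
```

With one target at sector 0 and another at 6653500160 (both with 512-sector
chunks), the bio can grow to 256 sectors but no further, because the second
location is already half way through its chunk.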
Alternatively, we could work with Kent Overstreet (of bcache fame) to remove the
restriction that the fs must make the bio compatible with the device -
instead requiring the device to split bios when needed, and making it easy to
do that (currently it is not easy).
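
A toy model of device-side splitting, for illustration only: the request is
cut into pieces that never cross a chunk boundary. struct piece and
split_at_chunks are hypothetical and do not reflect the kernel's bio
splitting API; real splitting must also handle page vectors and completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output record: one chunk-contained piece of a request. */
struct piece {
    uint64_t sector;
    uint32_t len;
};

/*
 * Cut `len` sectors starting at `sector` into pieces that each stay inside
 * one chunk. Writes at most `max` pieces to `out` (stops early if it fills)
 * and returns the number written.
 */
static size_t split_at_chunks(uint64_t sector, uint32_t len,
                              uint32_t chunk_sects, struct piece *out,
                              size_t max)
{
    assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);
    size_t n = 0;
    while (len > 0 && n < max) {
        /* sectors remaining before the next chunk boundary */
        uint32_t room = chunk_sects - (uint32_t)(sector & (chunk_sects - 1));
        uint32_t take = len < room ? len : room;
        out[n].sector = sector;
        out[n].len = take;
        n++;
        sector += take;
        len -= take;
    }
    return n;
}
```

Applied to the request from the log (512 sectors at 6653500160), this yields
two 256-sector pieces, at 6653500160 and 6653500416, one per chunk.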
And there are probably other alternatives.