On 9/19/19 9:44 AM, Jann Horn wrote:
> On Thu, Sep 19, 2019 at 8:54 AM Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
>> Btrfs can transparently compress data written by the user. However, we'd
>> like to add an interface to write pre-compressed data directly to the
>> filesystem. This adds support for so-called "encoded writes" via
>> pwritev2().
>>
>> A new RWF_ENCODED flag indicates that a write is "encoded". If this
>> flag is set, iov[0].iov_base points to a struct encoded_iov which
>> contains metadata about the write: namely, the compression algorithm and
>> the unencoded (i.e., decompressed) length of the extent. iov[0].iov_len
>> must be set to sizeof(struct encoded_iov), which can be used to extend
>> the interface in the future. The remaining iovecs contain the encoded
>> extent.
>>
>> A similar interface for reading encoded data can be added to preadv2()
>> in the future.
>>
>> Filesystems must indicate that they support encoded writes by setting
>> FMODE_ENCODED_IO in ->file_open().
> [...]
>> +int import_encoded_write(struct kiocb *iocb, struct encoded_iov *encoded,
>> +			 struct iov_iter *from)
>> +{
>> +	if (iov_iter_single_seg_count(from) != sizeof(*encoded))
>> +		return -EINVAL;
>> +	if (copy_from_iter(encoded, sizeof(*encoded), from) != sizeof(*encoded))
>> +		return -EFAULT;
>> +	if (encoded->compression == ENCODED_IOV_COMPRESSION_NONE &&
>> +	    encoded->encryption == ENCODED_IOV_ENCRYPTION_NONE) {
>> +		iocb->ki_flags &= ~IOCB_ENCODED;
>> +		return 0;
>> +	}
>> +	if (encoded->compression > ENCODED_IOV_COMPRESSION_TYPES ||
>> +	    encoded->encryption > ENCODED_IOV_ENCRYPTION_TYPES)
>> +		return -EINVAL;
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>
> How does this capable() check interact with io_uring? Without having
> looked at this in detail, I suspect that when an encoded write is
> requested through io_uring, the capable() check might be executed on
> something like a workqueue worker thread, which is probably running
> with a full capability set.

If we can hit -EAGAIN before doing the import in io_uring, then yes,
this will probably bypass the check as it'll only happen from the
worker.
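
To make the concern concrete, here is a rough userspace model of the
problem, not kernel code. Every name in it (model_cred, model_req, the
model_* functions) is hypothetical and only stands in for the real
credential and request structures. It assumes the permission check runs
in deferred context and shows how the result depends on whose
credentials are consulted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task's credentials (not struct cred). */
struct model_cred {
	bool cap_sys_admin;
};

/* Hypothetical request that remembers the credentials of its submitter. */
struct model_req {
	const struct model_cred *submitter;
};

/* Models the capable(CAP_SYS_ADMIN) test against the given credentials. */
int model_import_encoded_write(const struct model_cred *cred)
{
	if (!cred->cap_sys_admin)
		return -EPERM;
	return 0;
}

/* Problem case: the deferred path consults the worker's own credentials. */
int model_run_in_worker_unsafe(const struct model_req *req,
			       const struct model_cred *worker)
{
	(void)req;	/* submitter credentials are ignored here */
	return model_import_encoded_write(worker);
}

/* Alternative: the deferred path consults the submitter's credentials. */
int model_run_in_worker_checked(const struct model_req *req,
				const struct model_cred *worker)
{
	(void)worker;	/* worker privileges play no role in the decision */
	return model_import_encoded_write(req->submitter);
}
```

With an unprivileged submitter and a fully privileged worker, the first
variant returns 0 and the second returns -EPERM. Whether the fix belongs
in io_uring or in the encoded write path is a separate question that
this model does not answer.
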
--
Jens Axboe