Re: cp -a leaves some compressed data.

On Sat, May 16, 2020 at 9:01 PM A L <mail@xxxxxxxxxxxxxx> wrote:
>
>
> On 2020-05-16 19:18, Filipe Manana wrote:
> > On Sat, May 16, 2020 at 5:51 PM A L <mail@xxxxxxxxxxxxxx> wrote:
> >> Dear all,
> >>
> >> I did some testing on copying files with the +c (compression) xattrs set.
> >>
> >> As far as I can tell, 'cp -a' only sets xattrs after copying the data. This means that a compressed file should end up uncompressed, but still with the +c xattr set. However, this is not entirely true: a small amount of data still gets compressed.
> >>
> >> I would like to understand why.
> > As discussed on the mailing list:
> >
> > cp copies the xattr only after copying the file data. Since the data
> > is written to the destination using buffered IO, it is possible that
> > while copying the data the system flushes dirty pages for whatever
> > reason (due to memory pressure, someone called sync(2), etc) - this
> > data will not be compressed since the file doesn't yet have the
> > compression xattr. If the remaining data is flushed after cp finishes,
> > then that data can end up compressed, since the file has the
> > compression xattr at that point. Typically for small files, all the
> > data ends up getting flushed after cp finishes, so we don't see any
> > surprising behaviour.
> >
> > I'll look into changing 'cp''s behaviour to copy xattrs before file
> > data next week, unless you or someone else is interested in doing it.
> >
> > Thanks.
> >
> Based on what you say, the file operations are happening asynchronously
> in the background, rather than synchronously.

What I said is that while in the middle of copying, dirty pages might
be flushed for some reason, in which case the data won't be compressed
since the destination file doesn't have the xattr set yet. The
remaining data will be flushed after the xattr was set, so it will end
up getting compressed.
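
To illustrate the ordering that avoids this, here is a minimal Python
sketch. It is not what cp does today and not a proposed coreutils patch;
the helper name copy_xattrs_first is hypothetical. It assumes the
compression setting reaches the destination as an ordinary xattr (on
btrfs, as far as I know, the property is stored as btrfs.compression),
and it silently skips any xattr it cannot read or set, which is an
assumption you may not want in a real tool.

```python
import os
import shutil


def copy_xattrs_first(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst, setting xattrs on dst before writing any data.

    Hypothetical helper for illustration only.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Step 1: copy xattrs while the destination is still empty, so no
        # dirty page can be flushed before the attributes are in place.
        if hasattr(os, "listxattr"):  # these os functions are Linux-only
            try:
                names = os.listxattr(fin.fileno())
            except OSError:
                names = []  # filesystem without xattr support
            for name in names:
                try:
                    value = os.getxattr(fin.fileno(), name)
                    os.setxattr(fout.fileno(), name, value)
                except OSError:
                    # Assumption: skip attributes we cannot read or set
                    # (missing privileges, unsupported namespace, ...).
                    pass
        # Step 2: only now write the file data through buffered IO.
        shutil.copyfileobj(fin, fout, chunk_size)
```

With this ordering, whenever writeback happens (mid-copy or after the
copy finishes), the destination already carries the xattr, so all of its
extents are treated the same way.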

> I guess 'cp' and other tools
> like it should issue a 'fsync' call between setting the xattrs and
> writing data?

I don't understand what you are trying to say. Are you suggesting that
an fsync would help with the issue you described, where the file ends
up with both compressed and uncompressed extents (I don't see how it
would), or is it for some other issue you are thinking about?

> Is this specific to Btrfs, or is it a Linux design choice?

I can't tell, since I don't understand what the problem is.

>
> Also, thanks for looking into changing cp to do the xattrs before
> writing data. I had also asked about this on the coreutils mailing list:
> https://lists.gnu.org/archive/html/coreutils/2020-05/msg00011.html

Great, thanks. The coreutils folks will deal with it in the best way.

>
> Thanks



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”



