On Sat, May 16, 2020 at 9:01 PM A L <mail@xxxxxxxxxxxxxx> wrote:
>
>
> On 2020-05-16 19:18, Filipe Manana wrote:
> > On Sat, May 16, 2020 at 5:51 PM A L <mail@xxxxxxxxxxxxxx> wrote:
> >> Dear all,
> >>
> >> I did some testing on copying files with the +c (compression) xattrs set.
> >>
> >> As far as I can tell, 'cp -a' only sets any xattrs after copying the
> >> data. This means that a compressed file should end up without
> >> compression, but still with the +c xattr set. However this is not
> >> entirely true. Some small amount of data is still getting compressed.
> >>
> >> I would like to understand why.
> > As discussed on the mailing list:
> >
> > cp copies the xattr only after copying the file data. Since the data
> > is written to the destination using buffered IO, it is possible that
> > while copying the data the system flushes dirty pages for whatever
> > reason (due to memory pressure, someone called sync(2), etc) - this
> > data will not be compressed since the file doesn't yet have the
> > compression xattr. If the remaining data is flushed after cp finishes,
> > then that data can end up compressed, since the file has the
> > compression xattr at that point. Typically for small files, all the
> > data ends up getting flushed after cp finishes, so we don't see any
> > surprising behaviour.
> >
> > I'll look into changing 'cp's behaviour to copy xattrs before file
> > data next week, unless you or someone else is interested in doing it.
> >
> > Thanks.
> >
> Based on what you say, the file operations are happening asynchronously
> in the background, rather than synchronously.

What I said is that in the middle of copying, dirty pages might be
flushed for some reason, in which case that data won't be compressed,
since the destination file doesn't have the xattr set yet.
The remaining data will be flushed after the xattr is set, so it will
end up getting compressed.

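To make the ordering concrete, here is a toy Python model of the behaviour described above. It is not kernel or btrfs code: `simulate_copy`, `flush_points` and `xattr_first` are names made up for illustration, and the model simply assumes that each flush writes out all currently dirty data using whatever compression state the file has at that moment.

```python
def simulate_copy(file_size, flush_points, xattr_first=False):
    """Toy model of cp writing buffered data and setting the compression xattr.

    flush_points: byte offsets at which something (memory pressure, sync(2),
    etc.) forces writeback of all dirty data while the copy is in progress.
    xattr_first: hypothetical switch modelling a cp that sets the xattr
    before writing any data.
    Returns a list of (start, end, compressed) extents.
    """
    extents = []
    flushed_up_to = 0
    xattr_set = xattr_first

    # Mid-copy flushes: the data dirtied so far is written out with the
    # compression state the file has right now.
    for offset in sorted(flush_points):
        offset = min(offset, file_size)
        if offset > flushed_up_to:
            extents.append((flushed_up_to, offset, xattr_set))
            flushed_up_to = offset

    # cp has finished writing the data; only now is the xattr copied
    # (a no-op in this model if it was already set up front).
    xattr_set = True

    # Whatever is still dirty gets flushed after cp is done.
    if flushed_up_to < file_size:
        extents.append((flushed_up_to, file_size, xattr_set))
    return extents
```

With one mid-copy flush, for example `simulate_copy(100, [60])`, the model gives an uncompressed extent followed by a compressed one, which is the mixed result reported in this thread. With `xattr_first=True` every extent comes out compressed, and with no mid-copy flush at all (the typical small-file case) the whole file is compressed.
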
> I guess 'cp' and other tools
> like it should issue a 'fsync' call between setting the xattrs and
> writing data?

I don't understand what you are trying to say.
Are you suggesting that an fsync would help with the issue you
described, where the file ends up with both compressed and uncompressed
extents (I don't see how), or is that for some other issue you are
thinking about?

> Is this specific to Btrfs, or is it a Linux design choice?

Can't tell, since I don't understand what the problem is.

>
> Also, thanks for looking into changing cp to do the xattrs before
> writing data. I had also asked about this on the coreutils mailing list:
> https://lists.gnu.org/archive/html/coreutils/2020-05/msg00011.html

Great, thanks.
The coreutils folks will deal with it the best way.

>
> Thanks

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't - you're right.”
