Hi,

I have a 350 GB btrfs filesystem in which I am storing backups of virtual machine disk images. These are rsynced periodically from the VM host to a "current" subvolume, followed by a snapshot operation to a dated subvolume.

One disk image is about 50 GB in size, as reported by ls -l and du. However, the qgroup assigned to the "current" subvolume reports a "refr" size of 75 GB. I have performed a "btrfs quota rescan" to make sure the quota is up-to-date. I have tried defragmenting the file, but this did not significantly help.

Is there an explanation for the discrepancy between the logical file size and the space btrfs uses to store it?

I have one theory. When the updated VM image is synced into the current subvolume, the changes affect only a small part of each data storage node (I'm not sure if I'm using the terminology "node" correctly here). Because the filesystem is COW, and because the nodes are shared with the existing snapshots, each affected node has to be duplicated in full and cannot be rewritten in place to be more efficient. This means that most of the data in such a node is actually duplicated, even though it counts only once toward the logical size of the file.

I do not know how to determine the node size of my filesystem, but as far as I can tell from searching, the node size is never more than 64 KiB (65536 bytes). It seems unlikely to me that such a small node size could cause the problems I am seeing, but I suppose it's not impossible, especially because this virtual machine disk image hosts a number of git repositories, typically containing large numbers of small files, which have undergone significant churn in the past.

If this were the problem, would deduplication help, or does it operate only at the level of nodes?

I am using Linux version 3.16.0-38-generic on Ubuntu 14.04.2 LTS. This kernel version dates from August 2014.
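To make the theory above concrete, here is a minimal Python sketch of the accounting I have in mind. It is only a toy model under my own assumptions, not a description of what btrfs actually does: the 64 KiB unit size, the pattern of scattered 4 KiB writes, and the function names touched_units and cow_overhead are all hypothetical.

```python
def touched_units(writes, unit_size):
    """Return the indices of the storage units overlapped by the writes.

    `writes` is a list of (offset, length) pairs in bytes.
    """
    units = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // unit_size
        last = (offset + length - 1) // unit_size
        units.update(range(first, last + 1))
    return units


def cow_overhead(writes, unit_size):
    """Return (bytes copied, bytes changed) if every touched unit is copied whole.

    Assumes the writes do not overlap each other, so the changed byte
    count is simply the sum of the write lengths.
    """
    copied = len(touched_units(writes, unit_size)) * unit_size
    changed = sum(length for _, length in writes if length > 0)
    return copied, changed


if __name__ == "__main__":
    unit = 64 * 1024  # assumed unit size of 64 KiB (hypothetical)
    # Hypothetical update pattern: one 4 KiB write in every 1 MiB of the image.
    writes = [(i * 1024 * 1024, 4096) for i in range(1000)]
    copied, changed = cow_overhead(writes, unit)
    print("copied:", copied, "changed:", changed, "ratio:", copied / changed)
```

For this made-up pattern, 16 times as much data is copied as is actually changed. Even so, each touched unit wastes at most 64 KiB in this model, so 25 GB of overhead would need roughly 380,000 touched units, which is part of why I doubt that this is the whole story.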
I know that it is preferable to use the latest kernel for btrfs; Ubuntu provides up to 3.19.0-18, and I would consider upgrading if this is likely to help the problem.

What is the most up-to-date description of how btrfs stores data? I have found this, for example: https://oss.oracle.com/~mason/btrfs/btrfs-design.html

--
Ian Hinder
http://members.aei.mpg.de/ianhin
