On 2020/7/5 6:30 PM, Thilo-Alexander Ginkel wrote:
> On Sun, Jul 5, 2020 at 11:53 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>> How reproducible is this?
>
> I did some log analysis: The problem started showing up on two of
> three servers starting July 3rd, 2020. This coincides with an applied
> Ubuntu Linux kernel update to 4.15.0-109-generic whose changelog shows
> plenty of btrfs changes:
> https://launchpad.net/ubuntu/+source/linux/4.15.0-109.110

So it backported all the stricter self-checks of recent kernels.
That's great for exposing any unexpected metadata.

Although sometimes a backport itself may introduce new bugs (very rare),
especially in heavily backported kernels.
So if possible, trying an upstream kernel is also an option, to test
whether something is really wrong.

Another factor involved is the btrfs-progs version, which normally gets
fewer backports, while upstream normally has stricter checks overall.
So trying upstream btrfs-check would also be a good idea if possible.

>
> Server #2 (still online) shows 16 error messages in its log since
> 2020-07-03 whereas server #3 shows 310 error messages.

Then it shouldn't be a hardware problem, unless all servers have the
same problem.

In such cases, I would recommend trying upstream kernels first,
especially when heavily backported kernels are involved.

If you can reproduce it with an upstream kernel, then I strongly
recommend using the mentioned diff to provide more info for debugging,
as it would be a false alert.

>
> One thing special about server #3 is that its btrfs file system has a
> huge metadata section (probably due to it hosting many [~50 million]
> small files), which doesn't seem too healthy:
>
> # btrfs filesystem usage /mnt
> Overall:
>     Device size:                 476.30GiB
>     Device allocated:            372.02GiB
>     Device unallocated:          104.28GiB
>     Device missing:                  0.00B
>     Used:                        272.16GiB
>     Free (estimated):            194.49GiB      (min: 194.49GiB)
>     Data ratio:                       1.00
>     Metadata ratio:                   1.00
>     Global reserve:              512.00MiB      (used: 0.00B)
>
> Data,single: Size:284.01GiB, Used:193.80GiB
>    /dev/mapper/luks      284.01GiB
>
> Metadata,single: Size:88.01GiB, Used:78.36GiB
>    /dev/mapper/luks       88.01GiB

In fact, your metadata is not that unhealthy.

>
> System,single: Size:4.00MiB, Used:80.00KiB
>    /dev/mapper/luks        4.00MiB
>
> Unallocated:
>    /dev/mapper/luks      104.28GiB

And there is plenty of unallocated space, so your fs actually looks
pretty healthy.

Thanks,
Qu

>
>> If it still shows the same symptom after verifying the RAM, would you
>> please apply this small debug diff on your kernel?
>
> I'll see what I can do.
>
> Thanks,
> Thilo
>
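
As a rough cross-check of the usage figures quoted above, the numbers can be recomputed by hand. The sketch below is only an illustration: the function name `summarize` and the formula used for "Free (estimated)" (unallocated space plus free space inside the data chunks, which matches the report at a data ratio of 1.00) are assumptions, not taken from btrfs-progs.

```python
# Figures copied from the quoted `btrfs filesystem usage /mnt` output, in GiB.
MIB = 1 / 1024            # one MiB expressed in GiB
KIB = 1 / (1024 * 1024)   # one KiB expressed in GiB

DEVICE_SIZE = 476.30
DATA_SIZE, DATA_USED = 284.01, 193.80
META_SIZE, META_USED = 88.01, 78.36
SYS_SIZE, SYS_USED = 4.00 * MIB, 80.00 * KIB


def summarize():
    """Recompute the 'Overall' section from the per-profile lines."""
    allocated = DATA_SIZE + META_SIZE + SYS_SIZE
    unallocated = DEVICE_SIZE - allocated
    used = DATA_USED + META_USED + SYS_USED
    # Assumption: with data ratio 1.00, the estimate is unallocated space
    # plus the free space left inside already-allocated data chunks.
    free_estimated = unallocated + (DATA_SIZE - DATA_USED)
    return {
        "allocated": allocated,
        "unallocated": unallocated,
        "used": used,
        "free_estimated": free_estimated,
        "metadata_fill": META_USED / META_SIZE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        if name == "metadata_fill":
            print(f"{name}: {value:.1%}")
        else:
            print(f"{name}: {value:.2f} GiB")
```

The recomputed values agree with the reported ones to within rounding. The metadata chunks are about 89% full, but with roughly 104 GiB still unallocated there is room to allocate more metadata chunks, which is consistent with the assessment that the fs looks healthy.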
