On 2018/10/25 11:56 PM, Dmitry Katsubo wrote:
> Dear btrfs community,
>
> My apologies for the dumps from a rather old kernel (4.9.25); nevertheless,
> I would like your opinion on the kernel crashes reported below.
>
> As far as I understand the situation (correct me if I am wrong), some data
> block became corrupted, which resulted in the following kernel trace
> during boot:
>
> kernel BUG at /build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318!
> invalid opcode: 0000 [#1] SMP
> Call Trace:
> [<f8c63739>] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs]
> [<f8c951eb>] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs]
> [<f8c771de>] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
> [<de07ebb1>] ? process_one_work+0x141/0x380
> [<de07ee31>] ? worker_thread+0x41/0x460
> [<de0840e4>] ? kthread+0xb4/0xd0
> [<de07edf0>] ? process_one_work+0x380/0x380
> [<de084030>] ? kthread_park+0x50/0x50
> [<de5aae03>] ? ret_from_fork+0x1b/0x28
>
> The problematic file turned out to be the one used by systemd-journald:
> /var/log/journal/c496cea41ebc4700a0dfaabf64a21be4/system.journal
> systemd-journald was trying to read it (or append to it) during boot, and
> that was causing the system crash (see attached bootN_dmesg.txt).
>
> I rebooted in safe mode and tried to copy the data from this partition to
> another location using btrfs-restore; however, the kernel crashed as well,
> with a slightly different symptom (see attached copyN_dmesg.txt):
>
> Call Trace:
> [<f8b4c760>] ? lzo_decompress_biovec+0x1b0/0x2b0 [btrfs]
> [<d71a8828>] ? vmalloc+0x38/0x40
> [<f8b4d415>] ? end_compressed_bio_read+0x265/0x2d0 [btrfs]
> [<f8b2f1de>] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
> [<d707ebb1>] ? process_one_work+0x141/0x380
> [<d707ee31>] ? worker_thread+0x41/0x460
> [<d70840e4>] ? kthread+0xb4/0xd0
> [<d75aae03>] ? ret_from_fork+0x1b/0x28
>
> To keep away from the problem, I removed this file and also removed the
> "compress=lzo" mount option.
>
> Are there any updates / fixes done in that area? Is the lzo option safe to use?

Yes, we have commits to harden the lzo decompression code in v4.18:

de885e3ee281a88f52283c7e8994e762e3a5f6bd btrfs: lzo: Harden inline lzo compressed extent decompression
314bfa473b6b6d3efe68011899bd718b349f29d7 btrfs: lzo: Add header length check to avoid potential out-of-bounds acc

As for the root cause: it is compressed data without csum, which scrub could
then corrupt. That is also fixed in v4.18:

665d4953cde6d9e75c62a07ec8f4f8fd7d396ade btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
ac0b4145d662a3b9e34085dea460fb06ede9b69b btrfs: scrub: Don't use inode pages for device replace

Thanks,
Qu

>
> P.S. Perhaps a related issue is in the "Warnings" section:
>
> https://wiki.debian.org/Btrfs#Warnings /
> https://www.spinics.net/lists/linux-btrfs/msg56563.html
>
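For anyone wanting a quick check of whether a running kernel is at or past v4.18, the release named above as carrying these fixes, a minimal sketch follows. The helper names are hypothetical, and the comparison is only a heuristic: distribution kernels may backport the commits to an older version, which a version-string check cannot detect.

```python
import platform
import re
from typing import Tuple

# Mainline release named in the reply as containing the lzo and scrub fixes.
FIXED_IN = (4, 18)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.9.25'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def at_or_past_fixed_release(release: str) -> bool:
    """True if the release is v4.18 or newer (ignores possible backports)."""
    return parse_kernel_release(release) >= FIXED_IN


if __name__ == "__main__":
    running = platform.release()
    try:
        if at_or_past_fixed_release(running):
            print("%s: at or past v4.18" % running)
        else:
            print("%s: older than v4.18; fixes present only if backported" % running)
    except ValueError as exc:
        print(exc)
```

For the kernel in the report, `at_or_past_fixed_release("4.9.25")` evaluates to false, which is consistent with the crashes described.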
