Re: Kernel crash related to LZO compression

On 2018/10/25 11:56 PM, Dmitry Katsubo wrote:
> Dear btrfs community,
> 
> My apologies for posting dumps from a rather old kernel (4.9.25);
> nevertheless, I would like your opinion on the kernel crashes reported
> below.
> 
> As far as I understand the situation (correct me if I am wrong), some
> data block became corrupted, which resulted in the following kernel
> trace during boot:
> 
> kernel BUG at /build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318!
> invalid opcode: 0000 [#1] SMP
> Call Trace:
>  [<f8c63739>] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs]
>  [<f8c951eb>] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs]
>  [<f8c771de>] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
>  [<de07ebb1>] ? process_one_work+0x141/0x380
>  [<de07ee31>] ? worker_thread+0x41/0x460
>  [<de0840e4>] ? kthread+0xb4/0xd0
>  [<de07edf0>] ? process_one_work+0x380/0x380
>  [<de084030>] ? kthread_park+0x50/0x50
>  [<de5aae03>] ? ret_from_fork+0x1b/0x28
> 
> The problematic file turned out to be the one used by systemd-journald
> /var/log/journal/c496cea41ebc4700a0dfaabf64a21be4/system.journal
> which was trying to read it (or append to it) during the boot and that was
> causing the system crash (see attached bootN_dmesg.txt).
> 
> I rebooted in safe mode and tried to copy the data from this partition
> to another location using btrfs-restore; however, the kernel crashed as
> well, with a slightly different symptom (see attached copyN_dmesg.txt):
> 
> Call Trace:
>  [<f8b4c760>] ? lzo_decompress_biovec+0x1b0/0x2b0 [btrfs]
>  [<d71a8828>] ? vmalloc+0x38/0x40
>  [<f8b4d415>] ? end_compressed_bio_read+0x265/0x2d0 [btrfs]
>  [<f8b2f1de>] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
>  [<d707ebb1>] ? process_one_work+0x141/0x380
>  [<d707ee31>] ? worker_thread+0x41/0x460
>  [<d70840e4>] ? kthread+0xb4/0xd0
>  [<d75aae03>] ? ret_from_fork+0x1b/0x28
> 
> To avoid the problem, I removed this file and also dropped the
> "compress=lzo" mount option.
> 
> Have there been any updates or fixes in that area? Is the lzo option
> safe to use?

Yes, v4.18 includes commits that harden the lzo decompression code:

de885e3ee281a88f52283c7e8994e762e3a5f6bd
    btrfs: lzo: Harden inline lzo compressed extent decompression
314bfa473b6b6d3efe68011899bd718b349f29d7
    btrfs: lzo: Add header length check to avoid potential out-of-bounds acc
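
To illustrate what a header length check of this kind does, here is a
minimal user-space sketch in Python. It is not the kernel code. It assumes
a simplified layout: a 4-byte little-endian total length at the start of
the buffer, followed by segments that each begin with a 4-byte
little-endian segment length. The function name check_lengths and the
max_segment parameter are hypothetical, and the sketch ignores any
page-boundary padding the on-disk format may use.

```python
import struct

LEN_FIELD = 4  # assumed size of each little-endian length field


def check_lengths(buf: bytes, max_segment: int) -> bool:
    """Return True only if every length field stays inside buf."""
    if len(buf) < LEN_FIELD:
        return False
    (total,) = struct.unpack_from("<I", buf, 0)
    # The claimed total must not exceed the bytes actually present.
    if total < LEN_FIELD or total > len(buf):
        return False
    pos = LEN_FIELD
    while pos < total:
        if total - pos < LEN_FIELD:
            return False  # no room left for a segment header
        (seg,) = struct.unpack_from("<I", buf, pos)
        pos += LEN_FIELD
        # Reject empty, oversized, or overrunning segments.
        if seg == 0 or seg > max_segment or seg > total - pos:
            return False
        pos += seg
    return True
```

The point is that every length read from disk is validated against the
buffer before it is used as an offset, so a corrupted header is rejected
instead of leading the decompressor out of bounds.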

As for the root cause: the data was compressed but had no csum, and in
that case scrub could corrupt it.

It's also fixed in v4.18:

665d4953cde6d9e75c62a07ec8f4f8fd7d396ade
    btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
ac0b4145d662a3b9e34085dea460fb06ede9b69b
    btrfs: scrub: Don't use inode pages for device replace
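
As a rough way to tell whether a given kernel is new enough to contain
these v4.18 commits, the release string can be compared against 4.18. The
sketch below is only an illustration: the helper name kernel_at_least is
hypothetical, and a version comparison cannot tell whether a stable or
distribution kernel older than 4.18 carries backports of the fixes.

```python
import re


def kernel_at_least(release: str, minimum=(4, 18)) -> bool:
    """Compare the leading major.minor of a release string with minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

For the kernel in the report, kernel_at_least("4.9.25") returns False. On
a running system the release string can be obtained with
platform.release() from the Python standard library.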

Thanks,
Qu

> 
> P.S. A possibly related issue is described in the "Warnings" section:
> 
> https://wiki.debian.org/Btrfs#Warnings /
> https://www.spinics.net/lists/linux-btrfs/msg56563.html
> 
