My Filesystem is Broken

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi.
My primary (boot) filesystem is broken, due to an interrupted resize operation.
I'm hoping that I can get help either fixing the filesystem or
recovering some of my data, but I'd also like to know why btrfs and
its tools acted the way they did.
I think I've also found a bug in GParted.

Output of uname -a on the recovery medium: Linux ArchUSB
4.18.3-arch1-1-ARCH #1 SMP PREEMPT Sat Aug 18 09:22:54 UTC 2018 x86_64
GNU/Linux
Kernel on the affected system: Liquorix Linux 4.17.14
Output of btrfs --version: btrfs-progs v4.17.1
Relevant dmesg log is attached.

I currently use a single btrfs filesystem (fs A) with subvolumes for
root, home, and var.  It's ~140 GiB in size, with ~130GiB used.
I also have a second btrfs filesystem (fs B) from a previous
installation of Arch Linux.  It's ~40 GiB in size, with less than 30
GiB used.

First of all, how I got into this situation:

Yesterday, I wanted to reclaim some space, so I decided to shrink fs B.
I opened up GParted, and resized it.  I got an error from btrfs, like
"no space left on device".
I was a little confused.  This was seemingly solved by unmounting fs B.
I re-ran the reisze.  Some process was using 100% CPU on one core, and
I didn't see much (if any) I/O activity.
After a few minutes I noticed that GParted was attempting to resize
the mount point of fs B ('/mnt/oldsys' on fs A), even though it wasn't
mounted!
I'm not sure what this does, but I figure that it's not good (even
though fs A shouldn't fit in 32 GiB).
So I press cancel... nothing.  I try force cancel, still no effect.  I
try to kill the resize process, first with SIGTERM, then with
SIGKILL... nothing.
I figure that I have to reboot at this point.  During reboot, systemd
waits really long for a stop job on some user session thing.
Then, strangely, I see output _during shutdown_ about btrfs
_beginning_ to resize fs A (referring to it by its block device,
/dev/sda2)...
I choose to reset my computer.  When I try to boot again (bad idea in
retrospect) systemd takes a long time to "re-mount the root
filesystem".
Start jobs begin to timeout and fail, so I reset my computer again to
boot into a recovery medium.

Part 2, the attempted recovery:

I run btrfsck on sda2, and seeing lots of errors (see
btrfsck.readonly.log) I choose to make an image of the entire block
device.
After that, I attempt a btrfsck --repair (btrfsck.repair.log).
It seems alright until it reaches "Deleting bad dir index" and then
hangs.  IIRC at some point it segfaulted.
Desperately, I try to run another repair... and I encounter a BUG_ON
(btrfsck.repair.2.log).  Ouch.


Well, at this point I'm stuck.  I have no backups.
I've already restored the image of fs A from before any repair attempts.

Thanks in advance,
Jared


Some big files that I couldn't attach:
btrfsck.readonly.log:
https://drive.google.com/open?id=1CZP67uCs7zCyi1CfPv6tt9DxnIfU967u
btrfsck.repair.log:
https://drive.google.com/open?id=1l2Nj8n9CzmxRZznbbIYEMrQc6c5plMDm


P.S. Sorry if this gets sent twice -- Gmail failed to deliver it the first time.
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
Deleting bad dir index [258,96,2] root 18446744073709551608
extent-tree.c:1423: btrfs_inc_extent_ref: BUG_ON `err` triggered, value -5
btrfs check(+0x1e612)[0x5593cc313612]
btrfs check(btrfs_inc_extent_ref+0x166)[0x5593cc3167b6]
btrfs check(+0x1f064)[0x5593cc314064]
btrfs check(__btrfs_cow_block+0x5e5)[0x5593cc309af5]
btrfs check(btrfs_cow_block+0xf9)[0x5593cc309ea9]
btrfs check(btrfs_search_slot+0x36d)[0x5593cc30cd1d]
btrfs check(btrfs_lookup_dir_index+0x58)[0x5593cc31ea58]
btrfs check(+0x5d6ae)[0x5593cc3526ae]
btrfs check(cmd_check+0x3580)[0x5593cc35d460]
btrfs check(main+0x88)[0x5593cc3080b8]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f95fa1b2223]
btrfs check(_start+0x2e)[0x5593cc3081de]
[ 2137.123004] BTRFS info (device sda2): disk space caching is enabled
[ 2137.123005] BTRFS info (device sda2): has skinny extents
[ 2137.189323] BTRFS info (device sda2): enabling ssd optimizations
[ 3409.093491] traps: btrfsck[962] general protection ip:5572b1e0c9c2 sp:7fffbc926c80 error:0 in btrfs[5572b1dfe000+7a000]
[ 3409.093520] audit: type=1701 audit(1534813029.734:4): auid=1000 uid=0 gid=0 ses=1 pid=962 comm="btrfsck" exe="/usr/bin/btrfs" sig=11 res=1
[ 3563.069359] INFO: task btrfsck:962 blocked for more than 120 seconds.
[ 3563.069814]       Not tainted 4.18.3-arch1-1-ARCH #1
[ 3563.070239] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3563.070666] btrfsck         D    0   962    960 0x80000080
[ 3563.070671] Call Trace:
[ 3563.070682]  ? __schedule+0x29b/0x8b0
[ 3563.070686]  schedule+0x32/0x90
[ 3563.070690]  schedule_preempt_disabled+0x14/0x20
[ 3563.070695]  __mutex_lock.isra.0+0x220/0x530
[ 3563.070700]  ? file_update_time+0x5e/0x130
[ 3563.070705]  pipe_write+0x36/0x3f0
[ 3563.070710]  ? _raw_spin_unlock+0x16/0x30
[ 3563.070714]  ? follow_page_pte+0x3a7/0x5f0
[ 3563.070720]  __vfs_write+0x13f/0x190
[ 3563.070725]  __kernel_write+0x4d/0xf0
[ 3563.070730]  dump_emit+0x91/0xc0
[ 3563.070737]  elf_core_dump+0x130b/0x14a0
[ 3563.070747]  do_coredump+0x650/0xe98
[ 3563.070757]  get_signal+0x29b/0x5d0
[ 3563.070761]  ? general_protection+0x8/0x30
[ 3563.070769]  do_signal+0x36/0x650
[ 3563.070773]  ? _raw_spin_lock+0x13/0x40
[ 3563.070777]  ? _raw_spin_unlock+0x16/0x30
[ 3563.070781]  ? _raw_spin_lock_irqsave+0x25/0x50
[ 3563.070785]  ? _raw_spin_unlock_irqrestore+0x20/0x40
[ 3563.070790]  ? try_to_wake_up+0x231/0x480
[ 3563.070795]  ? general_protection+0x8/0x30
[ 3563.070801]  exit_to_usermode_loop+0xba/0xe0
[ 3563.070805]  prepare_exit_to_usermode+0x64/0x90
[ 3563.070810]  retint_user+0x8/0x8
[ 3563.070815] RIP: 0033:0x5572b1e0c9c2
[ 3563.070816] Code: Bad RIP value.
[ 3563.070826] RSP: 002b:00007fffbc926c80 EFLAGS: 00010202
[ 3563.070829] RAX: 0000d572746076df RBX: 000055736f0ec2c0 RCX: 0000000000000009
[ 3563.070831] RDX: 000000000000002a RSI: 0000000000000009 RDI: 0000000000000000
[ 3563.070833] RBP: 0000000000500000 R08: 00000000fffffffe R09: 00005572b7ce0990
[ 3563.070835] R10: 0000000000000000 R11: 00005572b239eef0 R12: 00005572bbf2d7d0
[ 3563.070837] R13: 00007fffbc926ccf R14: 00000000000000b6 R15: 00005572b7ce0a10
[ 3563.070842] INFO: task systemd-coredum:1020 blocked for more than 120 seconds.
[ 3563.071272]       Not tainted 4.18.3-arch1-1-ARCH #1
[ 3563.071701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3563.072134] systemd-coredum D    0  1020      1 0x000001a0
[ 3563.072137] Call Trace:
[ 3563.072142]  ? __schedule+0x29b/0x8b0
[ 3563.072147]  ? preempt_count_add+0x68/0xa0
[ 3563.072150]  schedule+0x32/0x90
[ 3563.072156]  wb_wait_for_completion+0x59/0x90
[ 3563.072161]  ? wait_woken+0x80/0x80
[ 3563.072166]  __writeback_inodes_sb_nr+0xaa/0xd0
[ 3563.072172]  try_to_writeback_inodes_sb+0x3c/0x50
[ 3563.072208]  ext4_nonda_switch+0x7b/0x80 [ext4]
[ 3563.072231]  ext4_da_write_begin+0x6f/0x480 [ext4]
[ 3563.072239]  generic_perform_write+0xb7/0x1c0
[ 3563.072242]  ? file_update_time+0x5e/0x130
[ 3563.072246]  __generic_file_write_iter+0xfa/0x1c0
[ 3563.072250]  ? mem_cgroup_event_ratelimit.isra.11+0x36/0xa0
[ 3563.072269]  ext4_file_write_iter+0xc6/0x3e0 [ext4]
[ 3563.072275]  do_iter_readv_writev+0x145/0x1a0
[ 3563.072279]  do_iter_write+0x80/0x190
[ 3563.072283]  iter_file_splice_write+0x255/0x3b0
[ 3563.072289]  __se_sys_splice+0x31b/0x770
[ 3563.072295]  do_syscall_64+0x5b/0x170
[ 3563.072300]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3563.072303] RIP: 0033:0x7f1de9d12f4b
[ 3563.072304] Code: Bad RIP value.
[ 3563.072310] RSP: 002b:00007ffc41aeb798 EFLAGS: 00000246 ORIG_RAX: 0000000000000113
[ 3563.072314] RAX: ffffffffffffffda RBX: 00000000267ed000 RCX: 00007f1de9d12f4b
[ 3563.072315] RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000005
[ 3563.072317] RBP: 00007ffc41aeb820 R08: 00000000267ed000 R09: 0000000000000000
[ 3563.072319] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000267ed000
[ 3563.072320] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000000000
[ 3808.825151] INFO: task systemd-coredum:1020 blocked for more than 120 seconds.
[ 3808.825609]       Not tainted 4.18.3-arch1-1-ARCH #1
[ 3808.826039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3808.826474] systemd-coredum D    0  1020      1 0x000001a4
[ 3808.826479] Call Trace:
[ 3808.826490]  ? __schedule+0x29b/0x8b0
[ 3808.826495]  ? preempt_count_add+0x68/0xa0
[ 3808.826499]  schedule+0x32/0x90
[ 3808.826507]  wb_wait_for_completion+0x59/0x90
[ 3808.826512]  ? wait_woken+0x80/0x80
[ 3808.826518]  __writeback_inodes_sb_nr+0xaa/0xd0
[ 3808.826523]  try_to_writeback_inodes_sb+0x3c/0x50
[ 3808.826562]  ext4_nonda_switch+0x7b/0x80 [ext4]
[ 3808.826585]  ext4_da_write_begin+0x6f/0x480 [ext4]
[ 3808.826593]  generic_perform_write+0xb7/0x1c0
[ 3808.826598]  __generic_file_write_iter+0xfa/0x1c0
[ 3808.826618]  ext4_file_write_iter+0xc6/0x3e0 [ext4]
[ 3808.826622]  ? __seccomp_filter+0x43/0x490
[ 3808.826628]  __vfs_write+0x13f/0x190
[ 3808.826634]  vfs_write+0xa9/0x190
[ 3808.826638]  ksys_write+0x4f/0xb0
[ 3808.826644]  do_syscall_64+0x5b/0x170
[ 3808.826649]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3808.826653] RIP: 0033:0x7f1de9dec0b5
[ 3808.826654] Code: Bad RIP value.
[ 3808.826664] RSP: 002b:00007ffc41aeb6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3808.826668] RAX: ffffffffffffffda RBX: 0000000000032bf9 RCX: 00007f1de9dec0b5
[ 3808.826670] RDX: 0000000000032bf9 RSI: 00007f1de5a0a010 RDI: 0000000000000007
[ 3808.826672] RBP: 00007f1de5a0a010 R08: 00007f1d659a9000 R09: 00007f1da4b28ff4
[ 3808.826673] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000007
[ 3808.826675] R13: 0000000000000000 R14: ffffffffffffffff R15: 000000003f180000
[ 3931.703090] INFO: task systemd-coredum:1020 blocked for more than 120 seconds.
[ 3931.703551]       Not tainted 4.18.3-arch1-1-ARCH #1
[ 3931.703989] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3931.704429] systemd-coredum D    0  1020      1 0x000001a4
[ 3931.704434] Call Trace:
[ 3931.704445]  ? __schedule+0x29b/0x8b0
[ 3931.704450]  ? preempt_count_add+0x68/0xa0
[ 3931.704454]  schedule+0x32/0x90
[ 3931.704460]  wb_wait_for_completion+0x59/0x90
[ 3931.704465]  ? wait_woken+0x80/0x80
[ 3931.704470]  __writeback_inodes_sb_nr+0xaa/0xd0
[ 3931.704476]  try_to_writeback_inodes_sb+0x3c/0x50
[ 3931.704514]  ext4_nonda_switch+0x7b/0x80 [ext4]
[ 3931.704537]  ext4_da_write_begin+0x6f/0x480 [ext4]
[ 3931.704545]  generic_perform_write+0xb7/0x1c0
[ 3931.704550]  __generic_file_write_iter+0xfa/0x1c0
[ 3931.704569]  ext4_file_write_iter+0xc6/0x3e0 [ext4]
[ 3931.704574]  ? __seccomp_filter+0x43/0x490
[ 3931.704580]  __vfs_write+0x13f/0x190
[ 3931.704586]  vfs_write+0xa9/0x190
[ 3931.704590]  ksys_write+0x4f/0xb0
[ 3931.704596]  do_syscall_64+0x5b/0x170
[ 3931.704602]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3931.704606] RIP: 0033:0x7f1de9dec0b5
[ 3931.704607] Code: Bad RIP value.
[ 3931.704617] RSP: 002b:00007ffc41aeb6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3931.704621] RAX: ffffffffffffffda RBX: 0000000000032bf9 RCX: 00007f1de9dec0b5
[ 3931.704622] RDX: 0000000000032bf9 RSI: 00007f1de5a0a010 RDI: 0000000000000007
[ 3931.704624] RBP: 00007f1de5a0a010 R08: 00007f1d659a9000 R09: 00007f1da4b28ff4
[ 3931.704626] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000007
[ 3931.704628] R13: 0000000000000000 R14: ffffffffffffffff R15: 000000003f180000
[ 3982.718116] audit: type=1701 audit(1534813603.367:5): auid=1000 uid=0 gid=0 ses=1 pid=960 comm="sudo" exe="/usr/bin/sudo" sig=11 res=1

[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux