[Re-send, hit reply instead of reply-all by mistake. Please CC me, I'm
not on the list.]

Good morning & thank you.

On Sun, 20 Oct 2019 at 02:38, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
> It looks like you're using eGPU and the thunderbolt 3 connection disconnect?
> That would cause a kernel panic/hang or whatever.

No, it's a Radeon VII in a Gigabyte X570 Aorus Master. The board has
PCIe 4, otherwise nothing exotic.

> > [...]
> > BTRFS error [...]: bad tree block start, want 284041084928 have 0
> > BTRFS error [...]: failed to read block groups: -5
> > BTRFS error [...]: open_ctree failed

["big number" filled in above]

> This means some tree blocks didn't reach disk or just got wiped out.
> Are you using discard mount option?

Not to my knowledge. I didn't set "discard", and as far as I can
remember it didn't show up in the mount output, but it's possible it's
on by default.

> > running btrfs check gives:
> > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> > bytenr mismatch, want=284041084928, have=0
> > ERROR: cannot open filesystem.

["big number" and "8-digit hex" filled in above]

> Again, some old tree blocks got wiped out.
> BTW, you don't need to wipe the numbers, sometimes it help developer to
> find some corner problem.

I was just being lazy, sorry about that.

> If it's the only problem, you can try this kernel branch to at least do
> a RO mount:
> https://github.com/adam900710/linux/tree/rescue_options
>
> Then mount the fs with "rescue=skipbg,ro" option.
> If the bad tree block is the only problem, it should be able to mount it.
>
> If that mount succeeded, and you can access all files, then it means
> only extent tree is corrupted, then you can try btrfs check
> --init-extent-tree, there are some reports of --init-extent-tree fixed
> the problem.

You wouldn't happen to know of a bootable rescue image that has this?
The affected machine obviously doesn't boot, getting the NVMe out
requires dismantling the CPU cooler, and TBH, I haven't built a kernel
in ~15 years.

> About the cause, either btrfs didn't write some tree blocks correctly or
> the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is
> the case).
>
> So it's recommended to update the kernel to 5.3 kernel.

FWIW, it's a Samsung 970 Evo Plus.

TBH, I didn't expect to lose more than the last couple of minutes of
writes in such a crash, and certainly didn't expect an unmountable
filesystem. So I'd love to know what caused this so I can avoid it in
future.

But first things first, I have to get this thing up & running again ...

Cheers,
Christian

On Sun, 20 Oct 2019 at 02:38, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
> On 2019/10/20 6:34 AM, Christian Pernegger wrote:
> > [Please CC me, I'm not on the list.]
> >
> > Hello,
> >
> > I'm afraid I could use some help.
> >
> > The affected machine froze during a game, was entirely unresponsive
> > locally, though ssh still worked. For completeness' sake, dmesg had:
> > [110592.128512] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3404070, emitted seq=3404071
> > [110592.128545] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1191 thread Xorg:cs0 pid 1204
> > [110592.128549] amdgpu 0000:0c:00.0: GPU reset begin!
> > [110592.138530] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=13149116, emitted seq=13149118
> > [110592.138577] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Overcooked.exe pid 4830 thread dxvk-submit pid 4856
> > [110592.138579] amdgpu 0000:0c:00.0: GPU reset begin!
>
> It looks like you're using eGPU and the thunderbolt 3 connection disconnect?
> That would cause a kernel panic/hang or whatever.
>
> >
> > Oh well, I thought, and "shutdown -h now" it. That quit my ssh session
> > and locked me out, but otherwise didn't take, no reboot, still frozen.
> > Alt-SysRq-REISUB it was. That did it.
> >
> > Only now all I get is a rescue shell, the pertinent messages look to
> > be [everything is copied off the screen by hand]:
> > [...]
> > BTRFS info [...]: disk space caching is enabled
> > BTRFS info [...]: has skinny extents
> > BTRFS error [...]: bad tree block start, want [big number] have 0
> > BTRFS error [...]: failed to read block groups: -5
> > BTRFS error [...]: open_ctree failed
>
> This means some tree blocks didn't reach disk or just got wiped out.
>
> Are you using discard mount option?
>
> >
> > Mounting with -o ro,usebackuproot doesn't change anything.
> >
> > running btrfs check gives:
> > checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
> > checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
>
> Again, some old tree blocks got wiped out.
>
> BTW, you don't need to wipe the numbers, sometimes it help developer to
> find some corner problem.
>
> > bytenr mismatch, want=[same big number], have=0
> > ERROR: cannot open filesystem.
> >
> > That's all I've got, I'd really appreciate some help. There's hourly
> > snapshots courtesy of Timeshift, though I have a feeling those won't
> > help ...
>
> If it's the only problem, you can try this kernel branch to at least do
> a RO mount:
> https://github.com/adam900710/linux/tree/rescue_options
>
> Then mount the fs with "rescue=skipbg,ro" option.
> If the bad tree block is the only problem, it should be able to mount it.
>
> If that mount succeeded, and you can access all files, then it means
> only extent tree is corrupted, then you can try btrfs check
> --init-extent-tree, there are some reports of --init-extent-tree fixed
> the problem.
>
> >
> > Oh, it's a recent Linux Mint 19.2 install, default layout (@, @home),
> > Timeshift enabled; on a single device (NVMe). HWE kernel (Kernel
> > 5.0.0-31-generic), btrfs-progs 4.15.1.
>
> About the cause, either btrfs didn't write some tree blocks correctly or
> the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is
> the case).
>
> So it's recommended to update the kernel to 5.3 kernel.
>
> Thanks,
> Qu
>
> >
> > TIA,
> > Christian
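
For anyone comparing their own logs against the messages quoted in this thread, here is a minimal Python sketch that picks out the tree blocks whose on-disk copy appears to read back as zeros. It rests on two assumptions that are not confirmed anywhere in the thread: that "have 0" in the kernel message is the block address read from the on-disk header, and that "wanted 00000000" in the btrfs check output is the checksum stored in that header. The function name zeroed_block_candidates is made up for illustration and is not part of btrfs-progs.

```python
import re

# Patterns for the two message shapes quoted in this thread.
BAD_START = re.compile(r"bad tree block start, want (\d+) have (\d+)")
CSUM_FAIL = re.compile(
    r"checksum verify failed on (\d+) found ([0-9A-Fa-f]{8}) wanted ([0-9A-Fa-f]{8})"
)


def zeroed_block_candidates(log_text):
    """Return the sorted block addresses whose on-disk copy looks zero-filled.

    Assumption (hypothetical reading of the messages): a block counts as a
    candidate if the header address read back as 0, or if the checksum
    reported as "wanted" is all zeros.
    """
    candidates = set()
    for line in log_text.splitlines():
        match = BAD_START.search(line)
        if match and int(match.group(2)) == 0:
            candidates.add(int(match.group(1)))
        match = CSUM_FAIL.search(line)
        if match and int(match.group(3), 16) == 0:
            candidates.add(int(match.group(1)))
    return sorted(candidates)


# The messages as filled in earlier in this mail.
SAMPLE = """\
BTRFS error [...]: bad tree block start, want 284041084928 have 0
BTRFS error [...]: failed to read block groups: -5
checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
"""

if __name__ == "__main__":
    print(zeroed_block_candidates(SAMPLE))  # prints [284041084928]
```

A single address showing up here would fit the "tree blocks didn't reach disk or just got wiped out" reading, but the script only pattern-matches log text and cannot tell which of those two causes applies.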
