Re: Bad hard drive - checksum verify failure forces readonly mount

On Fri, Jun 24, 2016 at 6:06 PM, Vasco Almeida <vascomalmeida@xxxxxxx> wrote:
> Quoting Chris Murphy <lists@xxxxxxxxxxxxxxxxx>:

>> A lot of changes have happened since 4.1.2. I would still use something
>> newer and try to repair it.
>
>
> By repair do you mean issue "btrfs check --repair /device" ?

Once you have copied off the important stuff, yes. It's less likely to
make things worse now. However, there are some things to do first:




> dmesg http://paste.fedoraproject.org/384352/80842814/

[ 1837.386732] BTRFS info (device dm-9): continuing balance
[ 1838.006038] BTRFS info (device dm-9): relocating block group 15799943168 flags 34
[ 1838.684892] BTRFS info (device dm-9): relocating block group 10934550528 flags 36
[ 1839.301453] ------------[ cut here ]------------
[ 1839.301495] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x45c/0x5a0 [btrfs]()

followed by

[ 1839.301797] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x29d/0x2d0 [btrfs]()
[ 1839.301798] BTRFS: Transaction aborted (error -5)
[...]
[ 1839.301972] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
[ 1839.301975] BTRFS info (device dm-9): forced readonly

So it looks like it was automatically resuming a balance, and while
processing delayed references it ran into something it didn't expect
and has no way to fix, so it went read-only to avoid causing more
problems.

I would do a couple things in order:
1. Mount ro and copy off what you want in case the whole thing gets
worse and can't ever be mounted again.
2. Mount with only these options: -o skip_balance,subvolid=5,nospace_cache
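
A minimal sketch of steps 1 and 2 as shell commands. The device path,
mount point, and backup paths are placeholders I made up, not values
from this thread, so substitute your own. The run helper is only there
so the commands get printed rather than executed unless you set RUN=1.

```shell
# Placeholders (assumptions): replace with your real device and paths.
DEV=/dev/mapper/example
MNT=/mnt/btrfs
SRC="$MNT/important"      # hypothetical directory to rescue
DST=/path/to/backup       # hypothetical destination on another filesystem

# Print the command unless RUN=1, so nothing is touched by accident.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 1: mount read-only, copy data off, unmount.
step1() {
    run mount -o ro "$DEV" "$MNT"
    run cp -a "$SRC" "$DST"
    run umount "$MNT"
}

# Step 2: mount with only the three options named in the list.
step2() {
    run mount -o skip_balance,subvolid=5,nospace_cache "$DEV" "$MNT"
}
```

Call step1 first and step2 afterwards; with RUN unset they only show
what would be run, which makes it easy to double-check the device name.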

If it mounts rw, don't do anything with it; just see if it cleans up
after itself. From the previous trace it also looks like it was trying
to remove a snapshot, and there are complaints of problems in that
snapshot. So hopefully you can just wait 5 minutes doing nothing and
it'll clean up after itself. You can check with top to see whether any
btrfs related kernel threads are active, including btrfs-cleaner, and
wait until they're done.

Then umount. If you want, you could have two other consoles ready
first: one running 'journalctl -f', and another from which to issue
sysrq+t in case you get a hang. This doesn't fix anything, but it
collects more information for a bug report for the devs.
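
A rough sketch of what the two extra consoles would run, as root.
Writing t to /proc/sysrq-trigger is the command-line equivalent of the
sysrq+t key combination. The SYSRQ_TRIGGER variable is a hypothetical
override I added so the function can be tried against a scratch file.

```shell
# Console A: follow the journal live (blocks until interrupted).
follow_log() {
    journalctl -f
}

# Console B: ask the kernel to dump all task states to the kernel log.
# SYSRQ_TRIGGER is an assumption for dry runs; normally leave it unset.
dump_tasks() {
    echo t > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
}
```

Run follow_log before the umount, and only call dump_tasks if the
umount actually hangs.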

Once you get it umounted, normally or by force, the next things to do are:

3. Run btrfs-image so that devs can see what's causing the problem that
the current code isn't handling well enough.
4. btrfs check --repair
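
Steps 3 and 4 could look roughly like this, with the filesystem
unmounted. The device and image paths are placeholders, and the
compression level and thread count are just example values. As a
precaution the commands are printed rather than run unless RUN=1.

```shell
# Placeholders (assumptions): replace with your real device and image path.
DEV=/dev/mapper/example
IMG=/path/to/other/fs/btrfs.img   # keep this off the damaged filesystem

# Print the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 3: metadata image for the devs; -c 9 is maximum compression,
# -t 4 uses four threads.
step3() {
    run btrfs-image -c 9 -t 4 "$DEV" "$IMG"
}

# Step 4: the actual repair attempt, only after the data is copied off.
step4() {
    run btrfs check --repair "$DEV"
}
```

The image holds metadata only, not file contents. btrfs-image also has
a -s option to sanitize file names if those are sensitive.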

Let's see the results of that repair. You can run 'script
btrfsrepair.txt' first and then 'btrfs check --repair', and it will
log everything. After btrfs check completes, use 'exit' to stop script
from recording, and you should have a btrfsrepair.txt file you can
post somewhere. When using > not everything gets logged (most likely
because > redirects only stdout, not stderr), but script will capture
everything shown in the terminal.
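
To illustrate the difference between a plain > redirect and a full
capture, here is a small sketch. The noisy function is a made-up
stand-in that writes to both stdout and stderr; it is not real btrfs
output, and the device path in the comments is a placeholder.

```shell
# Stand-in for a tool that reports on both streams (hypothetical).
noisy() {
    echo "progress on stdout"
    echo "warning on stderr" >&2
}

LOGDIR=$(mktemp -d)

# Plain '>' redirects stdout only; the stderr line still goes to the terminal.
noisy > "$LOGDIR/stdout_only.txt"

# Adding '2>&1' after the redirect sends stderr to the same file.
noisy > "$LOGDIR/both.txt" 2>&1

# For the real run, the interactive sequence would be:
#   script btrfsrepair.txt
#   btrfs check --repair /dev/mapper/example
#   exit
```

script has the advantage that it records the terminal session itself,
so you don't need to think about which stream a message was written to.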

Depending on how the repair goes, there might be a couple more options left.



-- 
Chris Murphy