Hi,
I have a btrfs filesystem on a 4TB HDD connected via USB 2.0. Some time
ago I accidentally disconnected the drive during heavy writes.
After reconnecting it, the filesystem seemed to still work (it mounted
fine and I could read some files chosen at random), but I ran `btrfs
scrub` to be sure.
`btrfs scrub` aborted after ~20 hours, having read ~3.5TB of data.
`dmesg` contained a single line:
#v+
BTRFS error (device dm-0): bad tree block start 0 3527021166592
#v-
I couldn't find any further details anywhere in the logs. I assume this
means that some data has actually been lost from this filesystem. I
have backups of the data on this drive, so I decided to experiment a
little with btrfs recovery strategies.
I checked whether there were any bad blocks on the raw device; all
blocks were read successfully.
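(For completeness, the surface check above can be done with a read-only
`badblocks` pass. `/dev/sdX` below is a placeholder for the real drive;
the sketch exercises the command on a scratch file instead, so it is
safe to run anywhere.)

```shell
# On the real drive this would be something like:
#   badblocks -b 4096 -sv /dev/sdX     # /dev/sdX is a placeholder
# Demonstrated on a sparse scratch file standing in for the device:
scratch=$(mktemp)
truncate -s 16M "$scratch"                 # sparse stand-in for the drive
bad=$(badblocks -b 4096 "$scratch")        # empty output = all blocks readable
echo "bad blocks: ${bad:-none}"
rm -f "$scratch"
```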
I created a devicemapper snapshot/overlay to keep the raw device data
read-only and to track the changes made by any recovery procedures.
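(For anyone wanting to reproduce the overlay step, a dry-run sketch
might look like the following. The device names, overlay size, sector
count, and the `btrfs-overlay` target name are my assumptions, and
`run` only prints each command instead of executing it, so the sketch
itself touches nothing.)

```shell
# Dry-run sketch: print the commands that would build a dm "snapshot"
# target over the raw device, diverting all writes to a sparse COW file.
RAW=/dev/sdb                      # raw 4TB device (assumption)
COWFILE=/tmp/overlay.img          # sparse file that receives the changes
SECTORS=7814037168                # from `blockdev --getsz "$RAW"` (example value)
COW=/dev/loop0                    # loop device losetup would hand back (assumption)
run() { echo "+ $*"; }            # replace with real execution when ready

run truncate -s 200G "$COWFILE"                 # sparse: grows only as blocks change
run losetup --find --show "$COWFILE"
# dm snapshot target: origin stays untouched, changed chunks go to the COW file;
# "P 8" = persistent snapshot with an 8-sector (4 KiB) chunk size
run sh -c "echo '0 $SECTORS snapshot $RAW $COW P 8' | dmsetup create btrfs-overlay"
```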
I ran `btrfstune -u` on the overlay to avoid having two devices with
the same UUID. This was done in a dedicated VM that did not see the raw
device (as suggested by `Ke` on IRC). BTW, this command caused the
overlay to grow by ~25GB, which IIUC means that around 6M 4096-byte
blocks were changed in the process (is that expected?).
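(As a quick sanity check of that estimate — my arithmetic, not part of
the original report:)

```shell
# ~25 GiB of COW growth divided into 4096-byte blocks
echo $(( 25 * 1024 * 1024 * 1024 / 4096 ))    # 6553600, i.e. ~6.5M blocks
```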
I was recommended to run `btrfs check`. The result is here: [1] (323
lines of output); IIRC it finished in a few hours.
[1] https://gist.github.com/liori/f8c5e69677e8c9d6038d2e3e4db9aa42
(5 data checksum errors are a preexisting condition, I knew about them
before the incident).
I then started `btrfs check --repair`. That was about a week ago, and
it is still running. The partial output is here: [2] (already almost
18k lines). The same problems are being found again and again in a
loop, as if it were stuck.
[2] https://gist.github.com/liori/01494afbe63cd19ba49be663be937d84
I do observe that the ctime of the overlay file is updated every once
in a while, but the file itself has not grown since an initial change
of ~70k blocks. My interpretation is that even if the repair process
writes anything, it keeps writing to the same places over and over.
I did not have any snapshots on this filesystem. I did have some
deduplicated content, but no more than 4 copies of any data block, and
deduplication resulted in saving ~1TB of space total. The device was
never a part of a multi-device setup.
Is there anything more I can do with this filesystem to bring it to a
state where I can `btrfs scrub` it, know what has been lost, etc.? Is
this behavior of `btrfs check --repair` expected, and will it ever
finish?
Thank you,
--
Tomasz Melcer