Debugging Deadlocks?

We've been running BtrFS in production on several clusters for a
couple of months now. We're running Canonical's 4.8 kernel and are
currently in the process of moving to our own patchset atop vanilla
4.10+. I'm glad to say it's been a fairly good experience for us.
Barring some performance issues, it's been largely smooth sailing.

One class of persistent issues has been plaguing our cluster:
deadlocks. We've seen a fair number of cases where some number of
background threads and user threads are in the middle of performing
operations; some are waiting to start a transaction, and at least one
background thread or user thread is in the process of committing a
transaction. Unfortunately, these situations are ending in deadlocks,
where no threads are making progress.

We've talked about a couple of ideas internally, like adding the
ability to time out transactions or abort commits and
start_transaction calls that are taking too long, and adding more
debugging to get insight into the state of the filesystem.
Unfortunately, since our usage and knowledge of BtrFS is still
somewhat nascent, we're unsure which is the right investment.
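
For the "more debugging" idea, here is a rough userspace sketch of the
kind of snapshot we have in mind. It is not a fix, only an
illustration. It walks /proc, finds tasks in uninterruptible sleep
(state D), reads their kernel stacks from /proc/<pid>/task/<tid>/stack,
and sorts them into "committing" versus "waiting to start" buckets.
The helper names (task_state, blocked_tasks, classify) are made up for
this example. Matching on the btrfs_commit_transaction,
wait_current_trans and start_transaction symbols is an assumption
about what the stacks look like on a given kernel. Reading the stack
files needs root and a kernel built with stack trace support.

```python
from collections import Counter
from pathlib import Path


def task_state(stat_text):
    """Return the one-letter scheduler state from a /proc stat line."""
    # The comm field is parenthesised and may contain spaces or ")",
    # so split on the last ")" and take the first field after it.
    return stat_text.rsplit(")", 1)[1].split()[0]


def blocked_tasks(proc_root="/proc"):
    """Return sorted (tid, comm, stack) tuples for tasks in D state."""
    results = []
    for pid_dir in Path(proc_root).iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            task_dirs = list((pid_dir / "task").iterdir())
        except OSError:
            continue  # process exited while we were scanning
        for task in task_dirs:
            try:
                if task_state((task / "stat").read_text()) != "D":
                    continue
                comm = (task / "comm").read_text().strip()
                stack = (task / "stack").read_text()
            except OSError:
                continue  # task exited, or stack not readable
            results.append((int(task.name), comm, stack))
    return sorted(results)


def classify(stack):
    """Bucket a kernel stack by symbol name (assumed symbols, see above)."""
    if "btrfs_commit_transaction" in stack:
        return "committing"
    if "wait_current_trans" in stack or "start_transaction" in stack:
        return "waiting to start"
    return "other"


def summarize(tasks):
    """Count blocked tasks per bucket."""
    return Counter(classify(stack) for _tid, _comm, stack in tasks)
```

Calling summarize(blocked_tasks()) as root during a hang would give a
count per bucket, and printing the stacks of the "committing" tasks
should show what the committer itself is blocked on. The kernel can
produce a similar dump on its own: writing "w" to /proc/sysrq-trigger
logs all blocked tasks to the kernel log.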

I'm curious, are other people seeing deadlocks crop up in production
often? How are you going about debugging them, and are there any good
pieces of advice on avoiding these for production workloads?