We've been running Btrfs in production on several clusters for a couple of months now. We're on Canonical's 4.8 kernel, and are currently in the process of moving to our own patchset atop vanilla 4.10+. I'm glad to say it's been a fairly good experience for us. Bar some performance issues, it's been largely smooth sailing.

One class of persistent issue has been plaguing our clusters, though: deadlocks. We've seen a fair number of cases where some background threads and user threads are waiting to start a transaction while at least one other background or user thread is in the middle of committing a transaction. Unfortunately, these situations end in deadlock, with no thread making progress.

We've talked about a couple of ideas internally, like adding the ability to time out transactions, aborting commits or start_transactions which are taking too long, and adding more debugging to get insight into the state of the filesystem. Unfortunately, since our usage and knowledge of Btrfs is still somewhat nascent, we're unsure which of these is the right investment.

I'm curious: are other people seeing deadlocks crop up in production often? How are you going about debugging them, and is there any good advice on avoiding them for production workloads?
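To make the "more debugging" idea concrete, here is a minimal sketch of one form it could take. It is an illustration under stated assumptions, not something we run today. It walks /proc, finds every thread in uninterruptible sleep (state D), and prints its kernel stack, so that one can see which threads are parked waiting to start a transaction and which one is stuck in the commit. The function names (parse_stat, read_file, blocked_tasks) are made up for this example. It assumes a kernel that exposes /proc/<pid>/task/<tid>/stack, and reading that file normally requires root.

```python
#!/usr/bin/env python3
"""List tasks in uninterruptible sleep (state D) and show their kernel stacks."""
import os
import sys


def parse_stat(text):
    """Return (comm, state) from the contents of a /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = text.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def read_file(path):
    """Read a small procfs file; return None if it vanished or is unreadable."""
    try:
        with open(path, "r", errors="replace") as f:
            return f.read()
    except OSError:
        return None


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, tid, comm, stack) for every thread currently in state D."""
    for pid in sorted(filter(str.isdigit, os.listdir(proc_root)), key=int):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = sorted(filter(str.isdigit, os.listdir(task_dir)), key=int)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            stat = read_file(os.path.join(task_dir, tid, "stat"))
            if not stat:
                continue  # thread exited or stat was unreadable
            comm, state = parse_stat(stat)
            if state != "D":
                continue
            # Assumption: the stack file exists and we may read it (usually root only).
            stack = read_file(os.path.join(task_dir, tid, "stack"))
            yield int(pid), int(tid), comm, stack or "<stack unavailable>\n"


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/proc"
    for pid, tid, comm, stack in blocked_tasks(root):
        print(f"pid={pid} tid={tid} comm={comm}")
        print(stack)
```

Run as root on a hung node, this gives roughly the same information as "echo w > /proc/sysrq-trigger", which dumps blocked tasks to the kernel log, but in a form that is easy to collect periodically and compare between snapshots.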
