Re: data rolled back 5 hours after crash, long fsync running times, watchdog evasion on 5.4.11

On Sun, Feb 09, 2020 at 10:00:34AM +0100, Martin Steigerwald wrote:
> Zygo Blaxell - 09.02.20, 01:43:07 CET:
> > Up to that point, a few processes have been blocked for up to 5 hours,
> > but this is not unusual on a big filesystem given #1.  Usually
> > processes that read the filesystem (e.g. calling lstat) are not
> > blocked, unless they try to access a directory being modified by a
> > process that is blocked. lstat() being blocked is unusual.
> 
> This is really funny, cause what you consider not being unusual, I'd 
> consider a bug or at least a huge limitation.
> 
> But in a sense I never really got that processes can be stuck in 
> uninterruptible sleep on Linux or Unix *at all*. Such a situation 
> without giving a user at least the ability to end it by saying "I don't 
> care about the data that process is to write, let me remove it already" 
> for me is a major limitation to what appears to be kind of specific to 
> the UNIX architecture or at least the way the Linux virtual memory 
> manager is working.

> That written I may be completely ignorant of something very important 
> here and some may tell me it can't be any other way for this and that 
> reason. Currently I still think it can.

The process in uninterruptible sleep is waiting for the filesystem code to
finish whatever it's doing so the in-memory and on-disk structures end in
a consistent state.  If whatever it's doing is "waiting for a lock held by
some other thread doing an expensive thing", it can block for a long time.
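
To see which processes are currently in that state, here is a minimal
sketch, assuming a Linux procfs mounted at /proc.  The state letter is
the third field of /proc/<pid>/stat, and 'D' means uninterruptible
sleep.  The function names are made up for illustration.

```python
import os

def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the LAST ')' instead of splitting on whitespace.
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state

def blocked_processes(proc_root="/proc"):
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited (or was unreadable) while we looked
        if state == "D":
            yield pid, comm

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

This only shows a snapshot; a process that appears in the list on every
run over several minutes is the kind of long block discussed here.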

We can't simply abort the kernel thread here, which is why it's
uninterruptible wait (*).  Generic interruption would need to unwind the
kernel stack all the way back to userspace, reverting all changes made
to the filesystem's internal data structures as we go, without tripping
over the need for some other lock in the process, and without introducing
horrible new regressions.
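
A userspace analogy of the unwinding problem, as a rough sketch only:
this is not kernel code, and every name in it (apply_changes,
Interrupted, the "items"/"count" invariant) is hypothetical.  It shows
that an update interrupted halfway leaves the structure inconsistent
unless the code path carries its own undo logic.

```python
_MISSING = object()  # sentinel: key did not exist before the update

class Interrupted(Exception):
    """Stands in for a signal arriving in the middle of an operation."""

def apply_changes(state, changes, interrupt_at=None, unwind=False):
    """Apply (key, value) changes in order.

    If interrupt_at is a step index, raise Interrupted before that step.
    With unwind=True, revert the steps already applied before re-raising.
    """
    undo_log = []
    try:
        for step, (key, value) in enumerate(changes):
            if step == interrupt_at:
                raise Interrupted()
            undo_log.append((key, state.get(key, _MISSING)))
            state[key] = value
    except Interrupted:
        if unwind:
            # Revert in reverse order, newest change first.
            for key, old in reversed(undo_log):
                if old is _MISSING:
                    del state[key]
                else:
                    state[key] = old
        raise

def consistent(state):
    """Hypothetical invariant: the counter matches the number of items."""
    return state["count"] == len(state["items"])
```

With changes [("items", ["a", "b"]), ("count", 2)] and interrupt_at=1,
the plain version stops with two items and a count of 1; the unwind
version restores the original state.  The real problem is harder than
this, because the undo path may itself need locks held by other threads.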

In theory we can interrupt any kernel thread at any time--that happens
naturally whenever there's a BUG() or power failure, for instance--but
the effect on all the other threads that might be running is pretty
painful.

If you add a level of indirection--e.g. run the btrfs code in a VM and
access it via a network or virtio client--then we can interrupt the
client, but the server ends up having to finish whatever operation the
client requested anyway, so the client just gets to immediately hang
waiting for the server on its next call.
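
The same effect can be reproduced in a toy model.  This is a sketch
under loud assumptions: the Server class, the operation names, and the
timings are all invented, and a single lock stands in for whatever the
real server serializes on.  The client gives up on a slow request, the
server finishes it anyway, and the client's next request waits behind it.

```python
import threading
import time

class Server:
    """Toy server that handles one request at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.completed = []

    def handle(self, name, duration):
        with self._lock:              # requests are serialized
            time.sleep(duration)      # the "expensive thing"
            self.completed.append(name)

def call(server, name, duration, timeout):
    """Issue a request; return True if it finished within timeout."""
    worker = threading.Thread(
        target=server.handle, args=(name, duration), daemon=True
    )
    worker.start()
    worker.join(timeout)              # the client may stop waiting...
    return not worker.is_alive()      # ...but the server keeps going

def demo():
    server = Server()
    first_ok = call(server, "slow-commit", 0.3, timeout=0.05)
    start = time.monotonic()
    second_ok = call(server, "lstat", 0.0, timeout=2.0)
    waited = time.monotonic() - start
    return first_ok, second_ok, waited, server.completed

if __name__ == "__main__":
    print(demo())
```

The first call returns early, but the second call, which needs no work
of its own, still waits for the abandoned request to finish.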

> And even if uninterruptible sleep can still happen cause it is really 
> necessary, five hours is at least about five hours minus probably a minute 
> or so too long.

Yes, it would be nice if btrfs could avoid overcommitting itself so badly,
but that's a somewhat older and larger-scoped bug.

> Ciao,
> -- 
> Martin
> 
> 

(*) Well, we could, if all the filesystem code were written that way.
Patches welcome!
