On Sun, Oct 19, 2008 at 08:16:31PM -0400, Chris Mason wrote:
>
> I think I'll have to come back to this after getting ENOSPC to work at
> all ;) You're right that reserved space can do wonders to dig us out of

:) Having been through this before, the ENOSPC accounting was incredibly
hard to get right.  It's at least worth thinking about the edge cases
while you're writing the first version, although you will probably just
have to throw one away no matter what.

> holes, it has to be reserved at a multiple of the number of procs that I
> allow into the transaction.
>
> I should be able to go into an emergency one writer at a time theme as
> space gets really tight, but there are lots of missing pieces that
> haven't been coded yet in that area.

Makes sense.  I have the following "behave like I expect" rules for
things that often aren't right in the first version of a COW file
system.

* If a write could succeed in the future without any user-level
  changes to the file system, then it will succeed the first time.

  Basically, this is reflecting what happens when space used by the
  previous version of the fs is freed after the next COW version is
  written out.  A naive implementation of COW will fail the write if
  it happens while enough other writes are outstanding, even if there
  would be enough space after the other writes have been synced to
  disk and the blocks from the old version are freed.  This means
  backing off to the one-writer-at-a-time mode you are talking about.

* Rewriting metadata will always succeed.

  Again, with naive COW, you can get into a state where doing a
  chmod() on a file could end up returning ENOSPC.  Totally uncool.
  Pretty much just requires a little reserved space.

* Deletion will always succeed.

  Again, reserved space, plus a little forethought in metadata design.
  It is not automatically the case that your metadata will be designed
  such that deletion will always result in more free space afterwards,
  so it's worth a review pass just to be sure.
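The three rules above can be sketched as a toy admission check.  This is
only an illustration under stated assumptions, not how btrfs does it: the
class `SpaceAccounting`, its fields, and the three return values are all
hypothetical names, space is counted in whole blocks, and commit()
pessimistically assumes every writer used its full worst-case reservation.

```python
from dataclasses import dataclass

OK, WAIT, ENOSPC = "ok", "wait-for-commit", "enospc"


@dataclass
class SpaceAccounting:
    """Toy model of ENOSPC accounting for a COW file system (hypothetical)."""
    free: int                 # blocks free on disk right now
    emergency: int            # blocks held back for metadata rewrites and deletes
    reserved: int = 0         # worst-case blocks promised to in-flight operations
    freed_on_commit: int = 0  # old-version blocks released when they commit

    def reserve(self, worst_case, frees=0, privileged=False):
        """Try to admit one operation into the current transaction.

        worst_case: upper bound on new blocks the operation may allocate
        frees:      old-version blocks that become free once it commits
        privileged: True for metadata rewrites and deletes, which are
                    allowed to dip into the emergency reserve
        """
        holdback = 0 if privileged else self.emergency
        available = self.free - self.reserved - holdback
        if worst_case <= available:
            self.reserved += worst_case
            self.freed_on_commit += frees
            return OK
        # Not enough space now, but enough once outstanding writes are
        # synced and the old blocks are freed: back off and retry after
        # the commit instead of failing (rule 1).
        if worst_case <= available + self.freed_on_commit:
            return WAIT
        return ENOSPC

    def commit(self):
        """Sync in-flight operations; assumes each used its full worst case."""
        self.free += self.freed_on_commit - self.reserved
        self.reserved = 0
        self.freed_on_commit = 0


acct = SpaceAccounting(free=10, emergency=2)
print(acct.reserve(6, frees=6))          # ok
print(acct.reserve(4, frees=4))          # wait-for-commit
print(acct.reserve(1, privileged=True))  # ok (uses the emergency reserve)
acct.commit()
print(acct.reserve(4, frees=4))          # ok
```

The second write is the case a naive implementation gets wrong: it does
not fit next to the first one, but it does fit once the first has been
committed and its old blocks released, so the caller is told to wait
rather than handed ENOSPC.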

One thing I ran into before is that it's non-trivial to calculate
exactly how many blocks will need to be COW'd for even the tiniest
write.  Leaves split, directories grow another block, the inode block
has to be copied, the tree grows another level, you have to allocate a
new free space extent, etc., etc.  The worst case can be hundreds of KB
per 1-byte write.  Logically, you may only be writing a few bytes, but
they may require megabytes of free space to sync out to disk.  Very
annoying.

-VAL
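
To make the worst-case arithmetic for a tiny write concrete, here is a
rough back-of-the-envelope sketch.  The cost model is an assumption made
for illustration, not any real file system's formula: each tree the write
touches copies every node on its root-to-leaf path, each of those nodes
may split (one extra block per level), and a root split adds one more
block.  The function names, the tree heights, and the block sizes are all
hypothetical.

```python
def worst_case_cow_blocks(tree_heights, data_blocks=1):
    """Upper bound on blocks allocated by one small write (toy model).

    tree_heights: height of each tree the write may modify, e.g. the
                  file's extent tree, a directory, the free space tree
    data_blocks:  blocks of new file data being written
    """
    total = data_blocks
    for height in tree_heights:
        # 'height' copied path nodes + 'height' possible splits
        # + 1 new root if the tree grows another level
        total += 2 * height + 1
    return total


def worst_case_bytes(tree_heights, block_size=4096, data_blocks=1):
    """Same bound expressed in bytes for a given block size."""
    return worst_case_cow_blocks(tree_heights, data_blocks) * block_size


heights = [4, 4, 4]  # three trees of height 4 (assumed values)
print(worst_case_cow_blocks(heights))               # 28
print(worst_case_bytes(heights))                    # 114688
print(worst_case_bytes(heights, block_size=16384))  # 458752
```

Even in this simplified model a 1-byte write has to reserve on the order
of a hundred KB or more, which is the number an ENOSPC reservation would
have to be based on, not the logical size of the write.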
