On Wed, Apr 24, 2013 at 07:25:09AM -0600, Jan Schmidt wrote:
> On Wed, April 24, 2013 at 15:04 (+0200), Josef Bacik wrote:
> > On Tue, Apr 23, 2013 at 12:00:27PM -0600, Jan Schmidt wrote:
> >> Sequence numbers for delayed refs have been introduced in the first version
> >> of the qgroup patch set. To solve the problem of find_all_roots on a busy
> >> file system, the tree mod log was introduced. The sequence numbers for that
> >> were simply shared between those two users.
> >>
> >> However, at one point in qgroup's quota accounting, there's a statement
> >> accessing the previous sequence number, that's still just doing (seq - 1)
> >> just as it had to in the very first version.
> >>
> >> To satisfy that requirement, this patch makes the sequence number counter 64
> >> bit and splits it into a major part (used for qgroup sequence number
> >> counting) and a minor part (incremented for each tree modification in the
> >> log). This enables us to go exactly one major step backwards, as required
> >> for qgroups, while still incrementing the sequence counter for tree mod log
> >> insertions to keep track of their order. Keeping them in a single variable
> >> means there's no need to change all the code dealing with comparisons of two
> >> sequence numbers.
> >>
> >> The sequence number is reset to 0 on commit (not new in this patch), which
> >> ensures we won't overflow the two 32 bit counters.
> >>
> >> Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
> >> from the tree mod log code may happen.
> >>
> >> Signed-off-by: Jan Schmidt <list.btrfs@xxxxxxxxxxxxx>
> >> ---
> >> fs/btrfs/ctree.c | 36 +++++++++++++++++++++++++++++++++---
> >> fs/btrfs/ctree.h | 7 ++-----
> >> fs/btrfs/delayed-ref.c | 6 ++++--
> >> fs/btrfs/disk-io.c | 2 +-
> >> fs/btrfs/extent-tree.c | 5 +++--
> >> fs/btrfs/qgroup.c | 13 ++++++++-----
> >> fs/btrfs/transaction.c | 2 +-
> >> 7 files changed, 52 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >> index 566d99b..b74136e 100644
> >> --- a/fs/btrfs/ctree.c
> >> +++ b/fs/btrfs/ctree.c
> >> @@ -361,6 +361,36 @@ static inline void tree_mod_log_write_unlock(struct btrfs_fs_info *fs_info)
> >> }
> >>
> >> /*
> >> + * increment the upper half of tree_mod_seq, set lower half zero
> >> + *
> >> + * must be called with fs_info->tree_mod_seq_lock held
> >> + */
> >> +static inline u64 btrfs_inc_tree_mod_seq_major(struct btrfs_fs_info *fs_info)
> >> +{
> >> + u64 seq = atomic64_read(&fs_info->tree_mod_seq);
> >> + seq &= 0xffffffff00000000ull;
> >> + seq += 1ull << 32;
> >> + atomic64_set(&fs_info->tree_mod_seq, seq);
> >> + return seq;
> >> +}
> >
> > This isn't going to work, you read in the value, inc it and then set the new
> > value. If somebody comes in and inc's in between the read and the sync, like
> > btrfs_inc_tree_mod_seq_minor could do when you call tree_mod_alloc, you'll end
> > up losing the minor update. Thanks,
>
> I don't think I'll lose it. The minor update is made and returned to the one who
> needs it, that number can still be used. There is no guarantee for two
> concurrent modifications to which major a minor number belongs, though.
>
Ah yeah it's returned, but then you'll end up with two things that have the same
seq number when they shouldn't have. I'm not sure if that's a bad thing or not,
but from up here it looks like a bad thing, at the very least it's going to
surprise anybody who goes to work on this code later and expects it to work out
like a normal atomic would. So either wrap it in a spin_lock (it looks like you
already have a mod_seq_lock) or come up with something less crafty. Thanks,
Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html