Re: Review of Fedora 18 Release Criteria

On Tue, Oct 09, 2012 at 04:07:54PM -0700, Adam Williamson wrote:
> On Tue, 2012-10-09 at 16:48 -0400, David Cantrell wrote:
> > On Tue, Oct 09, 2012 at 07:19:25AM -0600, Tim Flink wrote:
> > > As we're getting closer to the scheduled time for beta freeze, we'd
> > > like to find out now if any of the current criteria or proposed criteria
> > > changes are unreasonable to expect for beta. There may be more changes
> > > for final as we get closer to that but I think that we're pretty close
> > > to being done with the release requirements for beta.
> > > 
> > > The current (as of this writing) release criteria are available at:
> > >  - http://fedoraproject.org/wiki/Fedora_18_Alpha_Release_Criteria
> > >  - http://fedoraproject.org/wiki/Fedora_18_Beta_Release_Criteria
> > >  - http://fedoraproject.org/wiki/Fedora_18_Final_Release_Criteria
> 
> Thanks David! Some thoughts follow.
> 
> > I would like to see changes to the blocker criteria for each release.  The
> > first item on each release criteria is that all blockers must be CLOSED.
> > Blockers are determined by criteria defined below which always group
> > anaconda in because we cannot address those problems in a later update
> > release.  This gets us on the bug fixing treadmill as we edge closer to each
> > release because every anaconda bug more or less becomes a blocker.  
> 
> This paragraph was a bit tricky to read, but now I've given it a few
> tries, it seems to be more or less a preamble, yes? I'm not sure if
> you're suggesting that "Blockers are determined by criteria defined
> below which always group anaconda in because we cannot address those
> problems in a later update release.  This gets us on the bug fixing
> treadmill as we edge closer to each release because every anaconda bug
> more or less becomes a blocker." is a problem, or just mentioning it as
> background. It's perfectly true as background, but I don't see it as a
> problem: it's just an innate characteristic of the software you write.

It was meant as background explanation for the points that followed.

> The installer is something that cannot be updated (for practical
> purposes), it must work to a high standard as shipped, because if it
> doesn't, that's a much bigger problem than a component which _can_ be
> updated not working. I agree with your assessment, but I see it as an
> inherent characteristic of an operating system installer, not any kind
> of problem in the process.

This is something I am aware of, having been working on anaconda for a long
time.

> > What I
> > would like to see:
> > 
> > 1) Installer blocker criteria needs to be more and more restrictive the
> > closer we get to a final release.
> 
> This wasn't entirely clear to me, but I'm going to take a guess at what
> I think you mean and reply to that. I think you're looking at the
> situation where we get late into the validation process - say, we just
> built RC2 and it's two days to go/no-go - and we find five bugs and mark
> them as blockers. I'm guessing you're saying it'd be preferable to
> identify blockers early and we should only add issues to the blocker
> list late if they're _really bad_, because otherwise you just keep
> fixing blockers. On that basis...

Early is better, but as you stated above it's just the nature of the
installer software.  I get that and have no problem with it.  What I want
evaluated for change is the process by which we agree something is a
blocker.  Right now we run full out fixing pretty much everything we find
headed up to RC.  I would like to see something similar to what we do for
RHEL where we start to tighten blocker acceptance criteria the closer we get
to a release.

> I appreciate that the 'blocker treadmill', as you describe it, can be
> frustrating. But I don't think 'let's just not count bad issues as
> blockers late in the process to give the developers a break' is the
> answer to anything (except possibly 'how can we stop Will depleting the
> U.S. strategic gin reserve?', but that's not the question this post was
> trying to answer :>). What we're trying to do with the release
> validation process as a whole is provide a clear framework for defining
> the standards our releases should meet and a clear process for building
> releases that meet those standards and verifying that they meet those
> standards. I don't see that adding an element of time sensitivity to the
> blocker evaluation process - 'issues of the type X are blockers if we
> find them four weeks before release, but not if we find them one week
> before release' - is a good way to achieve this. 

You're not understanding what I was pointing out.  The blocker criteria
between alpha and beta should be more open than the blocker criteria between
beta and rc.  The idea is that we start accepting fewer bugs as blockers as
we get closer to RC.  Every problem encountered can be evaluated along the
lines of:

1) Who is impacted?
2) Is there a workaround?
3) Is the workaround documented?
4) Is the problem in the standard install path?

And so on.  I'm not saying we should compromise on release quality or
anything like that, but just start to ask more and more questions when
proposed blockers show up late.  Is it really a blocker or not?
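
As a rough illustration only (nothing here is an agreed policy; the class
names, the two phases, and the thresholds are all made up for the example),
the questions above could be applied with a phase-dependent threshold along
these lines:

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Hypothetical review windows; not taken from the release criteria."""
    ALPHA_TO_BETA = "alpha-to-beta"
    BETA_TO_RC = "beta-to-rc"


@dataclass(frozen=True)
class Proposal:
    """Answers to the four triage questions for one proposed blocker."""
    widely_hit: bool             # 1) are many users impacted?
    has_workaround: bool         # 2) is there a workaround?
    workaround_documented: bool  # 3) is the workaround documented?
    standard_install_path: bool  # 4) is it in the standard install path?


def is_blocker(p: Proposal, phase: Phase) -> bool:
    # Assumption: problems in the standard install path always block.
    if p.standard_install_path:
        return True
    if phase is Phase.ALPHA_TO_BETA:
        # Early on, stay open: accept anything widely hit or unavoidable.
        return p.widely_hit or not p.has_workaround
    # Later, tighten: only widely hit bugs without a documented workaround.
    return p.widely_hit and not (p.has_workaround and p.workaround_documented)


# A narrow bug off the standard path, with no workaround yet.
corner_case = Proposal(False, False, False, False)
print(is_blocker(corner_case, Phase.ALPHA_TO_BETA))  # True
print(is_blocker(corner_case, Phase.BETA_TO_RC))     # False
```

The same report is accepted early and declined late, while anything in the
standard install path blocks regardless of timing.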

> 'Blocker bugs' are just the 'release quality' question inverted: they
> are the ways in which our releases must not be broken in order to meet
> the minimum quality standards we've decided on. An issue which causes us
> to fall below our minimum quality standards is a problem no matter when
> it's discovered. I absolutely understand that it makes things easier for
> the developers if we catch blocker bugs early, and we agree this is an
> important goal and we have made and will continue to make efforts to
> improve our ability to catch blockers as early as possible. I know it
> sucks when we're on RC3 and we suddenly discover a major bug. But it's
> still a major bug, and 'say it's not a blocker because we're late in the
> process' doesn't sound like a good response to that suckage, to me. I
> don't want to do that.

Again, that's not what I'm saying we should do.  But if there's a traceback
that appears when someone is doing a LUKS installation to USB attached
storage that also has a pre-existing NTFS volume for "music"... that doesn't
sound like a really super important use case late in the cycle.  Maybe we
should consider documenting that as a limitation with or without a
workaround for that release and slating it for the next release.

We do want to make a good release and keep the users happy, but we need to
also remember that developers are people too and not just meat grinders for
bug reports.

> I believe we should set realistic minimum standards - those that are
> achievable with the level of development resources we have in place, on
> the release schedules Fedora is committed to. What this thread is about,
> essentially, is checking that we are not currently setting that bar too
> high, and demanding from you more than you have the resources to
> possibly provide in the time available. We certainly believe that we
> need input from the development teams to know where the bar should be
> set. But I do believe the bar should be a bar, not a fuzzy field that
> can be adjusted with excessive pragmatism. We should set realistic
> standards, but they have to be solid ones that we don't compromise just
> because time is short or the developers are getting tired of fixing
> bugs.

From the installer team's perspective, the useful things for release
criteria to consider would be:

1) Adjustments to the blocker acceptance criteria that I explained.
2) Acknowledgement that concurrent releases are always in play for the
   installer team.  Generally two RHEL releases and one Fedora release.
   RHEL gets priority.

> What we (QA) as a team do try and do in those cases is look at the
> situation and think what we could do in future to ensure the blocker
> would get caught earlier. For instance, in the last few releases we've
> been making a more concerted effort to complete testing even on TC/RC
> builds that have obvious showstoppers - to catch the other bugs 'behind'
> the showstoppers, rather than just catching the showstoppers and then
> focusing work on getting them fixed, then continuing on with testing of
> other functionality.

You have improved bug reporting earlier over the past few Fedora releases.

> I don't mean to start a finger-pointing match, but I do think it's worth
> bearing in mind that the 'blocker treadmill' is much more likely to
> happen when there are major changes to anaconda, because these massively
> increase the surface area of code that's prone to causing blocker bugs.
> When we do a release, we can say with a reasonable degree of certainty
> that the code in that release probably contains very few blocker bugs -
> only ones we didn't catch in the validation process it just went
> through. 

This is nothing new to us; we've been through it many times before.  But it
doesn't mean we enjoy it.  Unless we want to put anaconda in a permanent
maintenance mode, it means we will always be playing this game.

> If we then do another release in which that code isn't changed very
> much, well, we aren't likely to have two hundred new blocker bugs. 
> 
> But if we (Fedora) do, oh, let's just say as _entirely theoretical
> examples_, rewrite the entire storage backend, or replace the entire
> first stage of the whole installer, or rewrite the entire user
> interface...we've just thrown out all the code that's relatively well
> known to be 'blocker free', and replaced it with an entirely new chunk
> of code about which we know just about nothing from a quality
> perspective. Statistically speaking, no matter how awesome the person or
> people writing it, that new chunk of code is very likely to contain more
> blockers than the code it replaced. Major changes to the code inevitably
> result in more blockers being present, and thus more blocker treadmill,
> than light-touch maintenance of a mature codebase does. We (QA) are
> always going to be able to find ten blockers in a well-known codebase
> much faster than we can find two hundred blockers in a heavily revised
> codebase.
> 
> Certainly QA has some responsibility for the 'blocker treadmill', as I
> noted above, it's our responsibility to try and identify blocker bugs as
> early and as quickly as possible, and this is something we can and
> should always look to improve. But developers also have responsibility
> for it. If you're stuck on a 'blocker treadmill' it could be an
> indication that QA could and should have discovered the blocker bugs
> faster, but it could also be an indication that you have been too
> ambitious in your planning in terms of what amount of new or revised
> code of acceptable quality you expected to be able to implement in what
> time frame, and consequently you have delivered code that is heavily
> bugged, at a late enough point in the development cycle that you
> immediately wind up on a 'blocker treadmill' just fixing all the bugs in
> the code you just delivered. I don't think it's controversial to say
> this has been known to be a problem in the world of software development
> before :)

Let me be clear here:  I am not complaining about the QA team performing
tests and reporting bugs.  That's what you're supposed to do.  The blocker
treadmill has been in place for many, many, many releases and we have always
been involved with it (but we know this, you don't have to explain that
it's because of the nature of our software... in fact, that's been our
explanation for years now, so the fact that you're telling us that seems to
indicate that people are finally starting to "get it").  Anaconda depends
on many components, and many times last-minute bugs are due to changes in
dependent components that caught us by surprise.

What I'm asking for is blocker acceptance criteria changes that are more
realistic given remaining time and other commitments.  Or dump the
time/schedule requirement and just say we'll call it done when it's done.  I
really don't care here, but we need to find a way to start denying blockers
as we get closer to a release.

> > 2) Installer blockers should only be granted when there is no other way to
> > accomplish the same task during installation.  For example, if FCoE
> > configuration does not always work in the UI but does work when passed boot
> > parameters or via kickstart, we shouldn't consider it a blocker.  It's an
> > unfortunate bug, but as described there is an install path for those users.
> 
> In practice we do and always have considered workarounds in evaluating
> blocker status for bugs. This isn't brilliantly called out in the
> criteria pages, I admit, and we should improve that. The section
> '(release) Blocker Bugs', right below the criteria, could really do with
> some adjustment.
> 
> It's hard to be more precise than this because workaround evaluation is
> one part of the blocker review process that more or less inevitably
> continues to involve subjectivity, and it's very much a bug-by-bug
> thing. But obviously, the more severe and more commonly-encountered the
> issue, the less likely we usually are to accept 'there's a workaround'
> as a reason not to take it as a blocker. The ease of the workaround and
> the likelihood of a user thinking of it themselves - or at least
> figuring that there _might_ be a workaround, and they should go and look
> for one - are also taken into consideration.
> 
> So...we do consider workarounds. And yes, this should be explained more
> clearly in the process documentation, we'll address that. I don't think
> we should accept your principle - "Installer blockers should only be
> granted when there is no other way to accomplish the same task during
> installation." - as solidly as it's stated, though, as it removes too
> much flexibility in the evaluation process. 
> 
> To give a competing example, in anaconda 18.13 there is a bug in the new
> partitioning process - I call it 'guided partitioning', the dialog which
> attempts to help you free up space on a full disk, by deleting or
> shrinking partitions - which causes it to crash when trying to delete
> partitions. But if you go into the 'custom partitioning' interface you
> can successfully delete partitions. So by your principle, we would not
> take that bug as a release blocker. I don't think that would be a good
> decision: we should not release an installer which crashes when you try
> to follow the path you're guided to, for freeing up space to install the
> operating system. 'Don't do what the installer recommends you to do,
> instead go into this advanced process that's supposed to be for experts'
> is a workaround, and hence satisfies your requirement for not-a-blocker,
> but I really don't think it's a good story to tell people in the case of
> such a critical bit of functionality.

Crashes through the standard install path would of course be a blocker.  Why
would you think I would want otherwise?  You're reading my proposal way too
literally.

> It also has a clear negative effect on the very problem we're discussing
> here: the broken code cannot get any testing. All we can know about a
> codepath that's broken, but for which we accepted a workaround that
> dodges the broken codepath, is 'it's completely broken'. If the broken
> codepath is not treated as a blocker and fixed rapidly, we cannot test
> it 'beyond' the blocker bug. There might be five further blockers behind
> that bug, or just _regular_ bugs ('the UI sucks', 'it doesn't offer to
> let me resize a partition it should have done', 'it prints a bogus error
> message when I delete a partition'...all those kinds of perfectly normal
> bugs), but if we take a workaround and called it 'not a blocker' it gets
> dropped in priority, likely doesn't get fixed for weeks, and when it
> turns out there's five other bugs 'behind' the showstopper...well, they
> go on your treadmill. =) Accepting workarounds too readily actually
> _impedes_ our ability to find other bugs swiftly.

This is a lot of text, and I have only skimmed it.  What I think we should
consider is whether Fedora should have a separate entity or group that
accepts blockers.  Right now, that responsibility is largely falling on
your team.

> > 3) Ultimately we want the number of granted blockers to be lower and lower
> > from alpha to beta to rc.
> 
> I understand the motivation behind this, and I think it's a goal we can
> attempt to address by ensuring comprehensive testing is done early (and
> a goal you can help to address by ensuring major code changes land early
> enough to be tested, and budgeting time and resources for fixing the
> bugs that will _inevitably be present_ in any large chunk of new code).
> But I don't think 'make it harder for a bug to qualify as a blocker the
> later we get in the release process' is a good thing to do, even though
> it would help to achieve this goal. To me it looks like a process hack
> which would ultimately damage the quality of our releases. It's actually
> something that we've specifically tried *not* to do in the blocker
> review process, since it was implemented. We have very intentionally
> attempted to review bugs 'impartially', treating blocker status as
> something a bug either should have or should not have on its own merits,
> and attempting not to take into account things that strictly should not
> be taken into account, like 'is there a fix already?' or 'how close are
> we to release?'

I'd like to apply my previous point here too.

-- 
David Cantrell <dcantrell@xxxxxxxxxx>
Manager, Installer Engineering Team
Red Hat, Inc. | Westford, MA | EST5EDT

_______________________________________________
Anaconda-devel-list mailing list
Anaconda-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/anaconda-devel-list

