On 2020/1/10 上午8:58, Qu Wenruo wrote: > > > On 2020/1/10 上午8:21, Qu Wenruo wrote: >> >> >> On 2020/1/9 下午10:37, David Sterba wrote: >>> On Thu, Jan 09, 2020 at 01:54:34PM +0800, Qu Wenruo wrote: >>>>>> The way I read it is more like smp_rmb/smp_wmb, but for bits in this >>>>>> case, so the smp_mb__before/after_atomic was only a syntactic sugar to >>>>>> match that it's atomic bitops. I realize this could have caused some >>>>>> confusion, however I still think that some sort of barrier is needed. >>>>> >>>>> There's an existing pattern used for serializing set/clear of >>>>> BTRFS_ROOT_IN_TRANS_SETUP (record_root_in_trans, >>>>> btrfs_record_root_in_trans). >>>>> >>>>> Once upon a time there were barriers like smp_mb__before_clear_bit but >>>>> they got unified to just smp_mb__before_atomic for all set/clear >>>>> operations, so I believe I was not all wrong with using them. >>>>> >>>> I used to believe the same fairy tail, that mb() works like a flush(), >>>> but we're not living in a fairy tail. >>> >>> This sounds strange, by flush I was refering to internal CPU mechanism >>> that makes all temporary changes visible to other cpus, so you're saying >>> that this does not work as everybody expects? >> >> Because no matter whether mb ensures that or not, the result is the same. >> >> What would happen if the reader get schedule just before the mb() >> command? Reader can still get older value. >> That makes no difference if we had mb() or not. >> >>> >>>> What mb really does is keep certain ordering from happening. >>>> And ordering means, we have at least *2* different memory accesses. >>> >>> Sorry but I think you're missing some pieces here. There are 2 memory >>> accesses: set_bit/clear_bit (writer) and test_bit (reader). >>> >>> The same could be achieved by a plain variable, that it's a bit brings >>> more confusion about what barrier should be really used. >>> >>> The pattern we want to use here is pretty standard. Read barrier before >>> read and write barrier that separates the data change from the indicator >>> of the change. If you line up the barriers, there's a clear line between >>> the data changes and the indicator value. >> >> Again, test_bit() can happen whenever they like, and smp_rmb() before >> test_bit() makes no sense. >> >> test_bit() can happen before reloc_root = NULL assign. after reloc_root >> = NULL assign but before set_bit(), or after set_bit(). >> >> That smp_wmb() makes sure set_bit() won't happen before reloc_root, than >> that's enough. >> BTW, smp_mb__before_atomic() should be full mb, not just wmb or rmb. >> >>> >>> reloc_root = NULL >>> smp_wmb smp_rmb() >>> test_bit() >>> ... here >>> set_bit(DEAD) >>> ... or here, it'll be always >>> reloc_root == NULL, but it still could >>> be before or after set_bit >>> >>> You should also distinguish between mb() and smp_mb() (and the rmb/wmb). >>> mb is a unconditional barrier, used to synchronize access to hardware io >>> ports, suitable in drivers. >>> >>> We use smp_mb() because this serializes memory among multipe CPUs, when >>> one changes memory but stores it to some temporary structures, while >>> other CPUs don't see the effects. I'm sure you've read about that in the >>> memory barrier docs. > > I guess the main difference between us is the effect of "per-cpu > viewable temporary value". > > It looks like your point is, without rmb() we can't see consistent > values the writer sees. > > But my point is, even we can only see a temporary value, the > __before_atomic() mb at the writer side, ensures only 3 possible > temporary values combination can be seen. > (PTR, DEAD), (NULL, DEAD), (NULL, 0). > > The killed (PTR, 0) combination is killed by that writer side mb. > Thus no need for the reader side mb before test_bit(). > > That's why I insist on the "test_bit() can happen whenever they like" > point, as that has the same effect as schedule. > > Thanks, > Qu Can we push the fix to upstream? I hope it to be fixed in late rc of v5.5. Thanks, Qu > >> >> Yes, but schedule can put that smp_rmb(); test_bit() line where ever >> they want. So smp_rmb(); test_bit() can still get all the 3 difference >> timing. It's that smp_mb__before_atomic() killing the 4th case, not the >> smp_rmb(). >> >>> >>>> It's not to ensure every reader get the same result, as there is no way >>>> to do that. Read can happen whenever they want. >>> >>> That is true about the 'whenever', but what we need to guarantee here is >>> that when the read happens the following condition will have view of the >>> changes implied by the pairing barrier. >> >> Still, if reader still gets the temporary value, it's the same as random >> schedule timing. >> What we need to ensure is the order, which is solely ensured by that >> smb_mb__before_atomic(). >> >> Thanks, >> Qu >> >>> >>>> So before we talk about mb, we first need to know which 2 memory >>>> accesses we're talking about. >>>> If there is even no 2 memory access, then there is no need for mb(). >>>> >>>> E.g. for the mb implied by spinlock(), it's not to ensure the spinlock >>>> counter reader, but to ensure the memory ordering between the spinlock >>>> counter itself and the memory it's protecting. >>>> >>>> So for memory barrier around test_bit(), as long as the compiler is not >>>> doing reordering, we don't need extra mb. >>> >>> This is not about compiler. >>> >>>> And if the compiler is really doing re-ordering for the >>>> have_reloc_root(), then the compiler is doing something wrong, as that >>>> would makes the test_bit() meaningless.
