Re: [PATCH v2] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/1/10 上午8:58, Qu Wenruo wrote:
>
>
> On 2020/1/10 上午8:21, Qu Wenruo wrote:
>>
>>
>> On 2020/1/9 下午10:37, David Sterba wrote:
>>> On Thu, Jan 09, 2020 at 01:54:34PM +0800, Qu Wenruo wrote:
>>>>>> The way I read it is more like smp_rmb/smp_wmb, but for bits in this
>>>>>> case, so the smp_mb__before/after_atomic was only a syntactic sugar to
>>>>>> match that it's atomic bitops. I realize this could have caused some
>>>>>> confusion, however I still think that some sort of barrier is needed.
>>>>>
>>>>> There's an existing pattern used for serializing set/clear of
>>>>> BTRFS_ROOT_IN_TRANS_SETUP (record_root_in_trans,
>>>>> btrfs_record_root_in_trans).
>>>>>
>>>>> Once upon a time there were barriers like smp_mb__before_clear_bit but
>>>>> they got unified to just smp_mb__before_atomic for all set/clear
>>>>> operations, so I believe I was not all wrong with using them.
>>>>>
>>>> I used to believe the same fairy tail, that mb() works like a flush(),
>>>> but we're not living in a fairy tail.
>>>
>>> This sounds strange, by flush I was refering to internal CPU mechanism
>>> that makes all temporary changes visible to other cpus, so you're saying
>>> that this does not work as everybody expects?
>>
>> Because no matter whether mb ensures that or not, the result is the same.
>>
>> What would happen if the reader get schedule just before the mb()
>> command? Reader can still get older value.
>> That makes no difference if we had mb() or not.
>>
>>>
>>>> What mb really does is keep certain ordering from happening.
>>>> And ordering means, we have at least *2* different memory accesses.
>>>
>>> Sorry but I think you're missing some pieces here. There are 2 memory
>>> accesses: set_bit/clear_bit (writer) and test_bit (reader).
>>>
>>> The same could be achieved by a plain variable, that it's a bit brings
>>> more confusion about what barrier should be really used.
>>>
>>> The pattern we want to use here is pretty standard. Read barrier before
>>> read and write barrier that separates the data change from the indicator
>>> of the change. If you line up the barriers, there's a clear line between
>>> the data changes and the indicator value.
>>
>> Again, test_bit() can happen whenever they like, and smp_rmb() before
>> test_bit() makes no sense.
>>
>> test_bit() can happen before reloc_root = NULL assign. after reloc_root
>> = NULL assign but before set_bit(), or after set_bit().
>>
>> That smp_wmb() makes sure set_bit() won't happen before reloc_root, than
>> that's enough.
>> BTW, smp_mb__before_atomic() should be full mb, not just wmb or rmb.
>>
>>>
>>> reloc_root = NULL
>>> smp_wmb                           smp_rmb()
>>>                                   test_bit()
>>>                                   ... here
>>> set_bit(DEAD)
>>>                                   ... or here, it'll be always
>>> 				  reloc_root == NULL, but it still could
>>> 				  be before or after set_bit
>>>
>>> You should also distinguish between mb() and smp_mb() (and the rmb/wmb).
>>> mb is a unconditional barrier, used to synchronize access to hardware io
>>> ports, suitable in drivers.
>>>
>>> We use smp_mb() because this serializes memory among multipe CPUs, when
>>> one changes memory but stores it to some temporary structures, while
>>> other CPUs don't see the effects. I'm sure you've read about that in the
>>> memory barrier docs.
>
> I guess the main difference between us is the effect of "per-cpu
> viewable temporary value".
>
> It looks like your point is, without rmb() we can't see consistent
> values the writer sees.
>
> But my point is, even we can only see a temporary value, the
> __before_atomic() mb at the writer side, ensures only 3 possible
> temporary values combination can be seen.
> (PTR, DEAD), (NULL, DEAD), (NULL, 0).
>
> The killed (PTR, 0) combination is killed by that writer side mb.
> Thus no need for the reader side mb before test_bit().
>
> That's why I insist on the "test_bit() can happen whenever they like"
> point, as that has the same effect as schedule.
>
> Thanks,
> Qu

Can we push the fix to upstream? I hope it to be fixed in late rc of v5.5.

Thanks,
Qu

>
>>
>> Yes, but schedule can put that smp_rmb(); test_bit() line where ever
>> they want. So smp_rmb(); test_bit() can still get all the 3 difference
>> timing. It's that smp_mb__before_atomic() killing the 4th case, not the
>> smp_rmb().
>>
>>>
>>>> It's not to ensure every reader get the same result, as there is no way
>>>> to do that. Read can happen whenever they want.
>>>
>>> That is true about the 'whenever', but what we need to guarantee here is
>>> that when the read happens the following condition will have view of the
>>> changes implied by the pairing barrier.
>>
>> Still, if reader still gets the temporary value, it's the same as random
>> schedule timing.
>> What we need to ensure is the order, which is solely ensured by that
>> smb_mb__before_atomic().
>>
>> Thanks,
>> Qu
>>
>>>
>>>> So before we talk about mb, we first need to know which 2 memory
>>>> accesses we're talking about.
>>>> If there is even no 2 memory access, then there is no need for mb().
>>>>
>>>> E.g. for the mb implied by spinlock(), it's not to ensure the spinlock
>>>> counter reader, but to ensure the memory ordering between the spinlock
>>>> counter itself and the memory it's protecting.
>>>>
>>>> So for memory barrier around test_bit(), as long as the compiler is not
>>>> doing reordering, we don't need extra mb.
>>>
>>> This is not about compiler.
>>>
>>>> And if the compiler is really doing re-ordering for the
>>>> have_reloc_root(), then the compiler is doing something wrong, as that
>>>> would makes the test_bit() meaningless.




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux