Re: Theoretical Question about commit=n

On 11/13/2017 01:41 AM, Qu Wenruo wrote:
> 
> On 2017-11-13 06:01, Hans van Kranenburg wrote:
>> On 11/12/2017 09:58 PM, Robert White wrote:
>>> Is the commit interval monotonic, or is it seconds after sync?
>>>
>>> What I mean is that if I manually call sync(2) does the commit timer
>>> reset? I'm thinking it does not, but I can imagine a workload where it
>>> ideally would.
>>
>> The magic happens inside the transaction kernel thread:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/disk-io.c?h=v4.14#n1925
>>
>> You can see the delay being computed:
>>     delay = HZ * fs_info->commit_interval;
>>
>> Almost at the end of the function, you see:
>>     schedule_timeout(delay)
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/time/timer.c?h=v4.14#n1676
>>
>> This schedule_timeout function sets a timer and then the thread goes to
>> sleep. If nothing happens, the kernel will wake up the thread after the
>> timer expires (can be later, but not earlier) and then it will redo the
>> loop.
>>
>> If something else wakes up the transaction thread, the timer is
>> discarded if it's not expired yet.
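To make the sleep-with-timeout behaviour quoted above a bit more concrete,
here is a rough user-space model in Python. It is only a sketch and not the
kernel code: the class and method names are made up for illustration,
threading.Event.wait(timeout) stands in for schedule_timeout(delay), and any
other checks the real thread does before it actually commits are left out.

```python
import threading
import time


class PeriodicCommitter:
    """Hypothetical user-space model of a thread that sleeps with a timeout."""

    def __init__(self, interval):
        self.interval = interval          # stands in for commit_interval (seconds)
        self.wakeup = threading.Event()   # set by whoever wants to wake the thread
        self.stop = threading.Event()
        self.passes = []                  # (monotonic timestamp, reason) per loop pass

    def run(self):
        while not self.stop.is_set():
            # Returns True as soon as the event is set, or False once the
            # full interval has elapsed. Either way the next wait starts
            # from scratch, so an early wake-up discards the old timer.
            woken = self.wakeup.wait(timeout=self.interval)
            self.wakeup.clear()
            if self.stop.is_set():
                break
            self.passes.append((time.monotonic(), "woken" if woken else "timeout"))

    def wake(self):
        """Wake the thread before its timer expires."""
        self.wakeup.set()

    def shutdown(self):
        self.stop.set()
        self.wakeup.set()
```

In this model, calling wake() 10 seconds into a 30 second interval causes a
loop pass right away, and the next timeout pass follows 30 seconds after
that, not 20.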
> 
> So far so good.
> 
>>
>> So it works like you would want.
> 
> Not exactly.

Ah, interesting.

> Sync or commit_transaction won't wake up transaction_kthread.
> 
> transaction_kthread will mostly be woken by a transaction error, a
> remount, or in certain cases by btrfs_end_transaction.
> 
> So a manual sync will not (at least not always) interrupt the commit interval.

The fun thing is, when I just do sync, I see that the time it takes for
a next generation bump to happen is reset (while doing something simple
like touch x in a loop in another terminal).
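For anyone who wants to repeat that observation, a small watcher along these
lines might help. It is a sketch under assumptions: it polls the same
dump-super command as in the test quoted further down, it needs root, and
the function names are hypothetical. It prints a line whenever the
generation changes, together with the time since the previous change, so you
can run sync in another terminal and see whether the next bump moves.

```python
import re
import subprocess
import time


def parse_generation(dump_super_output):
    """Return the superblock generation from dump-super text, or None."""
    # Anchored at line start so that lines such as chunk_root_generation
    # are not picked up by accident.
    match = re.search(r"^generation\s+(\d+)", dump_super_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def watch(device, period=1.0):
    """Poll the device and report every generation change (runs until killed)."""
    last = None
    last_change = time.monotonic()
    while True:
        out = subprocess.run(
            ["btrfs", "inspect", "dump-super", device],
            check=True, capture_output=True, text=True,
        ).stdout
        gen = parse_generation(out)
        now = time.monotonic()
        if gen != last:
            print(f"generation {gen} ({now - last_change:.0f}s since last change)")
            last, last_change = gen, now
        time.sleep(period)
```

Call it as watch("/dev/dorothy/mekker"), replacing the path with your own
test device, while the touch loop runs in another terminal.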

> And even more, transaction_kthread will only commit the transaction,
> which means it will only ensure that metadata is consistent.
> 
> It won't ensure that a buffered write reaches disk if its extent is not
> allocated yet (delalloc).

Hm, I have seen things like that in BTRFS_IOC_SYNC...

Actually, I first responded on the timer reset question, because that
one was easy to answer. I don't know if I want to descend the path
further into (f)sync. I heard it can get really messy down there. :]

> 
> Thanks,
> Qu
>>
>> You can test this yourself by looking at the "generation" number of your
>> filesystem. It's in the output of btrfs inspect dump-super:
>>
>> This is the little test filesystem I just used:
>>
>> -# btrfs inspect dump-super /dev/dorothy/mekker | grep ^generation
>> generation		35
>>
>> If you print the number in a loop, like every second, you can see it
>> going up after a transaction happened. Now play around with other things
>> and see when it changes.
>>
>>> (Again, this is purely theoretical, I have no such workload as I am
>>> about to describe.)
>>>
>>> So suppose I have some sort of system, like a database, that I know will
>>> do scattered writes and extends through some files and then call some
>>> variant of sync(2). And I know that those sync() calls will be every
>>> forty-to-sixty seconds because of reasons. It would be "neat" to be able
>>> to set the commit=n to some high value, like 90, and then "normally" the
>>> sync() behaviours would follow the application instead of the larger
>>> commit interval.
>>>
>>> The value would be that the file system would tend _not_ to go into sync
>>> while the application was still skittering about in the various files.
>>>
>>> Of course any other applications could call sync from their own contexts
>>> for their own reasons. And there's an implicit fsync() on just about any
>>> close() (at least if everything is doing its business "correctly")
>>>
>>> It may be a strange idea but I can think of some near realtime
>>> applications that might be able to leverage a modicum of control over
>>> the sync event. There is no API, and no strong reason to desire one,
>>> for controlling the commit via (low privilege) applications.
>>>
>>> But if the plumbing exists, then having a mode where sync() or fsync()
>>> (which I think causes a general sync because of the journal) resets the
>>> commit timer could be really interesting.
>>>
>>> With any kind of delayed block choice/mapping it could actually reduce
>>> the entropy of the individual files for repeated random small writes.
>>> The application would have to be reasonably aware, of course.
>>>
>>> Since something is causing a sync() the commit=N guarantee is still
>>> being met for the whole system for any N, but applications could tend to
>>> avoid mid-write commits by planning their sync()s.
>>>
>>> Just a thought.
>>
>>


-- 
Hans van Kranenburg