Re: Questions regarding logging upon fsync in btrfs

On 1 October 2013 21:40, Aastha Mehta <aasthakm@xxxxxxxxx> wrote:
> On 1 October 2013 19:34, Josef Bacik <jbacik@xxxxxxxxxxxx> wrote:
>> On Mon, Sep 30, 2013 at 11:07:20PM +0200, Aastha Mehta wrote:
>>> On 30 September 2013 22:47, Josef Bacik <jbacik@xxxxxxxxxxxx> wrote:
>>> > On Mon, Sep 30, 2013 at 10:30:59PM +0200, Aastha Mehta wrote:
>>> >> On 30 September 2013 22:11, Josef Bacik <jbacik@xxxxxxxxxxxx> wrote:
>>> >> > On Mon, Sep 30, 2013 at 09:32:54PM +0200, Aastha Mehta wrote:
>>> >> >> On 29 September 2013 15:12, Josef Bacik <jbacik@xxxxxxxxxxxx> wrote:
>>> >> >> > On Sun, Sep 29, 2013 at 11:22:36AM +0200, Aastha Mehta wrote:
>>> >> >> >> Thank you very much for the reply. That clarifies a lot of things.
>>> >> >> >>
>>> >> >> >> I was trying a small test case that opens a file, writes a block of
>>> >> >> >> data, calls fsync and then closes the file. If I understand correctly,
>>> >> >> >> fsync would return only after all in-memory buffers have been
>>> >> >> >> committed to disk. I have added a few print statements in the
>>> >> >> >> __extent_writepage function, and I notice that the function gets
>>> >> >> >> called a bit later after fsync returns. It seems that I am not
>>> >> >> >> guaranteed to see the data going to disk by the time fsync returns.
>>> >> >> >>
>>> >> >> >> Am I doing something wrong, or am I looking at the wrong place for
>>> >> >> >> disk write? This happens both with tree logging enabled as well as
>>> >> >> >> with notreelog.
>>> >> >> >>
>>> >> >> >
>>> >> >> > So 3.1 was a long time ago, and while it certainly had issues, I don't think it
>>> >> >> > was _that_ broken.  You are probably better off instrumenting a recent kernel,
>>> >> >> > 3.11, or just building btrfs-next from git.  But if I were to make a guess I'd say
>>> >> >> > that __extent_writepage was how both data and metadata were written out at the time (I
>>> >> >> > don't think I changed it until 3.2 or something later) so what you are likely
>>> >> >> > seeing is the normal transaction commit after the fsync.  In the case of
>>> >> >> > notreelog we are likely starting another transaction and you are seeing that
>>> >> >> > commit (at the time the transaction kthread would start a transaction even if
>>> >> >> > none had been started yet.)  Thanks,
>>> >> >> >
>>> >> >> > Josef
>>> >> >>
>>> >> >> Is there any special handling for very small file writes, less than 4K? As
>>> >> >> I understand there is an optimization to inline the first extent in a file if
>>> >> >> it is smaller than 4K, does it affect the writeback on fsync as well? I did
>>> >> >> set the max_inline mount option to 0, but even then it seems there is
>>> >> >> some difference in fsync behaviour between writing a first extent smaller
>>> >> >> than 4K and writing 4K or more.
>>> >> >>
>>> >> >
>>> >> > Yeah if the file is an inline extent then it will be copied into the log
>>> >> > directly and the log will be written out, no going through the data write path
>>> >> > at all.  Max inline == 0 should make it so we don't inline, so if it isn't
>>> >> > honoring that then that may be a bug.  Thanks,
>>> >> >
>>> >> > Josef
>>> >>
>>> >> I tried it on the 3.12-rc2 release, and it seems there is a bug then.
>>> >> Please find attached logs to confirm.
>>> >> The older release probably has the same bug.
>>> >>
>>> >
>>> > Oooh ok I understand, you have your printk's in the wrong place ;).
>>> > do_writepages doesn't necessarily mean you are writing something.  If you want
>>> > to see if stuff got written to the disk I'd put a printk at run_delalloc_range
>>> > and have it spit out the range it is writing out since that's what we think is
>>> > actually dirty.  Thanks,
>>> >
>>> > Josef
>>>
>>> No, but I also placed dump_stack() at the beginning of
>>> __extent_writepage. run_delalloc_range is called only from
>>> __extent_writepage, so if it were called, the dump_stack() at the
>>> top of __extent_writepage would have printed as well, no?
>>>
>>
>> Ok I've done the same thing and I'm not seeing what you are seeing.  Are you
>> using any mount options other than notreelog and max_inline=0?  Could you adjust
>> your printk to print out the root objectid for the inode as well?  It could be
>> possible that this is the writeout for the space cache or inode cache.  Thanks,
>>
>> Josef
>
> I actually printed the stack only when the root objectid is 5. I have
> attached another log for writing the first 500 bytes in a file. I also
> print the root objectid for the inode in run_delalloc and
> __extent_writepage.
>
> Thanks
>

Just to clarify, in the latest logs, I enabled the debug printk's
and the stack dump for all root objectids.
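
For reference, the test case discussed in this thread (open a file, write
a block of data, call fsync, close the file) could be sketched roughly as
below. This is only an illustration, not the program that produced the
attached logs: the helper name write_then_fsync, the filler byte, and the
file mode are assumptions. The kernel-side printk/dump_stack
instrumentation is not shown.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and behaviour are assumptions, not from the
 * thread): create/truncate `path`, write `len` bytes of filler, fsync,
 * then close. `len` is expected to be greater than zero.
 * Returns 0 on success and -1 on any failure.
 */
static int write_then_fsync(const char *path, size_t len)
{
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'a', len);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    int rc = 0;
    size_t done = 0;

    /* write() may write fewer bytes than requested, so loop until done. */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            rc = -1;
            break;
        }
        done += (size_t)n;
    }

    /* fsync() should not return before the file's data is on stable storage. */
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;

    if (close(fd) != 0)
        rc = -1;

    free(buf);
    return rc;
}
```

Calling it once with len = 500 and once with len = 4096 on a filesystem
mounted with notreelog,max_inline=0 would cover the two cases compared
above (a first extent smaller than 4K versus 4K or more).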