Re: [PATCH] btrfs file write debugging patch

2011/2/28 Maria Wikström <maria@xxxxxxxxxxxxx>:
> Mon 2011-02-28 at 11:10 -0500, Josef Bacik wrote:
>> On Mon, Feb 28, 2011 at 11:13:59AM +0100, Johannes Hirte wrote:
>> > On Monday 28 February 2011 02:46:05 Chris Mason wrote:
>> > > Excerpts from Mitch Harder's message of 2011-02-25 13:43:37 -0500:
>> > > > Some clarification on my previous message...
>> > > >
>> > > > After looking at my ftrace log more closely, I can see where Btrfs is
>> > > > trying to release the allocated pages.  However, the calculation for
>> > > > the number of dirty_pages is equal to 1 when "copied == 0".
>> > > >
>> > > > So I'm seeing at least two problems:
>> > > > (1)  It keeps looping when "copied == 0".
>> > > > (2)  One dirty page is not being released on every loop even though
>> > > > "copied == 0" (at least this problem keeps it from being an infinite
>> > > > loop by eventually exhausting reserveable space on the disk).
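
To spell out the arithmetic behind point (2): the loop rounds the copied byte
count up to whole pages when it charges dirty_pages, so a pass that copies
nothing still gets charged one page.  Below is a small userspace model of that
round-up (my own reconstruction of the calculation, not a quote of the exact
tree; the correct charge for a copied == 0 pass would be 0, which is what the
copied == 0 handling in the patch below is about):

/*
 * Userspace model of the dirty_pages round-up in the write loop.
 * Assumes 4K pages; the names mirror the kernel variables, but this is
 * only an illustration of the rounding, not the kernel code itself.
 */
#include <stdio.h>

#define PAGE_CACHE_SIZE         4096UL
#define PAGE_CACHE_SHIFT        12

int main(void)
{
        unsigned long copied = 0;       /* btrfs_copy_from_user() copied nothing */
        unsigned long offset = 100;     /* the write starts 100 bytes into the page */
        unsigned long dirty_pages;

        /* Rounding up to whole pages still yields 1 even though nothing
         * was copied, so one page of delalloc accounting leaks per pass. */
        dirty_pages = (copied + offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;

        printf("copied=%lu offset=%lu dirty_pages=%lu\n",
               copied, offset, dirty_pages);    /* prints dirty_pages=1 */
        return 0;
}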
>> > >
>> > > Hi everyone,
>> > >
>> > > There are actually two bugs here.  First the one that Mitch hit, and a
>> > > second one that still results in bad file_write results with my
>> > > debugging hunks (the first two hunks below) in place.
>> > >
>> > > My patch fixes Mitch's bug by checking for copied == 0 after
>> > > btrfs_copy_from_user and doing the correct delalloc accounting.  This
>> > > one looks solved, but you'll notice the patch is bigger.
>> > >
>> > > First, I add some random failures to btrfs_copy_from_user() by failing
>> > > every once in a while.  This was much more reliable than trying to use
>> > > memory pressure to make copy_from_user fail.
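
For anyone reading along without the patch in front of them, the
failure-injection idea boils down to something like this small userspace
model (mine, not Chris's hunk — the wrapper name and the 1-in-8 rate are
made up): force the copy step to return a zero count every Nth call so the
caller's short-copy handling gets exercised deterministically instead of
waiting for memory pressure to do it.

/*
 * Userspace model of the failure-injection idea (not the actual hunk):
 * make the copy step "fault" every Nth call so the short-copy handling
 * in the caller gets exercised deterministically.
 */
#include <stdio.h>
#include <string.h>

static unsigned long debug_calls;

/* Stand-in for btrfs_copy_from_user(): every 8th call pretends the copy
 * faulted immediately and returns 0 bytes copied. */
static size_t copy_with_injected_faults(char *dst, const char *src, size_t len)
{
        if ((++debug_calls & 7) == 0)
                return 0;               /* simulated fault: nothing copied */
        memcpy(dst, src, len);
        return len;
}

int main(void)
{
        char dst[16];
        int i;

        for (i = 0; i < 20; i++) {
                size_t copied = copy_with_injected_faults(dst, "0123456789abcdef",
                                                          sizeof(dst));
                printf("call %2d: copied=%zu%s\n", i, copied,
                       copied == 0 ? "  <- injected failure" : "");
        }
        return 0;
}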
>> > >
>> > > If copy_from_user fails and we partially update a page, we end up with a
>> > > page that may go away due to memory pressure.  But, btrfs_file_write
>> > > assumes that only the first and last page may have good data that needs
>> > > to be read off the disk.
>> > >
>> > > This patch ditches that code and puts it into prepare_pages instead.
>> > > But I'm still having some errors during long stress.sh runs.  Ideas are
>> > > more than welcome; hopefully some other timezones will kick in ideas
>> > > while I sleep.
>> >
>> > At least it doesn't fix the emerge problem for me. The behavior is now the
>> > same as with 2.6.38-rc3. An 'emerge --oneshot dev-libs/libgcrypt' with no
>> > further interaction is enough to make the emerge process hang, with an
>> > svn process consuming 100% CPU. I can cancel the emerge process with
>> > ctrl-c, but the spawned svn process stays around and it takes a reboot to
>> > get rid of it.
>>
>> Can you cat /proc/$pid/wchan a few times so we can get an idea of where it's
>> looping?  Thanks,
>>
>> Josef
>
> It behaves the same way here with btrfs-unstable.
> The output of "cat /proc/$pid/wchan" is 0.
>
> // Maria
>

I've applied the patch at the head of this thread (with the jiffies
debugging commented out), and I'm attaching an ftrace capture taken
with the function_graph tracer while I'm stuck in the loop.  I've just
snipped out a couple of the loop iterations (the full trace file is
quite large, and mostly repetitious).

I'm going to try to modify file.c with some trace_printk debugging to
show the values of several of the relevant variables at various
stages.
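
Something along these lines — the trace_printk() output lands in the same
ftrace ring buffer as the function_graph trace, so the values will line up
with the rest of the capture; the variable names below are just what I
expect the loop to use, and I'll adjust them to whatever file.c actually
calls them:

/* Per-iteration instrumentation for the write loop in fs/btrfs/file.c. */
trace_printk("write loop: pos=%llu write_bytes=%lu copied=%lu dirty_pages=%lu\n",
             (unsigned long long)pos,
             (unsigned long)write_bytes,
             (unsigned long)copied,
             (unsigned long)dirty_pages);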

I'm also going to try exiting the loop after 256 tries by returning
-EFAULT, so I can stop the tracing at that point and capture a trace of
the entry into the problem (otherwise the ftrace ring buffer fills up too
fast for me to catch the entry point).
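
Roughly like the following debug-only hack (the counter name is mine, and
the declaration would go at the top of the function; tracing_off() freezes
the ftrace ring buffer from inside the kernel, so the entry into the
problem stays in the buffer instead of being overwritten):

/* Debug-only escape hatch for the copy loop: give up after 256
 * consecutive copied == 0 passes, freeze the trace buffer, and hand
 * -EFAULT back to the caller. */
static int zero_copy_passes;

if (copied == 0) {
        if (++zero_copy_passes > 256) {
                tracing_off();          /* keep the loop entry in the ring buffer */
                ret = -EFAULT;
                break;
        }
} else {
        zero_copy_passes = 0;
}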

Attachment: ftrace-btrfs-file-write-debugging.gz
Description: GNU Zip compressed data

