At 00:17 09/04/02, Chris Mason wrote:
>On Tue, 2009-03-31 at 14:18 +0900, Hisashi Hifumi wrote:
>> Hi Chris.
>>
>> I noticed performance of fsync() and write() with O_SYNC flag on Btrfs is
>> very slow as compared to ext3/4. I used blktrace to try to investigate the
>> cause of this. One of cause is that unplug is done by kblockd even if the
>I/O is
>> issued through fsync() or write() with O_SYNC flag. kblockd's unplug timeout
>> is 3msec, so unplug via blockd can decrease I/O response. To increase
>> fsync/osync write performance, speeding up unplug should be done here.
>>
>
>I realized today that all of the async thread handling btrfs does for
>writes gives us plenty of time to queue up IO for the block device. If
>that's true, we can just unplug the block device in the async helper
>thread and get pretty good coverage for the problem you're describing.
>
>Could you please try the patch below and see if it performs well? I did
>some O_DIRECT testing on a 5 drive array, and tput jumped from 386MB/s
>to 450MB/s for large writes.
>
>Thanks again for digging through this problem.
>
>-chris
>
>diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>index dd06e18..bf377ab 100644
>--- a/fs/btrfs/volumes.c
>+++ b/fs/btrfs/volumes.c
>@@ -146,7 +146,7 @@ static noinline int run_scheduled_bios(struct btrfs_device *device)
> unsigned long num_run = 0;
> unsigned long limit;
>
>- bdi = device->bdev->bd_inode->i_mapping->backing_dev_info;
>+ bdi = blk_get_backing_dev_info(device->bdev);
> fs_info = device->dev_root->fs_info;
> limit = btrfs_async_submit_limit(fs_info);
> limit = limit * 2 / 3;
>@@ -231,6 +231,19 @@ loop_lock:
> if (device->pending_bios)
> goto loop_lock;
> spin_unlock(&device->io_lock);
>+
>+ /*
>+ * IO has already been through a long path to get here. Checksumming,
>+ * async helper threads, perhaps compression. We've done a pretty
>+ * good job of collecting a batch of IO and should just unplug
>+ * the device right away.
>+ *
>+ * This will help anyone who is waiting on the IO, they might have
>+ * already unplugged, but managed to do so before the bio they
>+ * cared about found its way down here.
>+ */
>+ if (bdi->unplug_io_fn)
>+ bdi->unplug_io_fn(bdi, NULL);
> done:
> return 0;
> }
I tested your unplug patch.
# sysbench --num-threads=4 --max-requests=10000 --test=fileio --file-num=1
--file-block-size=4K --file-total-size=128M --file-test-mode=rndwr
--file-fsync-freq=5 run
-2.6.29
Test execution summary:
    total time:                          626.9416s
    total number of events:              10004
    total time taken by event execution: 442.5869
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0442s
         max:                            1.4229s
         approx. 95 percentile:          0.3959s

Threads fairness:
    events (avg/stddev):                 2501.0000/73.43
    execution time (avg/stddev):         110.6467/7.15
-2.6.29-patched
Operations performed: 0 Read, 10003 Write, 1996 Other = 11999 Total
Read 0b Written 39.074Mb Total transferred 39.074Mb (68.269Kb/sec)
   17.07 Requests/sec executed

Test execution summary:
    total time:                          586.0944s
    total number of events:              10003
    total time taken by event execution: 347.5348
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0347s
         max:                            2.2546s
         approx. 95 percentile:          0.3090s

Threads fairness:
    events (avg/stddev):                 2500.7500/54.98
    execution time (avg/stddev):         86.8837/3.06
This patch gives us some performance improvement.
What about the case of write() without O_SYNC?
I am concerned that unplugging right away reduces the block layer's
opportunities to optimize (merging, sorting) when the I/O is not an
fsync or an O_SYNC write.