Re: [RFC] [PATCH] Btrfs: improve fsync/osync write performance

At 20:27 09/03/31, Chris Mason wrote:
>On Tue, 2009-03-31 at 14:18 +0900, Hisashi Hifumi wrote:
>> Hi Chris.
>> 
>> I noticed that the performance of fsync() and write() with the O_SYNC flag
>> on Btrfs is very slow compared to ext3/4. I used blktrace to investigate the
>> cause of this. One cause is that the unplug is done by kblockd even if the
>> I/O is issued through fsync() or write() with the O_SYNC flag. kblockd's
>> unplug timeout is 3msec, so unplugging via kblockd can hurt I/O response
>> time. To improve fsync/O_SYNC write performance, the unplug should be sped
>> up here.
>> 
>
>> Btrfs's write I/O is issued from a kernel thread, not from the user
>> application context that calls fsync(). Because submit_bio is not called
>> from the user application context, wait_on_page_writeback() sometimes cannot
>> unplug the I/O on Btrfs: when submit_bio is called from the kernel thread,
>> wait_on_page_writeback() just sleeps in io_schedule().
>> 
>
>This is exactly right, and one of the uglier side effects of the async
>helper kernel threads.  I've been thinking for a while about a clean way
>to fix it.
>
>> I introduced btrfs_wait_on_page_writeback() in the following patch; it is a
>> replacement for wait_on_page_writeback() on Btrfs. It unplugs every tick
>> while waiting for page writeback.
>> 
>> I did a performance test using the sysbench.
>> 
>> # sysbench --num-threads=4 --max-requests=10000  --test=fileio --file-num=1 
>> --file-block-size=4K --file-total-size=128M --file-test-mode=rndwr 
>> --file-fsync-freq=5  run
>> 
>> The result was:
>> -2.6.29
>> 
>> Test execution summary:
>>     total time:                          628.1047s
>>     total number of events:              10000
>>     total time taken by event execution: 413.0834
>>     per-request statistics:
>>          min:                            0.0000s
>>          avg:                            0.0413s
>>          max:                            1.9075s
>>          approx.  95 percentile:         0.3712s
>> 
>> Threads fairness:
>>     events (avg/stddev):           2500.0000/29.21
>>     execution time (avg/stddev):   103.2708/4.04
>> 
>> 
>> -2.6.29-patched
>> 
>> Test execution summary:
>>     total time:                          579.8049s
>>     total number of events:              10004
>>     total time taken by event execution: 355.3098
>>     per-request statistics:
>>          min:                            0.0000s
>>          avg:                            0.0355s
>>          max:                            1.7670s
>>          approx.  95 percentile:         0.3154s
>> 
>> Threads fairness:
>>     events (avg/stddev):           2501.0000/8.03
>>     execution time (avg/stddev):   88.8274/1.94
>> 
>> 
>> This patch gives some performance improvement.
>> 
>> I think there are other reasons why fsync() and write() with the O_SYNC
>> flag are slow on Btrfs, and those should be fixed as well.
>> 
>
>Very nice.  Could I trouble you to try one more experiment?  The other
>way to fix this is to use WRITE_SYNC instead of WRITE.  Could you
>please hardcode WRITE_SYNC in the btrfs submit_bio paths and benchmark
>that?
>
>It doesn't cover as many cases as your patch, but it might have a lower
>overall impact.


Hi.
I wrote a patch that hardcodes WRITE_SYNC in the btrfs submit_bio paths, as
shown below, and ran the sysbench test with it.
I will try your unplug patch later.

diff -Nrup linux-2.6.29.org/fs/btrfs/disk-io.c linux-2.6.29.btrfs_sync/fs/btrfs/disk-io.c
--- linux-2.6.29.org/fs/btrfs/disk-io.c	2009-03-24 08:12:14.000000000 +0900
+++ linux-2.6.29.btrfs_sync/fs/btrfs/disk-io.c	2009-04-01 16:26:56.000000000 +0900
@@ -2068,7 +2068,7 @@ static int write_dev_supers(struct btrfs
 		}
 
 		if (i == last_barrier && do_barriers && device->barriers) {
-			ret = submit_bh(WRITE_BARRIER, bh);
+			ret = submit_bh(WRITE_BARRIER|WRITE_SYNC, bh);
 			if (ret == -EOPNOTSUPP) {
 				printk("btrfs: disabling barriers on dev %s\n",
 				       device->name);
@@ -2076,10 +2076,10 @@ static int write_dev_supers(struct btrfs
 				device->barriers = 0;
 				get_bh(bh);
 				lock_buffer(bh);
-				ret = submit_bh(WRITE, bh);
+				ret = submit_bh(WRITE_SYNC, bh);
 			}
 		} else {
-			ret = submit_bh(WRITE, bh);
+			ret = submit_bh(WRITE_SYNC, bh);
 		}
 
 		if (!ret && wait) {
diff -Nrup linux-2.6.29.org/fs/btrfs/extent_io.c linux-2.6.29.btrfs_sync/fs/btrfs/extent_io.c
--- linux-2.6.29.org/fs/btrfs/extent_io.c	2009-03-24 08:12:14.000000000 +0900
+++ linux-2.6.29.btrfs_sync/fs/btrfs/extent_io.c	2009-04-01 14:48:08.000000000 +0900
@@ -1851,8 +1851,11 @@ static int submit_one_bio(int rw, struct
 	if (tree->ops && tree->ops->submit_bio_hook)
 		tree->ops->submit_bio_hook(page->mapping->host, rw, bio,
 					   mirror_num, bio_flags);
-	else
+	else {
+		if (rw & WRITE)
+			rw = WRITE_SYNC;
 		submit_bio(rw, bio);
+	}
 	if (bio_flagged(bio, BIO_EOPNOTSUPP))
 		ret = -EOPNOTSUPP;
 	bio_put(bio);
diff -Nrup linux-2.6.29.org/fs/btrfs/volumes.c linux-2.6.29.btrfs_sync/fs/btrfs/volumes.c
--- linux-2.6.29.org/fs/btrfs/volumes.c	2009-03-24 08:12:14.000000000 +0900
+++ linux-2.6.29.btrfs_sync/fs/btrfs/volumes.c	2009-04-01 16:25:51.000000000 +0900
@@ -195,6 +195,8 @@ loop_lock:
 
 		BUG_ON(atomic_read(&cur->bi_cnt) == 0);
 		bio_get(cur);
+		if (cur->bi_rw & WRITE)
+			cur->bi_rw = WRITE_SYNC;
 		submit_bio(cur->bi_rw, cur);
 		bio_put(cur);
 		num_run++;
@@ -2815,8 +2817,11 @@ int btrfs_map_bio(struct btrfs_root *roo
 			bio->bi_bdev = dev->bdev;
 			if (async_submit)
 				schedule_bio(root, dev, rw, bio);
-			else
+			else {
+				if (rw & WRITE)
+					rw = WRITE_SYNC;
 				submit_bio(rw, bio);
+			}
 		} else {
 			bio->bi_bdev = root->fs_info->fs_devices->latest_bdev;
 			bio->bi_sector = logical >> 9;


 # sysbench --num-threads=4 --max-requests=10000  --test=fileio --file-num=1 
 --file-block-size=4K --file-total-size=128M --file-test-mode=rndwr 
 --file-fsync-freq=5  run

The result was:

-2.6.29
Test execution summary:
    total time:                          619.6822s
    total number of events:              10003
    total time taken by event execution: 403.1020
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0403s
         max:                            1.4584s
         approx.  95 percentile:         0.3761s

Threads fairness:
    events (avg/stddev):           2500.7500/48.48
    execution time (avg/stddev):   100.7755/7.92


-2.6.29-WRITE_SYNC-patched

Test execution summary:
    total time:                          596.8114s
    total number of events:              10004
    total time taken by event execution: 396.2378
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0396s
         max:                            1.6926s
         approx.  95 percentile:         0.3434s

Threads fairness:
    events (avg/stddev):           2501.0000/58.28
    execution time (avg/stddev):   99.0595/2.84


