On 07/19/2012 06:37 AM, Vladimir Bashkirtsev wrote:
>> Try to determine how much of the 200ms avg latency comes from osds vs
>> the qemu block driver.
> Now I see following:
>
> 2012-07-19 22:59:55.604365 osd.1 [WRN] 6 slow requests, 6 included below; oldest blocked for > 38.034575 secs
> 2012-07-19 22:59:55.604377 osd.1 [WRN] slow request 38.034575 seconds old, received at 2012-07-19 22:59:17.569653: osd_op(client.154924.0:5640639 rb.0.11.00000000268d [write 2912256~4096] 2.9e04890a) v4 currently waiting for sub ops
> 2012-07-19 22:59:55.604384 osd.1 [WRN] slow request 38.034442 seconds old, received at 2012-07-19 22:59:17.569786: osd_op(client.154924.0:5640640 rb.0.11.00000000296f [write 2269184~4096] 2.bc0707d9) v4 currently waiting for sub ops
> 2012-07-19 22:59:55.604389 osd.1 [WRN] slow request 37.813370 seconds old, received at 2012-07-19 22:59:17.790858: osd_op(client.152350.0:2269710 rb.0.d.000000001e9e [write 2097152~1048576] 2.bb4282ca) v4 currently waiting for sub ops
> 2012-07-19 22:59:55.604396 osd.1 [WRN] slow request 37.715288 seconds old, received at 2012-07-19 22:59:17.888940: osd_op(client.152365.0:1474333 rb.0.12.0000000004d7 [write 274432~4096] 2.d68a2203) v4 currently waiting for sub ops
> 2012-07-19 22:59:55.604402 osd.1 [WRN] slow request 32.812673 seconds old, received at 2012-07-19 22:59:22.791555: osd_op(client.158304.0:11002 rb.0.13.000000001c6d [write 1978368~12288] 2.dada321f) v4 currently waiting for sub ops
>
> Looks like an indication of something bigger than just latency. Similar
> problems on osd.2 too. So with two osds misbehaving, I'm not really
> surprised that KVM reads are slow. Do these slow requests hold up reads?
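As an aside, the quoted slow-request lines have a regular enough shape to summarize mechanically, e.g. to group them by client or object. A quick sketch in Python (not part of Ceph; the field names are my own labels):

```python
import re

# Pattern matching the "slow request" [WRN] lines quoted above.
# Groups: age in seconds, receive time, client op id, object name,
# the op itself (type + offset~length), and the placement group.
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<client_op>\S+) (?P<object>\S+) "
    r"\[(?P<op>[^\]]+)\] (?P<pg>\S+)\)"
)

def parse_slow_request(line):
    """Return a dict of fields from one slow-request log line, or None."""
    m = SLOW_RE.search(line)
    return m.groupdict() if m else None

sample = ("2012-07-19 22:59:55.604377 osd.1 [WRN] slow request 38.034575 "
          "seconds old, received at 2012-07-19 22:59:17.569653: "
          "osd_op(client.154924.0:5640639 rb.0.11.00000000268d "
          "[write 2912256~4096] 2.9e04890a) v4 currently waiting for sub ops")
print(parse_slow_request(sample)["object"])  # rb.0.11.00000000268d
```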
Yes, they can hold up reads to the same object. Depending on where
they're stuck, they may be blocking other requests as well, e.g. if
they're taking up all the filestore threads.

"Waiting for sub ops" means they're waiting for the replicas to
acknowledge the write and commit it to disk, so the real cause of the
slowness is on the replicas. If you enable 'debug osd = 25', 'debug
filestore = 25', and 'debug journal = 20', you can trace through the
logs to see exactly what's happening with the subops for those requests.

Josh
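P.S. For reference, those three settings as a ceph.conf stanza on the replica OSDs (a sketch; the [osd] section placement is assumed, and the daemons need the new values applied, e.g. by restart):

```ini
[osd]
    ; Verbose tracing for debugging slow subops. These levels are very
    ; chatty and will grow the logs quickly, so revert them once the
    ; interesting requests have been captured.
    debug osd = 25
    debug filestore = 25
    debug journal = 20
```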