On 06/08/2012 06:55 AM, Sage Weil wrote:
On Fri, 8 Jun 2012, Oliver Francke wrote:

Hi Guido,

yeah, there is something weird going on. I just started to establish some test-VM's. Freshly imported from running *.qcow2 images. Kernel panic with INIT, seg-faults and other "funny" stuff.

Just added the rbd_cache=true in my config, voila. All is fast-n-up-n-running... All my testing was done with cache enabled... Since our errors all came from rbd_writeback from former ceph-versions...

Are you guys able to reproduce the corruption with 'debug osd = 20' and 'debug ms = 1'? Ideally we'd like to:

 - reproduce from a fresh vm, with osd logs
 - identify the bad file
 - map that file to a block offset (see http://ceph.com/qa/fiemap.[ch], linux_fiemap.h)
 - use that to identify the badness in the log

I suspect the cache is just masking the problem because it submits fewer IOs...
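For the "identify the bad file" step, the symptom reported in this thread (unexpected zeroes at the start of a file, before the real contents) can be checked mechanically when the expected contents are known. The following is only a minimal sketch under that assumption; the function names are hypothetical and it is not the iotester mentioned elsewhere in the thread.

```python
def zero_prefix_length(data: bytes) -> int:
    """Return the number of consecutive zero bytes at the start of data."""
    return len(data) - len(data.lstrip(b"\x00"))


def check_readback(expected: bytes, actual: bytes) -> str:
    """Compare data read back from the VM disk with what was written.

    Hypothetical helper: reports whether the read-back data matches, and
    if not, whether it starts with more zero bytes than expected.
    """
    if actual == expected:
        return "ok"
    zeros = zero_prefix_length(actual)
    if zeros > zero_prefix_length(expected):
        return f"corrupt: {zeros} unexpected leading zero bytes"
    return "corrupt: contents differ"
```

A file flagged this way would then be a candidate for mapping to a block offset and for searching the osd logs.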
The cache also doesn't do sparse reads. Is it still reproducible with a fresh vm when you set filestore_fiemap_threshold = 0 for the osds, and run without rbd caching?

Josh
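As a rough illustration, the settings mentioned so far in this thread could be combined in ceph.conf as below. Putting them in the [osd] section is an assumption here, as is applying the debug levels and the fiemap threshold in the same run; the option names and values are the ones quoted above.

```ini
[osd]
    ; verbose logging requested for reproducing the corruption
    debug osd = 20
    debug ms = 1
    ; test setting suggested above; run the VM without rbd caching
    filestore_fiemap_threshold = 0
```

The osds would need to pick up the changed configuration before the test with a fresh vm.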
sage

Josh? Sage? Help?!

Oliver.

On 06/08/2012 02:55 PM, Guido Winkelmann wrote:

On Thursday, 7 June 2012, 12:48:05 you wrote:

On 06/07/2012 11:04 AM, Guido Winkelmann wrote:

Hi,

I'm using Ceph with RBD to provide network-transparent disk images for KVM-based virtual servers. The last two days, I've been hunting some weird elusive bug where data in the virtual machines would be corrupted in weird ways. It usually manifests in files having some random data - usually zeroes - at the start before the actual contents that should be in there start.

I definitely want to figure out what's going on with this. A few questions:

Are you using rbd caching? If so, what settings? In either case, does the corruption still occur if you switch caching on/off? There are different I/O paths here, and this might tell us if the problem is on the client side.

Okay, I've tried enabling rbd caching now, and so far, the problem appears to be gone.

I am using libvirt for starting and managing the virtual machines, and what I did was change the <source> element for the virtual disk from

<source protocol='rbd' name='rbd/name_of_image'>

to

<source protocol='rbd' name='rbd/name_of_image:rbd_cache=true'>

and then restart the VM. (I found that in one of your mails on this list; there does not appear to be any proper documentation on this...)

The iotester does not find any corruptions with these settings.

The VM is still horribly broken, but that's probably lingering filesystem damage from yesterday. I'll try with a fresh image next.

I did not change anything else in the setup. In particular, the OSDs still use btrfs. One of the OSDs has been restarted, though. I will run another test with a VM without rbd caching, to make sure it wasn't by random chance restarting that one osd that made the real difference.
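The libvirt change described above (appending ":rbd_cache=true" to the name attribute of the rbd <source> element) could be scripted roughly as follows. This is only a sketch: the function name is hypothetical, the surrounding <disk> element in the usage example is assumed for illustration, and only the two <source> forms come from this mail.

```python
import xml.etree.ElementTree as ET

RBD_CACHE_OPTION = "rbd_cache=true"


def enable_rbd_cache(disk_xml: str) -> str:
    """Append ':rbd_cache=true' to the name of every rbd <source> element.

    Sources that already carry the option are left unchanged.
    """
    root = ET.fromstring(disk_xml)
    # iter() also yields the root itself, so a bare <source> element works too.
    for source in root.iter("source"):
        if source.get("protocol") != "rbd":
            continue
        name = source.get("name", "")
        options = name.split(":")[1:]  # everything after the image name
        if RBD_CACHE_OPTION not in options:
            source.set("name", f"{name}:{RBD_CACHE_OPTION}")
    return ET.tostring(root, encoding="unicode")


# Usage example with an assumed <disk> wrapper around the <source> element.
example = (
    "<disk type='network' device='disk'>"
    "<source protocol='rbd' name='rbd/name_of_image'/>"
    "</disk>"
)
updated = enable_rbd_cache(example)
```

As in the manual procedure, the VM still has to be restarted for the changed disk definition to take effect.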
Enabling btrfs did not appear to make any difference wrt performance, but that's probably because my tests mostly create sustained sequential IO, for which caches are generally not very helpful.

Enabling rbd caching is not a solution I particularly like, for two reasons:

1. In my setup, migrating VMs from one host to another is a normal part of operation, and I still don't know how to prevent data corruption (in the form of silently lost writes) when combining rbd caching and migration.

2. I'm not really looking into speeding up a single VM, I'm really more interested in just how many VMs I can run before performance starts degrading for everyone, and I don't think rbd caching will help with that.

Regards,
Guido

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Oliver Francke

filoo GmbH
Moltkestraße 25a
33330 Gütersloh
HRB4355 AG Gütersloh

Managing directors: S.Grewing | J.Rehpöhler | C.Kunz

Follow us on Twitter: http://twitter.com/filoogmbh