Re: Random data corruption in VM, possibly caused by rbd
On Friday, 8 June 2012, 07:50:36, Josh Durgin wrote:
> On 06/08/2012 06:55 AM, Sage Weil wrote:
> > On Fri, 8 Jun 2012, Oliver Francke wrote:
> >> Hi Guido,
> >>
> >> yeah, there is something weird going on. I just started to establish some
> >> test-VM's. Freshly imported from running *.qcow2 images.
> >> Kernel panic with INIT, seg-faults and other "funny" stuff.
> >>
> >> Just added the rbd_cache=true in my config, voila. All is
> >> fast-n-up-n-running...
> >> All my testing was done with cache enabled... Since our errors all came
> >> from rbd_writeback from former ceph-versions...
> >
> > Are you guys able to reproduce the corruption with 'debug osd = 20' and
> > 'debug ms = 1'? Ideally we'd like to:
> > - reproduce from a fresh vm, with osd logs
> > - identify the bad file
> > - map that file to a block offset (see
> >   http://ceph.com/qa/fiemap.[ch], linux_fiemap.h)
> > - use that to identify the badness in the log
> >
> > I suspect the cache is just masking the problem because it submits fewer
> > IOs...
>
> The cache also doesn't do sparse reads. Is it still reproducible with
> a fresh vm when you set filestore_fiemap_threshold = 0 for the osds,
> and run without rbd caching?
I have set filestore_fiemap_threshold = 0 on all osds and restarted them. The
problem is still there, and it is so bad that I cannot even run the fiemap
utility that Sage posted. I guess I should have tried booting the VM from a
livecd instead...
Guido
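
For reference, the "map that file to a block offset" step from Sage's list
can be taken one stage further: once fiemap gives a physical byte offset on
the virtual disk, that offset can be turned into the name of the RADOS object
that stores it, which is what you would then search for in the osd logs. The
following is only a rough sketch under stated assumptions: the default 4 MiB
object size (order 22), no custom striping, and the old-style object naming
of <block_name_prefix>.<12 hex digits>. The function name and the prefix
"rb.0.0" are made up for illustration; the real prefix is the
block_name_prefix printed by 'rbd info'. Note also that fiemap offsets are
relative to the block device holding the filesystem, so if the filesystem
sits on a partition, the partition's starting byte offset has to be added
first.

```python
# Sketch: map a byte offset on an RBD-backed virtual disk to the RADOS
# object that stores it. Assumes the default 4 MiB object size (order 22)
# and no custom striping. The default prefix is a made-up example.

def rbd_object_for_offset(offset, order=22, prefix="rb.0.0"):
    """Return (object_name, offset_within_object) for a disk byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    object_size = 1 << order            # 4 MiB when order == 22
    object_no = offset // object_size   # index of the object in the image
    within = offset % object_size       # byte position inside that object
    # Assumed naming scheme: <block_name_prefix>.<12 hex digits>
    return "%s.%012x" % (prefix, object_no), within


if __name__ == "__main__":
    # Example: an extent that starts 4096 bytes past the 10 GiB mark.
    name, within = rbd_object_for_offset(10 * 1024**3 + 4096)
    print(name, within)
```

With the object name in hand, grepping the 'debug osd = 20' logs for it
should narrow down which writes touched the corrupted range.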