Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
On Thu, Jan 12, 2012 at 12:31:09PM +1030, Rusty Russell wrote:
> On Wed, 11 Jan 2012 12:21:30 +0200, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > On Wed, Jan 11, 2012 at 10:55:52AM +1030, Rusty Russell wrote:
> > > On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > > > > Yes.  The idea that we can alter fields in the device-specific config
> > > > > area is flawed.  There may be cases where it doesn't matter, but as an
> > > > > idea it was holed to begin with.
> > > > >
> > > > > We can reduce probability by doing a double read to check, but there are
> > > > > still cases where it will fail.
> > > >
> > > > Okay - want me to propose an interface for that?
> > >
> > > Had a brief chat with BenH (CC'd).
> > >
> > > I think we should deprecate writing to the config space.  Only balloon
> > > does it AFAICT, and I can't quite figure out *why* it has an 'active'
> > > field.
> >
> > Are you sure? I think net writes a mac address.
>
> True.  We'll need to disable that, and come up with another mechanism if
> we want it back (a new feature and a VIRTIO_NET_HDR_F_SET_MAC flag in
> the virtio_net header?  Or would that mess up vhost_net?).

vhost net is only datapath, so no problem.
Copying in host is tricky for vhost_net.

> > > This solves half the problem, of syncing guest writes.  For the
> > > other half, I suggest a generation counter; odd means inconsistent.  The
> > > guest can poll.
> >
> > So we read the counter until it's even, get the config, and if it's changed,
> > repeat?  Yes, it works.  However, I would like to have a way to detect
> > config change just by looking at memory.  ATM we need to read ISR to
> > know.  If we used a VQ, the advantage would be that the device can work
> > with a single MSIX vector shared by all VQs.
>
> If we use a 32-bit counter, we also get this though, right?
>
> If counter has changed, it's a config interrupt...
But we need an exit to read the counter.  We can put the counter in
memory, but then this looks suspiciously like a simplified VQ -
so why not use a VQ then?

> > If we do require config VQ anyway, why not use it to notify
> > guest of config changes? Guest could pre-post an in buffer
> > and host uses that.
>
> We could, but it's an additional burden on each device.  vqs are cheap,
> but not free.  And the config area is so damn convenient...

Not if you start playing with counters, checking it twice,
reinventing all kinds of barriers ...

> > > BenH also convinced me we should finally make the config space LE if
> > > we're going to change things.  Since PCI is the most common transport,
> > > guest-endian confuses people.  And it sucks for really weird machines.
> >
> > Are we going to keep guest endian for e.g. the virtio net header?
> > If yes, the benefit of switching the config space is not that big.
> > And changes in devices would affect non-PCI transports.
>
> Yep.  It would only make sense if we do it for everything.  And yes,
> it'll mess up everyone who is BE, so it needs to be a feature bit for
> them.
>
> > > We should also change the ring (to a single ring, I think).  Descriptors
> > > become 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> > > We might be able to squeeze it into 20 bytes but that means packing.  We
> > > should support inline, chained or indirect.  Let the other side ack by
> > > setting flag, cookie and len (if written).
> >
> > Quite possibly all or some of these things help performance,
> > but do we have to change the spec before we have experimental
> > proof?
>
> We change the spec last, once we know what we're doing, ideally.
>
> > I did experiment with a single ring using tools/virtio and
> > I didn't see a measurable performance gain.
>
> Interesting.  It is simpler and more standard than our current design,
> but that's not sufficient unless there are other reasons.  Needs further
> discussion and testing.
> > Two rings do have the advantage of not requiring a host-side copy, which
> > copy would surely add to cache pressure.
>
> Well, a simple host could process in-order and leave stuff in the ring I
> guess.  A smarter host would copy and queue, maybe leave one queue entry
> in so it doesn't get flooded?
>
> > Since host doesn't change descriptors, we could also
> > preformat some descriptors in the current design.
> >
> > There is a fragmentation problem in theory but it can be alleviated with
> > a smart allocator.
>
> Yeah, the complexity scares me...
>
> > About inline - it can only help very small buffers.
> > Which workloads do you have in mind exactly?
>
> It was suggested by others, but I think TCP Acks are the classic one.
> 12 + 14 + 20 + 40 = 86 bytes with virtio_net_hdr_mrg_rxbuf at the front.

That's only the simplest IPv4 case, right?  Anyway, this spans multiple
descriptors, which complicates allocation significantly.

> > BTW this seems to be the reverse from what you have in Mar 2001,
> > see 87mxkjls61.fsf@xxxxxxxxxxxxxxx :)
>
> (s/2001/2011).  Indeed.  No one shared my optimism that having an open
> process for a virtio2 would bring more players on board (my original
> motivation).  But technical requirements are mounting up, which means
> we're going to get there anyway.
>
> > I am much less concerned with what we do for configuration,
> > but I do not believe we have learned all the performance lessons
> > from virtio ring1.  Is there any reason why we shouldn't be
> > able to experiment with inline within virtio1 and see
> > whether that gets us anything?
>
> Inline in the used ring is possible, but those descriptors are 8 bytes, vs
> 24/32.

Hmm, 86 > 32 anyway, right?

> > If we do a bunch of changes to the ring at once, we can't
> > figure out what's right, what's wrong, or back out of
> > mistakes later.
> > Since there are non-PCI transports that use the ring,
> > we really shouldn't make both the configuration and
> > the ring changes depend on the same feature bit.
>
> Yes, I'm thinking #define VIRTIO_F_VIRTIO2 (-1).  For PCI, this gets
> mapped into "are we using the new config layout?".  For others, it
> gets mapped into a transport-specific feature.
>
> (I'm sure you get it, but for the others:) This is because I want to
> draw a clear line between all the legacy stuff at the same time, not
> have to support part of it later because someone might not flip the
> feature bit.
>
> Cheers,
> Rusty.

So my point is, config stuff and the ring are completely separate;
they are different layers.

-- 
MST
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization