On Thu, Jan 12, 2012 at 12:31:09PM +1030, Rusty Russell wrote:
> On Wed, 11 Jan 2012 12:21:30 +0200, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > On Wed, Jan 11, 2012 at 10:55:52AM +1030, Rusty Russell wrote:
> > > On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > > > > Yes. The idea that we can alter fields in the device-specific config
> > > > > area is flawed. There may be cases where it doesn't matter, but as an
> > > > > idea it was holed to begin with.
> > > > >
> > > > > We can reduce probability by doing a double read to check, but there are
> > > > > still cases where it will fail.
> > > >
> > > > Okay - want me to propose an interface for that?
> > >
> > > Had a brief chat with BenH (CC'd).
> > >
> > > I think we should deprecate writing to the config space. Only balloon
> > > does it AFAICT, and I can't quite figure out *why* it has an 'active'
> > > field.
> >
> > Are you sure? I think net writes a mac address.
>
> True. We'll need to disable that, and come up with another mechanism if
> we want it back (a new feature and a VIRTIO_NET_HDR_F_SET_MAC flag in
> the virtio_net header? Or would that mess up vhost_net?).
vhost net is only datapath, so no problem. copying in host is tricky for
vhost_net.
> > > This solves half the problem, of sync guest writes. For the
> > > other half, I suggest a generation counter; odd means inconsistent. The
> > > guest can poll.
> >
> > So we get the counter until it's even, get the config, if it's changed
> > repeat? Yes it works. However, I would like to have a way to detect
> > config change just by looking at memory. ATM we need to read ISR to
> > know. If we used a VQ, the advantage would be that the device can work
> > with a single MSIX vector shared by all VQs.
>
> If we use a 32-bit counter, we also get this though, right?
>
> If counter has changed, it's a config interrupt...
But we need an exit to read the counter. We can put the counter
in memory but this looks suspiciously like a simplified VQ -
so why not use a VQ then?
> > If we do require config VQ anyway, why not use it to notify
> > guest of config changes? Guest could pre-post an in buffer
> > and host uses that.
>
> We could, but it's an additional burden on each device. vqs are cheap,
> but not free. And the config area is so damn convenient...
Not if you start playing with counters, checking it twice,
reinvent all kind of barriers ...
> > > BenH also convinced me we should finally make the config space LE if
> > > we're going to change things. Since PCI is the most common transport,
> > > guest-endian confuses people. And it sucks for really weird machines.
> >
> > Are we going to keep guest endian for e.g. virtio net header?
> > If yes the benefit of switching config space is not that big.
> > And changes in devices would affect non-PCI transports.
>
> Yep. It would only make sense if we do it for everything. And yes,
> it'll mess up everyone who is BE, so it needs to be a feature bit for
> them.
>
> > > We should also change the ring (to a single ring, I think). Descriptors
> > > to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> > > We might be able to squeeze it into 20 bytes but that means packing. We
> > > should support inline, chained or indirect. Let the other side ack by
> > > setting flag, cookie and len (if written).
> >
> > Quite possibly all or some of these things help performance
> > but do we have to change the spec before we have experimental
> > proof?
>
> We change the spec last, once we know what we're doing, ideally.
>
> > I did experiment with a single ring using tools/virtio and
> > I didn't see a measureable performance gain.
>
> Interesting. It is simpler and more standard than our current design,
> but that's not sufficient unless there are other reasons. Needs further
> discussion and testing.
>
> > Two rings do have the advantage of not requiring host side copy, which
> > copy would surely add to cache pressure.
>
> Well, a simple host could process in-order and leave stuff in the ring I
> guess. A smarter host would copy and queue, maybe leave one queue entry
> in so it doesn't get flooded?
>
> > Since
> > host doesn't change desriptors, we could also
> > preformat some descriptors in the current design.
> >
> > There is a fragmentation problem in theory but it can be alleviated with
> > a smart allocator.
>
> Yeah, the complexity scares me...
>
> > About inline - it can only help very small buffers.
> > Which workloads do you have in mind exactly?
>
> It was suggested by others, but I think TCP Acks are the classic one.
> 12 + 14 + 20 + 40 = 86 bytes with virtio_net_hdr_mrg_rxbuf at the front.
That's only the simplest IPv4, right?
Anyway, this spans multiple descriptors so this complicates allocation
significantly.
> > BTW this seems to be the reverse from what you have in Mar 2001,
> > see 87mxkjls61.fsf@xxxxxxxxxxxxxxx :)
>
> (s/2001/2011). Indeed. Noone shared my optimism that having an open
> process for a virtio2 would bring more players on board (my original
> motivation). But technical requirements are mounting up, which means
> we're going to get there anyway.
>
> > I am much less concerned with what we do for configuration,
> > but I do not believe we have learned all performance lessons
> > from virtio ring1. Is there any reason why we shouldn't be
> > able to experiment with inline within virtio1 and see
> > whether that gets us anything?
>
> Inline in the used ring is possible, but those descriptors 8 bytes, vs
> 24/32.
Hmm, 86 > 32 anyway, right?
> > If we do a bunch of changes to the ring at once, we can't
> > figure out what's right, what's wrong, or back out of
> > mistakes later.
> >
> > Since there are non PCI transports that use the ring,
> > we really shouldn't make both the configuration and
> > the ring changes depend on the same feature bit.
>
> Yes, I'm thinking #define VIRTIO_F_VIRTIO2 (-1). For PCI, this gets
> mapped into a "are we using the new config layout?". For others, it
> gets mapped into a transport-specific feature.
>
> (I'm sure you get it, but for the others) This is because I want to be
> draw a clear line between all the legacy stuff at the same time, not
> have to support part of it later because someone might not flip the
> feature bit.
>
> Cheers,
> Rusty.
So my point is, config stuff and ring are completely separate,
they are different layers.
--
MST
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
- References:
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
- Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
[KVM Development]
[CentOS Virtualization]
[Netdev]
[Ethernet Bridging]
[Linux Wireless]
[Kernel Newbies]
[Security]
[Linux for Hams]
[Netfilter]
[Bugtraq]
[Photo]
[Yosemite]
[Yosemite Forum]
[MIPS Linux]
[ARM Linux]
[Linux RAID]
[Linux Admin]
[Samba]
[Find Someone Nice]
[Video 4 Linux]
[Linux Resources]