On Sep 13, 2004, at 9:23 PM, Mallikarjun C. wrote:
Comments below.
Caitlin Bestler wrote:
What is the responsibility for the distribution of flow control
between the Data Mover components and the iSCSI peers?
The Datamover Architecture doesn't prescribe a specific distribution; it
only sets functionality expectations.
The position of the Architecture doc is that it's up to individual
Datamover protocols (e.g., iSER) to require a specific distribution.
Section 8 specifies:
1) guarantee that all the necessary data transfers take place
when the local iSCSI layer requests transmitting a command
(in order to complete a SCSI command, for an initiator), or
sending/receiving an iSCSI data sequence (in order to
complete part of a SCSI command, for a target).
This functional requirement can be met by a Datamover protocol in one
of a few ways:
1. Must break the underlying transport connection when this cannot be
met.
2. Must build in a positive flow control protocol internal to itself.
3. Must place additional requirements on iSCSI-3720's behavior (e.g.,
"shall only send a max of 'n' immediate commands" etc.)
4. Must take the help of *its* LLPs (e.g., an iWARP stack for iSER) to
realize the same functionality.
DA does not have a prescription for iSER here.
After seriously evaluating #2, iSER chose a combination of #1 and #4
(#4: recall the lengthy discussion on DDP text wrt "graceful handling"
of empty RQ exceptions).
Note that despite its best intentions, a Datamover protocol does not
change iSCSI-3720's behavior; e.g., too many immediate commands can be
thrown away by iSCSI without notice.
I do not see an interoperability problem here for iSER, nor am I
convinced that DA should say more than it currently does in this regard.
For someone working above the Datamover the plain meaning of "guarantee
that all the necessary data transfers take place" is that the Datamover
will reliably deliver the data in the absence of transport errors.
Option one is indeed an excellent remedy when the Datamover consumer
has sent more iSCSI PDUs than its peer can consume. But a Datamover
consumer may have gotten into the habit of assuming that the transport
layer will sweep pacing problems under the rug of its own pacing. TCP
applications frequently place more messages in flight than could be
instantaneously received on the other end, relying on transport layer
buffering to smooth out the problem.
But the primary wire protocol that the Datamover architecture was
designed to support, iSER, does not offer transport layer buffering
because RDMA is designed to eliminate transport layer buffering.
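To make the contrast concrete, here is a minimal sketch of the two
behaviors. It is a toy model, not iSER or any real RDMA API: the class
names, the `capacity` parameter and the `ReceiveQueueEmpty` exception are
all hypothetical. The first class absorbs a burst in a transport-level
buffer and refuses further sends when that buffer is full; the second
consumes one pre-posted receive buffer per message and tears the
connection down when none is left (option 1 above).

```python
from collections import deque


class ReceiveQueueEmpty(Exception):
    """Hypothetical error: a message arrived and no receive buffer was posted."""


class BufferedTransport:
    """TCP-like toy model: excess messages wait in a transport buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()

    def send(self, msg):
        # When the buffer is full the sender is held back, not disconnected.
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append(msg)
        return True

    def receive(self):
        # The receiver drains the buffer whenever it gets around to it.
        return self.pending.popleft() if self.pending else None


class PostedBufferTransport:
    """RDMA-like toy model: every message needs a pre-posted receive buffer."""

    def __init__(self, posted_buffers):
        self.posted = posted_buffers
        self.delivered = []
        self.connected = True

    def post_receive(self, count=1):
        # The receiving consumer makes more buffers available.
        self.posted += count

    def send(self, msg):
        if not self.connected:
            raise ConnectionError("connection already torn down")
        if self.posted == 0:
            # Nothing to land the message in: the connection is broken
            # even though no network error occurred.
            self.connected = False
            raise ReceiveQueueEmpty("no receive buffer posted")
        self.posted -= 1
        self.delivered.append(msg)
```

In this model a sender that overruns `BufferedTransport` merely has to
wait, while the same sender on `PostedBufferTransport` loses the
connection. That is why the consumer, not the transport, has to know how
many messages the peer can accept.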
It should be made clear that a user of a Datamover service is responsible
for avoiding this problem. In my opinion the absence of enabling language
would actually forbid some of the solutions you listed.
Inviting traffic onto the network that you do not have buffers to handle is
bad enough. But solving it through network layer retransmission is clearly
an incorrect solution.
It also assumes that a Datamover is a *distributed* solution. What if
the protocol being implemented in a Datamover architecture was merely
a form of inter-process communication or use of shared memory?
Leaving the options open to each implementation is perfectly fine,
but only if those options are made clear to Datamover users. Since
the Datamover replaces carrying iSCSI PDUs over TCP, there
will be a natural tendency to assume that the solutions will match
those employed by TCP. Since those are not the optimal solutions,
it is more likely that a buffer overrun will result in a connection
being dropped even when there was no network error.
That is actually acceptable, if it is a response to sending excessive
iSCSI PDUs. The fact that TCP would have been forgiving does not
make them legitimate somehow.
The fundamental fact is that the options I outlined *are* the only
options. TCP uses transport flow control and transport buffering,
and when necessary blocks the transmitter from submitting more
requests. Requiring Datamovers using very different protocols
to mimic that behavior is wrong -- but that is the plain meaning of
the requirements as currently stated.
It is not my intention to require iSER to add its own flow control.
I am trying to make certain that the flow control problem is not
shoved down to the RDMA layer and thereby prevent implementations
from truly eliminating TCP-caused bottlenecks. That requires
moving responsibility for flow control clearly back up to the
application layer. iSCSI is already 99% flow controlled;
clearly stating that it is 100% responsible should not
be a problem.
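As an illustration of the flow control iSCSI already has, here is a
minimal sketch of the RFC 3720 command window on the initiator side: a
non-immediate command may only be sent while its CmdSN is no greater than
the MaxCmdSN last advertised by the target, compared in 32-bit serial
number arithmetic. The class and method names are hypothetical, and the
update rule is simplified (it only refuses to move MaxCmdSN backwards).
Immediate commands fall outside this window, which is the gap discussed
earlier in this thread.

```python
MOD = 1 << 32   # CmdSN values are 32-bit and wrap around
HALF = 1 << 31


def sn_lte(a, b):
    """Return True if a <= b in 32-bit serial number arithmetic."""
    return ((b - a) % MOD) < HALF


class CommandWindow:
    """Initiator-side view of the target's command window (sketch only)."""

    def __init__(self, exp_cmd_sn, max_cmd_sn):
        self.cmd_sn = exp_cmd_sn       # next CmdSN to assign
        self.max_cmd_sn = max_cmd_sn   # highest CmdSN the target will accept

    def can_send(self):
        # The window is closed when MaxCmdSN is one less than the next CmdSN.
        return sn_lte(self.cmd_sn, self.max_cmd_sn)

    def try_send(self):
        """Assign a CmdSN to a non-immediate command, or return None."""
        if not self.can_send():
            return None
        assigned = self.cmd_sn
        self.cmd_sn = (self.cmd_sn + 1) % MOD
        return assigned

    def update(self, max_cmd_sn):
        """Apply the MaxCmdSN carried in a target response."""
        # Simplification: ignore stale values that would shrink the window.
        if sn_lte(self.max_cmd_sn, max_cmd_sn):
            self.max_cmd_sn = max_cmd_sn
```

With a window of three, the fourth command waits at the initiator until a
response raises MaxCmdSN; nothing is pushed onto the wire for the
Datamover or the RDMA layer to absorb.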