
RE: Implementer's Guide - Task Management Issue

David and Mallikarjun,

I had a long discussion with Mallikarjun on a part of this problem - namely cleaning the T-2-I path.
This could be done in several ways, and Mallikarjun and I were also experimenting with sending the closing TM response on all connections as a way to speed up pipe cleaning.

As for the issue raised by Bill Studemund, I am not sure that the target needs help in recovering buffers (and I am not sure that I am not repeating what I already said in the past).
As the TTT is a target-generated artifact, buffers can be retired and the target can refrain from reusing the TTT with the given ITTs for some time (the rules needed are quite simple).
If data arrives with a retired TTT/ITT combination, it is simply discarded at the target.

This way the TMF response can be sent early, regardless of outstanding data, provided that the target respects some simple rules for TTT use and reuse.
Considering also that TTTs are not required to be unique beyond a single LUN, buffer retirement, while an issue, can be solved within RFC 3720.
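The TTT retirement scheme described above can be sketched as a small target-side table. This is a minimal illustration only: the `TttTable` class, the quarantine period, and the injectable clock are assumptions made for the example, not anything RFC 3720 specifies (RFC 3720 only reserves the TTT value 0xFFFFFFFF).

```python
import time


class TttTable:
    """Hypothetical target-side bookkeeping for Target Transfer Tags (TTTs).

    A retired (LUN, TTT, ITT) combination is quarantined for quarantine_s
    seconds: the TTT is not handed out again with that ITT until the period
    expires, so a late Data-Out carrying the old pair cannot be mistaken for
    live data. The buffer itself is released immediately on retirement.
    """

    def __init__(self, quarantine_s=30.0, ttt_space=0xFFFFFFFF, clock=time.monotonic):
        self.quarantine_s = quarantine_s
        self.ttt_space = ttt_space  # values 0..ttt_space-1; 0xFFFFFFFF stays reserved
        self.clock = clock
        self.active = {}   # (lun, ttt) -> ITT of the task currently using the TTT
        self.retired = {}  # (lun, ttt, itt) -> time of retirement
        self.next_ttt = 0

    def _quarantined(self, lun, ttt, itt):
        stamp = self.retired.get((lun, ttt, itt))
        if stamp is None:
            return False
        if self.clock() - stamp >= self.quarantine_s:
            del self.retired[(lun, ttt, itt)]  # quarantine over, forget the pair
            return False
        return True

    def allocate(self, lun, itt):
        # Try each TTT value at most once; skip active and quarantined ones.
        for _ in range(self.ttt_space):
            ttt = self.next_ttt
            self.next_ttt = (self.next_ttt + 1) % self.ttt_space
            if (lun, ttt) not in self.active and not self._quarantined(lun, ttt, itt):
                self.active[(lun, ttt)] = itt
                return ttt
        raise RuntimeError("no TTT available for this LUN/ITT")

    def retire(self, lun, ttt):
        """Abort path: free the buffer now and quarantine the TTT/ITT pair."""
        itt = self.active.pop((lun, ttt))
        self.retired[(lun, ttt, itt)] = self.clock()

    def accept_data_out(self, lun, itt, ttt):
        """True only for Data-Out matching a live transfer; anything else is discarded."""
        return self.active.get((lun, ttt)) == itt
```

Quarantining the (TTT, ITT) pair rather than the TTT alone follows the suggestion that the target only refrain from reusing a TTT with the given ITTs; the same TTT value can be given to a task with a different ITT right away.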



11/12/06 20:56

<cb_mallikarjun@xxxxxxxxx>, <ips@xxxxxxxx>
RE: Implementer's Guide - Task Management Issue


[NB: Working group chair hat is **off**.]

> "I assume this is essentially what you are proposing that we
> consider (fast multi-task abort with outstanding TTTs always,
> even without the key negotiation)."

Not exactly - comments interspersed below, but what I'm proposing
is that in the absence of negotiation of the new key, the portion
of "fast multi-task abort" that allows the TMF response to be
returned in the face of outstanding TTTs be allowed *only* for
transfers from initiators *other* than the one that sent the TMF.
I believe that Bill Studemund raised this concern earlier, but
I admit to missing its significance at the time.

In other words, when the key is not negotiated, a TMF that aborts
tasks from multiple initiators behaves differently at the target
with respect to the initiators involved:
a) There can be no change to currently specified behavior with
   respect to the sender of the TMF.  All TTT transfers have
   to complete, and the TMF response cannot be sent until
   the transfers are complete; otherwise a 3720-compliant
   initiator may see something that it doesn't expect.
b) Transfers from other initiators may be bit-bucketed early at
   the target.  This would be new behavior, and new language
   would be needed to allow the TMF response to be sent once
   these transfers from other initiators are redirected to
   bit buckets.  This does not affect a 3720-compliant
   initiator, as these other initiators do not receive a
   response to this TMF.
The only change is the latter, and it has the effect of removing
a cross-initiator dependence for completion of the TMF.  The use
case is that there is still cluster software out there using TMFs
to maintain cluster membership instead of persistent reservations,
even though the latter is what should be used.
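The proposed split in target behavior, for a TMF that affects tasks from several initiators, can be sketched as a gating check on outstanding transfers. The `Transfer` record and both function names are hypothetical, and the sketch models only the data-transfer condition, not the other TMF processing steps of Section 4.1.3.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical record of one outstanding TTT-solicited Data-Out transfer."""
    initiator: str
    complete: bool = False
    bit_bucketed: bool = False


def bit_bucket_other_initiators(tmf_sender, transfers):
    """Redirect data from non-sending initiators so it is silently discarded."""
    for t in transfers:
        if t.initiator != tmf_sender and not t.complete:
            t.bit_bucketed = True


def may_send_tmf_response(tmf_sender, transfers, fast_abort_negotiated):
    """Do outstanding transfers still block the TMF response?

    Without the new key: the TMF sender's own transfers must finish (current
    RFC 3720 behavior), while other initiators' transfers only need to have
    been redirected to a bit bucket. With the key negotiated, outstanding
    transfers are not treated as a gating condition in this sketch.
    """
    if fast_abort_negotiated:
        return True
    for t in transfers:
        if t.initiator == tmf_sender:
            if not t.complete:
                return False
        elif not (t.complete or t.bit_bucketed):
            return False
    return True
```

A stalled transfer from another initiator can no longer hold up the response, while the TMF sender's own transfers still gate it exactly as before, which is why the change is transparent to the sender.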

> Sorry for the delay in getting back.  Between vacation and
> other travel, it took me a while.  Thanks for the comments.
> You wrote this regarding fast multi-task abort:
> >This property is
> >useful even if the new key is not negotiated (and hence the
> >AsyncEvent 5 message is not used for fast abort of data transfers)
> I assume this is essentially what you are proposing that we
> consider (fast multi-task abort with outstanding TTTs always,
> even without the key negotiation).

Not exactly, see above.

> We decided a new key is needed here for two reasons:
> 1. Whenever a TMF does a fast completion, the target needs an
> (eventual) deterministic confirmation that data transfers had
> stopped.  The confirmation is Nop-Out, and the negotiation
> for the new Nop-Out is via the FastMultiTaskAbort key.
> 2. The initiator requirement in the "classic" case (i.e. key
> not negotiated) is that it respond to each TTT for affected
> tasks even while the task is "affected".  We wanted an
> opposite behavior, but with a confirmation that the data
> transfers had stopped (so target can recover the buffer
> resources).  The key allows this new behavior on initiator's
> part as well.
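The initiator-side difference behind the second reason can be sketched as follows. This is a hypothetical model rather than Implementer's Guide text: the `InitiatorConnection` class, its method names, and the string standing in for the Nop-Out PDU are illustrative assumptions, and the sketch assumes the initiator keeps answering TTTs until AsyncEvent 5 tells it to stop.

```python
class InitiatorConnection:
    """Hypothetical initiator-side view of one connection during a multi-task abort."""

    def __init__(self, fast_multi_task_abort):
        self.fast_multi_task_abort = fast_multi_task_abort  # key negotiated?
        self.affected = set()  # ITTs of tasks affected by the TMF
        self.stopped = set()   # ITTs whose data transfers have been stopped

    def mark_affected(self, itt):
        self.affected.add(itt)

    def on_r2t(self, itt):
        """Return True if Data-Out should be sent in response to this TTT."""
        if itt not in self.affected:
            return True
        # Classic behavior: every TTT of an affected task is still answered.
        # With the key, an affected task that has been stopped gets no more data.
        return not (self.fast_multi_task_abort and itt in self.stopped)

    def on_async_event_5(self):
        """Stop affected transfers and return the confirmation to send.

        Without the key the target never sends AsyncEvent 5, so None is returned.
        """
        if not self.fast_multi_task_abort:
            return None
        self.stopped |= self.affected
        return "Nop-Out"  # placeholder for the confirming Nop-Out PDU
```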

I agree that the new key is clearly required in order to
terminate any TTT data transfer from any initiator early
for the above reasons.

The proposal is that the TMF response be allowed to be sent
in the face of outstanding transfers from initiators *other*
than the one that sent the TMF.  This does not appear to
require a new key be negotiated with the *other* initiators,
as (in the absence of a failure) they will complete those
transfers normally.

> >This is approximately
> >what is described in the Implementation Note at the end of
> >Section 4.1.3, although that note may have been intended to
> >be iSER-specific - if so, this is a proposal to apply it to
> >iSCSI without the RDMA extensions.
> Actually the note is intended for all iSCSI implementations.  
> After seeing your observation, I decided that it needs
> improvement, I propose the following new text:
> "Implementation note: Technically, the TMF servicing is
> complete in Step.e.  Data transfers corresponding to
> terminated tasks may however still be in progress even at the
> end of Step.f.  TMF Response MUST NOT be sent by the target
> iSCSI layer before the end of Step.e, and may be sent at the
> end of Step.e despite these outstanding Data transfers until
> Step.g.

Nit: "may be sent" --> "MAY be sent"

> These data transfers, if any, MUST be silently
> discarded by the target iSCSI layer.  In the case of
> iSCSI/iSER, these transfers would be into tagged buffers with
> STags not owned by any active tasks.

I suggest adding: "; other implementations would deploy
analogous resources to support this discarding".

> Step.g specifies an
> event to free up the resources.  A target may, on an
> implementation-defined internal timeout, also choose to drop
> the connections on which it did not receive the expected
> Nop-Out acknowledgements so as to reclaim the associated
> buffer, STag and TTT resources as appropriate."

Nit: "A target may" --> "A target MAY"
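The Step.g reclamation in the proposed note, including the optional timeout that drops connections without a Nop-Out acknowledgement, can be sketched as a small per-abort tracker. `FenceTracker`, its method names, and the use of plain numbers for time are hypothetical; the timeout value itself is implementation-defined, as the note says.

```python
class FenceTracker:
    """Hypothetical target-side tracking of Step.g for one multi-task abort.

    Buffer, STag and TTT resources tied to terminated tasks stay reserved per
    connection until that connection's Nop-Out acknowledgement arrives. If it
    does not arrive before the timeout, the connection is dropped, which also
    releases those resources.
    """

    def __init__(self, connections, timeout_s, now):
        self.pending = {cid: now for cid in connections}  # cid -> start of wait
        self.timeout_s = timeout_s
        self.reclaimed = []
        self.dropped = []

    def on_nop_out_ack(self, cid):
        # Late acks for already-dropped connections are ignored.
        if self.pending.pop(cid, None) is not None:
            self.reclaimed.append(cid)

    def on_timer(self, now):
        for cid, started in list(self.pending.items()):
            if now - started >= self.timeout_s:
                del self.pending[cid]
                self.dropped.append(cid)    # drop the connection ...
                self.reclaimed.append(cid)  # ... and reclaim its resources

    def all_reclaimed(self):
        return not self.pending
```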

> Now that I read the text after a long time, I spotted an
> unintended double negative in section 4.1.3, target behavior,
> bullet d-ii.  The text should read:
> "ii) each connection except the issuing connection of the
> issuing session that has at least one allegiant affected
> task."    (i.e. drop "non" from "non-issuing")


> The other thing that came to my mind after reading your note
> is that we don't currently have a generic key to capture the
> Response Fence behavior - although response fencing underlies
> both the fast multi-task abort as well as addressing ACA race
> conditions (and perhaps others down the road, e.g., around
> persistent reservations).  So, today, the Note at the end of
> section 3.3.3 advises that implementations may check the
> FastMultiTaskAbort key to verify if safe behavior for MCS ACA
> is supported, although ACA has really nothing to do with
> multi-task aborting.  I am wondering if we should create a
> new key (say ResponseFence), so the semantics would become:
>   ResponseFence  "Yes"  fencing done by target
>                  "No"   legacy, no fencing (so "clarified" TMF
>                         semantics are not possible either)
>
> With ResponseFence = "Yes":
>   FastMultiTaskAbort  "Yes"  fast abort & fencing
>                       "No"   traditional wait on outstanding TTTs
>                              (fencing on ACA is still possible)
>
> With ResponseFence = "No":
>   FastMultiTaskAbort  "Yes"  illegal; ResponseFence must be "Yes"
>                       "No"   no fencing; must wait on outstanding TTTs
> The downside of this scheme is that it may be going in the
> opposite direction from what you wanted (introduces a second key
> that 3720-compliant implementations don't know about).  We
> could alternatively simply mandate the behavior equivalent to
> ResponseFence = "Yes" always and avoid the second key, but
> doing so could make the current 3720-compliant
> implementations technically non-iSCSI-compliant.
> Comments?

Given the inter-dependence of ResponseFence and FastMultiTaskAbort,
a single 3-valued key is probably simpler than two boolean keys.
I think having an explicit means of determining whether ACA behaves
correctly on a multi-connection session is worth adding.
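Why a single 3-valued key suffices can be shown by mapping the two-key matrix onto it: of the four boolean combinations, exactly one is illegal. `FenceMode`, its value strings, and the function names below are hypothetical and only illustrate the idea.

```python
from enum import Enum
from itertools import product


class FenceMode(Enum):
    """Hypothetical single 3-valued key covering both proposed booleans."""
    NONE = "No"               # legacy: no fencing, wait on outstanding TTTs
    FENCE = "Fence"           # response fencing, traditional wait on TTTs
    FAST_ABORT = "FastAbort"  # response fencing plus fast multi-task abort


def from_two_keys(response_fence, fast_multi_task_abort):
    """Map one (ResponseFence, FastMultiTaskAbort) pair onto the 3-valued key."""
    if fast_multi_task_abort and not response_fence:
        raise ValueError("FastMultiTaskAbort=Yes requires ResponseFence=Yes")
    if fast_multi_task_abort:
        return FenceMode.FAST_ABORT
    return FenceMode.FENCE if response_fence else FenceMode.NONE


def legal_combinations():
    """All boolean pairs the two-key scheme allows, with their 3-valued equivalent."""
    result = {}
    for fence, fast in product([False, True], repeat=2):
        try:
            result[(fence, fast)] = from_two_keys(fence, fast)
        except ValueError:
            pass  # the one illegal cell of the matrix
    return result
```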


> Mallikarjun                            
> ----- Original Message ----
> From: "Black_David@xxxxxxx" <Black_David@xxxxxxx>
> To: ips@xxxxxxxx
> Cc: Black_David@xxxxxxx
> Sent: Wednesday, November 22, 2006 2:00:25 PM
> Subject: Implementer's Guide - Task Management Issue
> To make sure we actually have some content to talk about in
> this WG Last Call, I'm going to reraise an issue that came
> up earlier on the mailing list, but (as far as I can recall)
> never got resolved.  This is done with my WG chair hat OFF,
> and it is a proposal for further discussion.
> Section 4.1.3 changes task management, and is a non-transparent
> change - it requires negotiating a new key so that both sides
> agree that they support the change as it uses a round-trip
> exchange of a new message (AsyncEvent 5) between initiator and
> target to abort in-progress data transfers rather than completing
> them.  Absent this message, the target expects the initiator(s)
> to complete all in-progress transfers, and is entitled to be
> unhappy or worse if that doesn't happen.
> For task management functions that affect tasks from more than
> RESET)  Section 4.1.3 also allows the task management function
> (TMF) to complete while the in-progress data transfers are still
> being dealt with, which has the useful effect of avoiding a
> situation in which an uncooperative initiator can stall the
> progress of a TMF sent by another initiator.  This property is
> useful even if the new key is not negotiated (and hence the
> AsyncEvent 5 message is not used for fast abort of data transfers)
> although I think the target behavior is subtly different between
> the initiator that sent the TMF and other initiators in this case:
> - For the TMF sender, the target must wait for all outstanding
>     transfers to complete before completing the TMF, otherwise
>     the TMF completion comes back too early for an unmodified
>     initiator.
> - For the other initiators, the data transfers can be immediately
>     redirected to bit buckets so the TMF can be completed without
>     waits beyond that for the TMF sender.  This is approximately
>     what is described in the Implementation Note at the end of
>     Section 4.1.3, although that note may have been intended to
>     be iSER-specific - if so, this is a proposal to apply it to
>     iSCSI without the RDMA extensions.
> High Availability clustering environments in which TMFs are being
> used to determine cluster membership (yes, there's code out there
> that does this, even though everyone should be using PERSISTENT
> RESERVE) are a specific situation where this helps, as having to
> wait for a dead initiator to expire (the TCP connection(s) have
> to time out and get torn down) slows down cluster recovery from a
> failure.  This change in target behavior (to complete a TMF faster
> if other initiators don't cooperate) should be transparent to
> RFC 3720-compliant initiators, but RFC 3720 has to be modified
> in order to allow it; the Implementer's Guide is a vehicle that
> can make that modification.
> This is proposed for further discussion.
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 176 South St., Hopkinton, MA  01748
> +1 (508) 293-7953             FAX: +1 (508) 293-7786
> black_david@xxxxxxx        Mobile: +1 (978) 394-7754
> ----------------------------------------------------

Ips mailing list
