Re: [PATCH] corosync to start in infiniband + redundant ring active/passive mode

On 21.11.2012 13:29, Jan Friesse wrote:
Evgeny Barskiy wrote:
Sorry for the incomplete previous message

On 21.11.2012 12:44, Evgeny Barskiy wrote:
On 20.11.2012 18:22, Jan Friesse wrote:
Evgeny Barskiy wrote:
Corosync now works with infiniband transport in any redundant ring mode

Signed-off-by: Evgeny Barskiy <barskiy@xxxxxx>
---
   exec/totemiba.c |   10 +++++++++-
   1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/exec/totemiba.c b/exec/totemiba.c
index 189eb00..5d47d6b 100644
--- a/exec/totemiba.c
+++ b/exec/totemiba.c
@@ -536,6 +536,7 @@ static int mcast_rdma_event_fn (int events, int suck, void *context)
        */
       case RDMA_CM_EVENT_ADDR_RESOLVED:
           rdma_join_multicast (instance->mcast_cma_id, &instance->mcast_addr, instance);
+        usleep(1000);
what is this usleep good for?
This one helps the rings initialize in the correct order. If we
receive the RDMA_CM_EVENT_MULTICAST_JOIN message for the second ring
before the same message for the first one, we fail on an assert:
corosync: totemsrp.c:3236: memb_ring_id_create_or_load: Assertion
`!totemip_zero_check(&memb_ring_id->rep)' failed.
main_iface_change_fn has to be called for the first ring first, since
memb_ring_id_create_or_load uses a ring_id which is only filled in
when main_iface_change_fn initializes the first ring.
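
To illustrate (a simplified sketch from memory of totemsrp.c, not the
literal code, so field and helper names may be slightly off):

/*
 * Simplified sketch (from memory, not the literal totemsrp.c code) of
 * why the assert fires: memb_ring_id_create_or_load() derives the ring
 * representative from the address of interface 0, which is only filled
 * in once main_iface_change_fn() has run for the first ring.
 */
static void memb_ring_id_create_or_load (
	struct totemsrp_instance *instance,
	struct memb_ring_id *memb_ring_id)
{
	/* rep comes from the first (index 0) interface address */
	totemip_copy (&memb_ring_id->rep, &instance->my_id.addr[0]);

	/*
	 * If the second ring's RDMA_CM_EVENT_MULTICAST_JOIN arrived
	 * first, addr[0] is still all zeroes here and this assert trips.
	 */
	assert (!totemip_zero_check (&memb_ring_id->rep));

	/* ... read or create the ring id file, set memb_ring_id->seq ... */
}
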
Thanks for the explanation, this looks reasonable (even though I don't like the solution).

I'm ACKing this patch (already committed), but do you think it would be
possible to instead wait for the first interface to be ready and only
then init the second one?

Because (for example) what if the first one fails to initialize? Then
the usleep doesn't help.

Exactly. Actually I think we have two separate problems here:

1) Allowing rings to be initialized in arbitrary order regardless of transport type

This probably means we can fix totemsrp.c:main_iface_change_fn by moving most of its functionality inside the last if statement, just before entering the gather state.
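
Roughly something like this (a rough sketch only, not a tested patch; I
am writing main_iface_change_fn from memory, so the exact fields and
the gather-enter reason code may differ):

/*
 * Rough sketch, not a tested patch: defer the ring id creation until
 * every configured interface has reported in, just before entering the
 * gather state, so the rings may initialize in any order.
 */
static void main_iface_change_fn (
	void *context,
	const struct totem_ip_address *iface_addr,
	unsigned int iface_no)
{
	struct totemsrp_instance *instance = context;

	totemip_copy (&instance->my_id.addr[iface_no], iface_addr);
	totemip_copy (&instance->my_memb_list[0].addr[iface_no], iface_addr);

	instance->iface_changes++;

	if (instance->iface_changes >= instance->totem_config->interface_count) {
		/*
		 * addr[0] is guaranteed to be filled in by now, no matter
		 * which ring finished its initialization first.
		 */
		memb_ring_id_create_or_load (instance, &instance->my_ring_id);
		memb_state_gather_enter (instance, 15);
	}
}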

2) Solving various initialization problems in the case of the infiniband transport

Currently, we will fail on any of the following problems (in both rrp and non-rrp mode):

1. the interface isn't up (check it in timer_function_netif_check_timeout)
2. the route wasn't resolved (handle the RDMA_CM_EVENT_ROUTE_ERROR message in mcast_rdma_event_fn)
3. we fail to join the multicast group (handle the RDMA_CM_EVENT_MULTICAST_ERROR message in mcast_rdma_event_fn)

In any of these cases we will never call main_iface_change_fn for this ring (whereas the UDP version just marks the ring failed and calls it anyway), so we will never enter the gather state. A possible shape for handling the error events is sketched below.
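
For the infiniband side, something like the following in
mcast_rdma_event_fn's switch could be a starting point (hedged sketch
only; report_iface_failed() is a hypothetical helper standing in for
whatever callback totemiba keeps to notify totemsrp, I have not checked
the exact field name):

	/*
	 * Hedged sketch, not a tested patch: also handle the error events
	 * so the upper layer is notified and rrp can mark this ring
	 * faulty, instead of silently never reaching main_iface_change_fn.
	 * report_iface_failed() is a hypothetical helper, not an existing
	 * totemiba function.
	 */
	case RDMA_CM_EVENT_ROUTE_ERROR:
	case RDMA_CM_EVENT_MULTICAST_ERROR:
		/* log the failure here, then still report the interface */
		report_iface_failed (instance);
		break;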

---
Also, I think there is probably an even more serious problem (not related to the above): what if our infiniband subnet manager goes down for some reason? Yes, in that case another SM on another blade will wake up, but as I understand from the Mellanox programming manual, we will then receive an IBV_EVENT_SM_CHANGE event and have to re-register the multicast group, etc...
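
At the verbs level that would look roughly like this (a standalone
sketch, not corosync code; in totemiba it would have to be wired into
the existing poll loop on verbs->async_fd instead of blocking in
ibv_get_async_event()):

/*
 * Standalone sketch (not corosync code): watch the device's async event
 * queue and re-join the multicast group when the subnet manager changes.
 */
#include <infiniband/verbs.h>
#include <rdma/rdma_cma.h>

static void handle_async_events (struct rdma_cm_id *mcast_cma_id,
	struct sockaddr *mcast_addr, void *context)
{
	struct ibv_async_event event;

	if (ibv_get_async_event (mcast_cma_id->verbs, &event) != 0) {
		return;
	}

	if (event.event_type == IBV_EVENT_SM_CHANGE ||
	    event.event_type == IBV_EVENT_CLIENT_REREGISTER) {
		/*
		 * A new SM took over: multicast membership registered with
		 * the old SM is gone, so leave and re-join the group.
		 */
		rdma_leave_multicast (mcast_cma_id, mcast_addr);
		rdma_join_multicast (mcast_cma_id, mcast_addr, context);
	}

	ibv_ack_async_event (&event);
}
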


Thanks,
   Honza

           break;
       /*
        * occurs when the CM joins the multicast group
@@ -1029,6 +1030,12 @@ static int send_token_unbind (struct totemiba_instance *instance)
           instance->totemiba_poll_handle,
           instance->send_token_channel->fd);
   +    if(instance->send_token_ah)
+    {
+        ibv_destroy_ah(instance->send_token_ah);
+        instance->send_token_ah = 0;
+    }
+
       rdma_destroy_qp (instance->send_token_cma_id);
       ibv_destroy_cq (instance->send_token_send_cq);
       ibv_destroy_cq (instance->send_token_recv_cq);
@@ -1417,7 +1424,8 @@ int totemiba_token_send (
       sge.lkey = send_buf->mr->lkey;
       sge.addr = (uintptr_t)msg;
   -    res = ibv_post_send (instance->send_token_cma_id->qp, &send_wr, &failed_send_wr);
+    if(instance->send_token_ah != 0 && instance->send_token_bound)
+        res = ibv_post_send (instance->send_token_cma_id->qp, &send_wr, &failed_send_wr);
         return (res);
   }



_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

