We were discussing different setups of (multi-master) replication agreements between a large number of hosts, and ways to minimize contention during updates when the hosts are interconnected. For example, the same change might arrive at a host from two other hosts via different paths at the same time, causing "errors" in the log because of the exponential backoff. With too many connections you get a replication storm; with too few, replication takes too long.
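To put rough numbers on that trade-off, here is a small sketch comparing the two extremes, a full mesh and a ring. It is only an illustration: the function names are made up, and it assumes each agreement is bidirectional and counted once.

```python
def full_mesh(n):
    """Every master has an agreement with every other master."""
    return {"agreements": n * (n - 1) // 2, "max_hops": 1}


def ring(n):
    """Each master replicates only to its two neighbours (n >= 3)."""
    return {"agreements": n, "max_hops": n // 2}


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, "mesh:", full_mesh(n), "ring:", ring(n))
```

With 10 masters the mesh needs 45 agreements and delivers every change in one hop, but each host can receive the same change from many peers at once; the ring needs only 10 agreements, but a change may take up to 5 hops to reach the farthest host.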

To us, the problem sounds very much like a network problem, or perhaps a question of how effectively the underlying database locks the data.

We dreamt up a couple of solutions/ideas, and I am writing this email to elicit some more discussion and/or comments. One solution would be to change the underlying database to one that supports more granular locking (Firebird comes to mind).

Another idea we discussed was based on the following question:
What if you only had to define the list of master servers, and let the master servers themselves figure out the details of multi-mastering, distributing the data, and routing around broken paths? There are similarities with OSPF...
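As a very rough sketch of what we mean: given only the list of masters and the links that are currently up, each master could compute a loop-free distribution tree, so that a change reaches every host over exactly one path, and recompute it when a link breaks. This is not how any existing server works. The names (replication_tree, masters, links) are hypothetical, and a plain breadth-first search stands in for the weighted shortest-path calculation that OSPF actually performs.

```python
from collections import deque


def replication_tree(masters, links, root):
    """Return {host: parent} for a breadth-first tree over the live links.

    masters: list of host names (hypothetical)
    links:   set of (a, b) pairs that are currently reachable
    root:    the master where the change originated
    """
    neighbours = {m: set() for m in masters}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # sorted() makes every master compute the same tree
        for peer in sorted(neighbours[node]):
            if peer not in parent:
                parent[peer] = node
                queue.append(peer)
    return parent


masters = ["m1", "m2", "m3", "m4"]
links = {("m1", "m2"), ("m2", "m3"), ("m3", "m4"), ("m4", "m1")}

print(replication_tree(masters, links, "m1"))
# the m1-m2 link fails: m2 is now reached via m4 and m3 instead
print(replication_tree(masters, links - {("m1", "m2")}, "m1"))
```

Because every master would run the same calculation over the same link list, they would all agree on the paths without anyone configuring the individual agreements by hand. Any host missing from the result is unreachable over the live links.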

Do you have any thoughts on this? Have you had similar ideas? Are we missing the point?


