From b218a20819932c4c594ea2933ff3f4f28b398325 Mon Sep 17 00:00:00 2001
From: Sree Harsha Totakura <totakura@in.tum.de>
Date: Fri, 13 Sep 2013 13:01:09 +0000
Subject: [PATCH] - doc

---
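The per-controller handling of BARRIER_INIT, BARRIER_CANCEL and BARRIER_WAIT described in the Implementation section of barriers.README.org can be sketched as a small in-memory barrier table. This is a minimal sketch under stated assumptions. The names struct barrier, struct controller, handle_barrier_init(), handle_barrier_cancel(), handle_barrier_wait() and enum wait_status are hypothetical. So are the fixed array sizes and the integer client ids. None of them correspond to the actual gnunet-service-testbed code, and message transport and connection handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed limits for this sketch only. */
#define MAX_BARRIERS 8
#define MAX_WAITERS 16

/* Outcome of a BARRIER_WAIT in this model. */
enum wait_status { WAIT_ERROR, WAIT_PENDING, WAIT_REACHED };

struct barrier {
  const char *name;          /* name passed to barrier_init() */
  unsigned int quorum;       /* peers required to reach the barrier */
  unsigned int reached;      /* peers that have sent BARRIER_WAIT so far */
  int waiters[MAX_WAITERS];  /* hypothetical ids of peers kept connected */
  bool in_use;
};

struct controller {
  struct barrier barriers[MAX_BARRIERS];
};

static struct barrier *find_barrier (struct controller *c, const char *name)
{
  for (size_t i = 0; i < MAX_BARRIERS; i++)
    if (c->barriers[i].in_use && 0 == strcmp (c->barriers[i].name, name))
      return &c->barriers[i];
  return NULL;
}

/* BARRIER_INIT: register a barrier; fails on duplicates or a full table. */
static bool handle_barrier_init (struct controller *c, const char *name,
                                 unsigned int quorum)
{
  if (NULL != find_barrier (c, name))
    return false;
  for (size_t i = 0; i < MAX_BARRIERS; i++)
    if (!c->barriers[i].in_use) {
      c->barriers[i] = (struct barrier) { .name = name, .quorum = quorum,
                                          .in_use = true };
      return true;
    }
  return false;
}

/* BARRIER_CANCEL: drop the barrier and its state. */
static void handle_barrier_cancel (struct controller *c, const char *name)
{
  struct barrier *b = find_barrier (c, name);
  if (NULL != b)
    b->in_use = false;
}

/* BARRIER_WAIT from the local peer identified by client_id. */
static enum wait_status handle_barrier_wait (struct controller *c,
                                             const char *name, int client_id)
{
  struct barrier *b = find_barrier (c, name);
  if (NULL == b)
    return WAIT_ERROR;  /* error BARRIER_STATUS, connection is closed */
  assert (b->reached < MAX_WAITERS);
  b->waiters[b->reached++] = client_id;  /* connection is left open */
  return (b->reached >= b->quorum) ? WAIT_REACHED : WAIT_PENDING;
}
```

In this model a WAIT_ERROR result stands for sending an error BARRIER_STATUS to the peer and terminating its connection. Each recorded client id stands for an open connection to be notified later.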
 src/testbed/barriers.README.org | 94 +++++++++++++++++++++------------
 1 file changed, 59 insertions(+), 35 deletions(-)

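The upward and downward propagation of BARRIER_STATUS through the controller hierarchy, as described in barriers.README.org, can be modelled in a few lines. This is a minimal C sketch with several assumptions. Each controller checks a quorum of its own local peers; the README does not say how the quorum given to barrier_init() is split among controllers, so this is a guess. Message passing is replaced by direct function calls. The names struct ctrl, add_child(), peer_wait(), maybe_report_up(), propagate_down() and driver_notified are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* One controller in the hierarchy (hypothetical model). */
struct ctrl {
  struct ctrl *parent;                  /* NULL for the master controller */
  struct ctrl *children[MAX_CHILDREN];  /* subcontrollers it started */
  size_t n_children;
  unsigned int quorum;                  /* assumed local quorum */
  unsigned int local_reached;           /* local peers that sent BARRIER_WAIT */
  size_t children_reached;              /* subcontrollers that reported up */
  bool reported;                        /* upward BARRIER_STATUS sent */
  bool crossed;                         /* downward BARRIER_STATUS received */
};

/* Stands in for the experiment driver's notification callback. */
static bool driver_notified;

static void add_child (struct ctrl *parent, struct ctrl *child)
{
  assert (parent->n_children < MAX_CHILDREN);
  child->parent = parent;
  parent->children[parent->n_children++] = child;
}

/* Downward propagation: mark crossed, where waiting peers would be notified. */
static void propagate_down (struct ctrl *c)
{
  c->crossed = true;
  for (size_t i = 0; i < c->n_children; i++)
    propagate_down (c->children[i]);
}

/* Upward propagation: report only after the local quorum and all children. */
static void maybe_report_up (struct ctrl *c)
{
  if (c->reported || c->local_reached < c->quorum
      || c->children_reached < c->n_children)
    return;
  c->reported = true;
  if (NULL != c->parent) {
    c->parent->children_reached++;
    maybe_report_up (c->parent);
  } else {
    /* Master: the driver gets the status, runs its callback and echoes it
       back; the master then propagates it down the hierarchy. */
    driver_notified = true;
    propagate_down (c);
  }
}

/* A local peer sends BARRIER_WAIT to controller c. */
static void peer_wait (struct ctrl *c)
{
  c->local_reached++;
  maybe_report_up (c);
}
```

A controller reports upward only once its local quorum and every subcontroller's report are in. As a result, the master reports to the driver only when the barrier has been reached everywhere. The echoed status then flows back down and marks every controller as crossed.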
diff --git a/src/testbed/barriers.README.org b/src/testbed/barriers.README.org
index 4547009e2..ed39903c0 100644
--- a/src/testbed/barriers.README.org
+++ b/src/testbed/barriers.README.org
@@ -1,14 +1,15 @@
 * Description
-The barriers component of testbed facilitates coordination among the peers run
-by the testbed and the experiment driver.  The concept is similar to the barrier
+The testbed's barriers API facilitates coordination among the peers run by the
+testbed and the experiment driver.  The concept is similar to the barrier
 synchronisation mechanism found in parallel programming or multithreading
-paradigms - a peer waits at a barrier upon reaching it until it is crossed i.e,
-reached by a predefined number of peers.  This predefined number peers required
-to cross a barrier is also called quorum.
+paradigms: a peer waits at a barrier upon reaching it until the barrier is
+crossed, i.e., until the barrier has been reached by a predefined number of
+peers.  This predefined number of peers required to cross a barrier is called
+the quorum.  We say a peer has reached a barrier if the peer is waiting for
+the barrier to be crossed.  Similarly, a barrier is said to be reached if the
+required quorum of peers has reached it.
 
-Coordination among the peers and the experiment driver is achieved through the
-barriers service and its respective barriers API.  The barriers API provides the
-following functions:
+The barriers API provides the following functions:
 
 1) barrier_init():  function to initialse a barrier in the experiment
 2) barrier_cancel(): function to cancel a barrier which has been initialised
@@ -20,28 +21,30 @@ following functions:
 Among the above functions, the first two, namely barrier_init() and
 barrier_cacel() are used by experiment drivers.  All barriers should be
 initialised by the experiment driver by calling barrier_init().  This function
-takes a name to identify the barrier and a notification callback for notifying
-the experiment driver when the barrier is crossed.  The function
-barrier_cancel() cancels an initialised barrier and frees the resources
-allocated for it.  This function can be called upon a initialised barrier before
-it is crossed.
+takes a name to identify the barrier, the quorum required for the barrier to be
+crossed, and a notification callback for notifying the experiment driver when
+the barrier is crossed.  The function barrier_cancel() cancels an initialised
+barrier and frees the resources allocated for it.  This function may be called
+on an initialised barrier that has not yet been crossed.
 
 The remaining two functions barrier_wait() and barrier_wait_cancel() are used in
-the peer's processes.  barrier_wait() connects to the local barrier service and
-registers that the caller has reached the barrier and is waiting for the barrier
-to be crossed.  Note that this function can only be used by peers which are
-started by testbed as this function tries to access the local barrier service
-which is part of the testbed controller service.  Calling barrier_wait() on an
-uninitialised barrier (or not-yet-initialised) barrier results in failure.
-barrier_wait_cancel() cancels the notification registered by barrier_wait().
+the peer's processes.  barrier_wait() connects to the local barrier service
+running on the same host as the peer and registers that the caller has reached
+the barrier and is waiting for the barrier to be crossed.  Note that this
+function can only be used by peers started by the testbed, as it tries to
+access the local barrier service, which is part of the testbed controller
+service.  Calling barrier_wait() on an uninitialised barrier results in
+failure.  barrier_wait_cancel() cancels the notification registered by
+barrier_wait().
 
 
 * Implementation
-Since barriers involve coordination between experiment driver and peers the
-barrier service is split into two components.  The first component responds to
-the barrier API used by the experiment driver (functions barrier_init() and
-barrier_cancel()) and the second component to the barrier API used by peers
-(functions barrier_wait() and barrier_wait_cancel())
+Since barriers involve coordination between the experiment driver and the
+peers, the barrier service in the testbed controller is split into two
+components.  The first component responds to the messages generated by the
+barrier API used by the experiment driver (functions barrier_init() and
+barrier_cancel()), and the second component responds to the messages generated
+by the barrier API used by peers (barrier_wait() and barrier_wait_cancel()).
 
 Calling barrier_init() sends a BARRIER_INIT message to the master controller.
 The master controller then registers a barrier and calls barrier_init() for each
@@ -53,13 +56,34 @@ hierarchy back to the experiment driver.
 Similar to barrier_init(), barrier_cancel() propagates BARRIER_CANCEL message
 which causes controllers to remove an initialised barrier.
 
-The second component, according to gnunet architecture, is actually an another
-service but runs in the same binary `gnunet-service-testbed'; the reason is
-that it requires access to barrier data created by the first component.  This
-component responds to BARRIER_WAIT messages from local peers when they call
-barrier_wait().  Upon receiving BARRIER_WAIT message, the service checks if the
-requested barrier has been initialised before and it was not initialised the
-an error status is sent through BARRIER_STATUS message to the local peer and the
-connection from the peer is terminated.  If the barrier is initialised before,
-the barrier's counter for reached peers is incremented and a notification is
-registered to notify this peer when the barrier is reached.
+The second component is implemented as a separate service in the binary
+`gnunet-service-testbed', which already hosts the testbed controller service.
+Although this deviates from the GNUnet process architecture of running one
+service per binary, it is necessary here because this component needs access
+to the barrier data created by the first component.  This component responds
+to BARRIER_WAIT messages sent by local peers when they call barrier_wait().
+Upon receiving a BARRIER_WAIT message, the service checks whether the requested
+barrier has been initialised.  If it has not, an error status is sent to the
+local peer in a BARRIER_STATUS message and the connection from the peer is
+terminated.  If the barrier has been initialised, the barrier's counter of
+reached peers is incremented and a notification is registered to notify the
+peer when the barrier is reached; the connection from the peer is left open.
+
+Once the number of peers that have sent BARRIER_WAIT messages attains the
+quorum, the controller sends a BARRIER_STATUS message to its parent informing
+it that the barrier has been reached.  If the controller has started further
+subcontrollers, it delays this message until it has received such a
+notification from each of those subcontrollers.  Finally, the barriers API at
+the experiment driver receives the BARRIER_STATUS message once the barrier has
+been reached at all the controllers.
+
+The barriers API at the experiment driver responds to the BARRIER_STATUS message
+by echoing it back to the master controller and by notifying the experiment
+driver, through the notification callback, that the barrier has been crossed.
+The master controller propagates the echoed BARRIER_STATUS message down the
+controller hierarchy.  This propagation triggers the notifications registered
+by peers at each of the controllers in the hierarchy.  Note how this downward
+propagation of the BARRIER_STATUS message differs from its upward propagation:
+the upward propagation ensures that the barrier has been reached at all the
+controllers, whereas the downward propagation signals to the waiting peers that
+the barrier has been crossed.
-- 
2.25.1