From b218a20819932c4c594ea2933ff3f4f28b398325 Mon Sep 17 00:00:00 2001
From: Sree Harsha Totakura
Date: Fri, 13 Sep 2013 13:01:09 +0000
Subject: [PATCH] - doc

---
 src/testbed/barriers.README.org | 94 +++++++++++++++++++++------------
 1 file changed, 59 insertions(+), 35 deletions(-)

diff --git a/src/testbed/barriers.README.org b/src/testbed/barriers.README.org
index 4547009e2..ed39903c0 100644
--- a/src/testbed/barriers.README.org
+++ b/src/testbed/barriers.README.org
@@ -1,14 +1,15 @@
 * Description
-The barriers component of testbed facilitates coordination among the peers run
-by the testbed and the experiment driver. The concept is similar to the barrier
+The testbed's barriers API facilitates coordination among the peers run by the
+testbed and the experiment driver. The concept is similar to the barrier
 synchronisation mechanism found in parallel programming or multithreading
-paradigms - a peer waits at a barrier upon reaching it until it is crossed i.e,
-reached by a predefined number of peers. This predefined number peers required
-to cross a barrier is also called quorum.
+paradigms: a peer waits at a barrier upon reaching it until the barrier is
+crossed, i.e., the barrier is reached by a predefined number of peers. This
+predefined number of peers required to cross a barrier is also called quorum. We
+say a peer has reached a barrier if the peer is waiting for the barrier to be
+crossed. Similarly, a barrier is said to be reached if the required quorum of
+peers reach the barrier.
 
-Coordination among the peers and the experiment driver is achieved through the
-barriers service and its respective barriers API. The barriers API provides the
-following functions:
+The barriers API provides the following functions:
 
 1) barrier_init(): function to initialse a barrier in the experiment
 2) barrier_cancel(): function to cancel a barrier which has been initialised
@@ -20,28 +21,30 @@ following functions:
 Among the above functions, the first two, namely barrier_init() and
 barrier_cacel() are used by experiment drivers. All barriers should be
 initialised by the experiment driver by calling barrier_init(). This function
-takes a name to identify the barrier and a notification callback for notifying
-the experiment driver when the barrier is crossed. The function
-barrier_cancel() cancels an initialised barrier and frees the resources
-allocated for it. This function can be called upon a initialised barrier before
-it is crossed.
+takes a name to identify the barrier, the quorum required for the barrier to be
+crossed, and a notification callback for notifying the experiment driver when the
+barrier is crossed. The function barrier_cancel() cancels an initialised
+barrier and frees the resources allocated for it. This function can be called
+upon an initialised barrier before it is crossed.
 
 The remaining two functions barrier_wait() and barrier_wait_cancel() are used in
-the peer's processes. barrier_wait() connects to the local barrier service and
-registers that the caller has reached the barrier and is waiting for the barrier
-to be crossed. Note that this function can only be used by peers which are
-started by testbed as this function tries to access the local barrier service
-which is part of the testbed controller service. Calling barrier_wait() on an
-uninitialised barrier (or not-yet-initialised) barrier results in failure.
-barrier_wait_cancel() cancels the notification registered by barrier_wait().
+the peer's processes. barrier_wait() connects to the local barrier service
+running on the same host the peer is running on and registers that the caller
+has reached the barrier and is waiting for the barrier to be crossed. Note that
+this function can only be used by peers which are started by testbed, as this
+function tries to access the local barrier service which is part of the testbed
+controller service. Calling barrier_wait() on an uninitialised barrier
+results in failure. barrier_wait_cancel() cancels the notification registered
+by barrier_wait().
 
 * Implementation
 
-Since barriers involve coordination between experiment driver and peers the
-barrier service is split into two components. The first component responds to
-the barrier API used by the experiment driver (functions barrier_init() and
-barrier_cancel()) and the second component to the barrier API used by peers
-(functions barrier_wait() and barrier_wait_cancel())
+Since barriers involve coordination between the experiment driver and peers, the
+barrier service in the testbed controller is split into two components. The
+first component responds to the messages generated by the barrier API used by the
+experiment driver (functions barrier_init() and barrier_cancel()) and the second
+component to the messages generated by the barrier API used by peers (functions
+barrier_wait() and barrier_wait_cancel()).
 
 Calling barrier_init() sends a BARRIER_INIT message to the master controller. The
 master controller then registers a barrier and calls barrier_init() for each
@@ -53,13 +56,34 @@ hierarchy back to the experiment driver.
 Similar to barrier_init(), barrier_cancel() propagates BARRIER_CANCEL message
 which causes controllers to remove an initialised barrier.
 
-The second component, according to gnunet architecture, is actually an another
-service but runs in the same binary `gnunet-service-testbed'; the reason is
-that it requires access to barrier data created by the first component. This
-component responds to BARRIER_WAIT messages from local peers when they call
-barrier_wait(). Upon receiving BARRIER_WAIT message, the service checks if the
-requested barrier has been initialised before and it was not initialised the
-an error status is sent through BARRIER_STATUS message to the local peer and the
-connection from the peer is terminated. If the barrier is initialised before,
-the barrier's counter for reached peers is incremented and a notification is
-registered to notify this peer when the barrier is reached.
+The second component is implemented as a separate service in the binary
+`gnunet-service-testbed', which already hosts the testbed controller service.
+Although this deviates from the GNUnet process architecture of having one
+service per binary, it is needed in this case as this component needs access to
+barrier data created by the first component. This component responds to
+BARRIER_WAIT messages from local peers when they call barrier_wait(). Upon
+receiving a BARRIER_WAIT message, the service checks if the requested barrier has
+been initialised before; if it was not initialised, an error status is sent
+through a BARRIER_STATUS message to the local peer and the connection from the
+peer is terminated. If the barrier has been initialised, the barrier's counter
+for reached peers is incremented and a notification is registered to notify the
+peer when the barrier is reached. The connection from the peer is left open.
+
+When enough peers to attain the quorum have sent BARRIER_WAIT messages, the
+controller sends a BARRIER_STATUS message to its parent informing it that the
+barrier is crossed. If the controller has started further subcontrollers, it
+delays this message until it receives a notification from each of those
+subcontrollers that the barrier is crossed. Finally, the barriers API at the
+experiment driver receives the BARRIER_STATUS message when the barrier is reached at all
+the controllers.
+
+The barriers API at the experiment driver responds to the BARRIER_STATUS message
+by echoing it back to the master controller and notifying the experiment
+driver through the notification callback that a barrier has been crossed.
+The echoed BARRIER_STATUS message is propagated by the master controller to the
+controller hierarchy. This propagation triggers the notifications registered by
+peers at each of the controllers in the hierarchy. Note the difference between
+this downward propagation of the BARRIER_STATUS message and its upward
+propagation: the upward propagation is needed for ensuring that the barrier is
+reached by all the controllers and the downward propagation is for notifying
+the peers that the barrier is crossed.
-- 
2.25.1