From: Christian Grothoff
Date: Sat, 19 Nov 2011 20:26:33 +0000 (+0000)
Subject: moved to drupal
X-Git-Tag: initial-import-from-subversion-38251~15869
X-Git-Url: https://git.librecmc.org/?a=commitdiff_plain;h=461cf105fd0e7bf0eb7b7ae0481b327db44e950c;p=oweals%2Fgnunet.git

moved to drupal
---
diff --git a/RATIONALE b/RATIONALE
deleted file mode 100644
index 1851aeb78..000000000
--- a/RATIONALE
+++ /dev/null
@@ -1,316 +0,0 @@

This document is a summary of the changes made to GNUnet for version
0.9.x (from 0.8.x) and what this major redesign tries to address.

First of all, the redesign does not (intentionally) change anything
fundamental about the application-level protocols or how files are
encoded and shared. However, it is not protocol-compatible, due to
other changes that do not relate to the essence of the application
protocols. This choice was made because productive development and
readable code were considered more important than compatibility at
this point.

The redesign tries to address the following major problem groups,
describing issues that apply more or less to all GNUnet versions
prior to 0.9.x:


PROBLEM GROUP 1 (scalability):
* The code was modular, but bugs were not. Memory corruption
  in one plugin could cause crashes in others, and it was not
  always easy to identify the culprit. This approach
  fundamentally does not scale (in the sense of GNUnet being
  a framework and a GNUnet server running hundreds of
  different application protocols -- with the result still
  being debuggable, secure and stable).
* The code was heavily multi-threaded, resulting in complex
  locking operations. GNUnet 0.8.x had over 70 different
  mutexes and almost 1000 lines of lock/unlock operations.
  It is challenging even for good programmers to write or
  maintain good multi-threaded code at this level of
  complexity. The excessive locking essentially prevents
  GNUnet 0.8 from actually doing much in parallel on
  multicores.
* Despite efforts like Freeway, it was virtually
  impossible to contribute code to GNUnet 0.8 that was not
  written in C/C++.
* Changes to the configuration almost always required restarts
  of gnunetd; the existence of change-notifications does not
  really change that (how many users are even aware of SIGHUP,
  how few options worked with it -- and at what expense in
  code complexity!).
* Valgrinding could only be done for the entire gnunetd
  process. Given that gnunetd does quite a bit of
  CPU-intensive crypto, this could not be done for a system
  under heavy (or even moderate) load.
* Stack overflows with threads, while rare under Linux these
  days, result in really nasty and hard-to-find crashes.
* Structs of function pointers in service APIs needlessly
  added complexity, especially since in most cases there
  was no actual polymorphism.

SOLUTION:
* Use multiple, loosely-coupled processes, each with one big
  select loop (supported by a powerful util library to
  eliminate code duplication for each process).
* Eliminate all threads; manage the processes with a
  master process (gnunet-arm, the Automatic Restart Manager),
  which also ensures that configuration changes trigger the
  necessary restarts.
* Use continuations (with timeouts) as a way to unify
  cron jobs and other event-based code (such as waiting
  on network IO); a sketch of this style appears after
  this solution list.
  => Using multiple processes ensures that memory corruption
     stays localized.
  => Using multiple processes will make it easy to contribute
     services written in other languages.
  => Individual services can now be subjected to valgrind.
  => Process priorities can be used to schedule the CPU better.
  Note that we cannot just use one process with a big select
  loop, because we have blocking operations (and the blocking
  is outside of our control, thanks to MySQL, sqlite,
  gethostbyaddr, etc.). So in order to perform reasonably
  well, we need some construct for parallel execution.

  RULE: If your service contains blocking functions, it
        MUST be a process by itself. If your service
        is sufficiently complex, you MAY choose to make
        it a separate process.
* Eliminate structs with function pointers for service APIs;
  instead, provide a library (still ending in _service.h) API
  that transmits the requests nicely to the respective
  process (easier to use, no need to "request" the service
  in the first place; the API can cause the process to be
  started/stopped via ARM if necessary).
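To make the continuation style concrete, here is a minimal sketch of
such a select-based scheduler. All names here are hypothetical; this
is not the actual GNUnet scheduler API, just an illustration of the
pattern (a real service would also pass its network sockets to
select):

    /* Sketch: single-threaded continuation scheduler with timeouts.
       Hypothetical names, NOT the real GNUnet scheduler API. */
    #include <limits.h>
    #include <stdlib.h>
    #include <sys/select.h>
    #include <sys/time.h>

    typedef void (*Task) (void *cls);

    struct Job
    {
      struct Job *next;
      Task task;          /* continuation to invoke */
      void *cls;          /* closure for the continuation */
      struct timeval due; /* absolute time at which to run it */
    };

    static struct Job *jobs;

    /* Register a continuation to run 'delay_ms' from now. */
    static void
    schedule (Task task, void *cls, long delay_ms)
    {
      struct Job *j = malloc (sizeof (*j));
      gettimeofday (&j->due, NULL);
      j->due.tv_sec += delay_ms / 1000;
      j->due.tv_usec += (delay_ms % 1000) * 1000;
      if (j->due.tv_usec >= 1000000)
      {
        j->due.tv_sec++;
        j->due.tv_usec -= 1000000;
      }
      j->task = task;
      j->cls = cls;
      j->next = jobs;
      jobs = j;
    }

    /* Milliseconds from 'now' until 'due', clamped at zero. */
    static long
    ms_until (const struct timeval *due, const struct timeval *now)
    {
      long ms = (due->tv_sec - now->tv_sec) * 1000
                + (due->tv_usec - now->tv_usec) / 1000;
      return (ms > 0) ? ms : 0;
    }

    /* The "one big select loop": sleep until the earliest deadline
       (network FDs would be watched here too), then run due jobs. */
    static void
    run (void)
    {
      while (NULL != jobs)
      {
        struct timeval now;
        long wait_ms = LONG_MAX;
        gettimeofday (&now, NULL);
        for (struct Job *j = jobs; NULL != j; j = j->next)
        {
          long ms = ms_until (&j->due, &now);
          if (ms < wait_ms)
            wait_ms = ms;
        }
        struct timeval tv = { wait_ms / 1000, (wait_ms % 1000) * 1000 };
        select (0, NULL, NULL, NULL, &tv);
        gettimeofday (&now, NULL);
        for (struct Job **jp = &jobs; NULL != *jp;)
        {
          struct Job *j = *jp;
          if (0 == ms_until (&j->due, &now))
          {
            *jp = j->next;
            j->task (j->cls); /* may call schedule() again */
            free (j);
          }
          else
            jp = &j->next;
        }
      }
    }

A continuation that re-arms itself via schedule() behaves exactly
like an old cron job, but without any thread or mutex.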
PROBLEM GROUP 2 (UTIL-APIs causing bugs):
* The existing logging functions were awkward to use, and
  their expressive power was never really used for much.
* While we had some rules for naming functions, there
  were still plenty of inconsistencies.
* Specification of default values in the configuration could
  result in inconsistencies between the defaults in
  config.scm and the defaults used by the program; also,
  different defaults might have been specified for the
  same option in different parts of the program.
* The TIME API did not distinguish between absolute
  and relative time, requiring users to know which
  type of value some variable contained and to
  manually convert properly. Combined with the
  possibility of integer overflows, this is a major
  source of bugs.
* The TIME API for seconds has a theoretical problem
  with a 32-bit overflow on some platforms, which is
  only partially fixed by the old code with some
  hackery.

SOLUTION:
* Logging was radically simplified.
* Functions are now more consistently named.
* Configuration has no more defaults; instead,
  we load a global default configuration file
  before the user-specific configuration (which
  can be used to override defaults); the global
  default configuration file will be generated
  from config.scm.
* Time now distinguishes between
  struct GNUNET_TIME_Absolute and
  struct GNUNET_TIME_Relative. We use structs
  so that the compiler won't coerce for us
  (forcing the use of specific conversion
  functions which have checks for overflows, etc.);
  a sketch of this idea appears after this list.
  Naturally, the need to use these functions makes
  the code a bit more verbose, but that's a good
  thing given the potential for bugs.
* There is no more TIME API function to do anything
  with 32-bit seconds.
* There is now a bandwidth API to handle
  non-trivial bandwidth utilization calculations.
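To illustrate why distinct structs help, here is a hedged sketch; it
is modeled on the naming convention above, but the exact field names
and conversion functions of the real GNUNET_TIME API may differ:

    /* Sketch of the distinct-struct idea; field and function names
       are illustrative, not necessarily the real GNUNET_TIME API. */
    #include <stdint.h>

    /* Two distinct types: the compiler now rejects adding two
       absolute times, or passing a relative time where an
       absolute one is expected. */
    struct GNUNET_TIME_Absolute { uint64_t abs_value; };
    struct GNUNET_TIME_Relative { uint64_t rel_value; };

    /* Conversion function with an explicit overflow check,
       saturating at "the end of time" instead of wrapping. */
    static struct GNUNET_TIME_Absolute
    absolute_add (struct GNUNET_TIME_Absolute start,
                  struct GNUNET_TIME_Relative duration)
    {
      struct GNUNET_TIME_Absolute ret;
      if (start.abs_value > UINT64_MAX - duration.rel_value)
        ret.abs_value = UINT64_MAX; /* saturate instead of overflow */
      else
        ret.abs_value = start.abs_value + duration.rel_value;
      return ret;
    }

    /* Usage: 'deadline = absolute_add (now, timeout);' compiles,
       while 'now + timeout' or mixing up the two types does not. */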
PROBLEM GROUP 3 (statistics):
* Databases and other components needed to store capacity
  values similar to what stats was already doing, but
  across process lifetimes (the "state" API was a partial
  solution for that, but using it was clunky).
* Only gnunetd could use statistics, but other
  processes in the GNUnet system might have had
  good uses for it as well.

SOLUTION:
* New statistics library and service that offer
  an API to inspect and modify statistics.
* Statistics are distinguished by service name
  in addition to the name of the value.
* Statistics can be marked as persistent, in
  which case they are written to disk when
  the statistics service shuts down.
  => One solution for existing stats uses,
     application stats, database stats and
     versioning information!


PROBLEM GROUP 4 (Testing):
* The existing structure of the code, with modules
  stored in places far away from the test code,
  resulted in tools like lcov not giving good results.
* The codebase had evolved into a complex, deeply
  nested hierarchy, often with directories that
  contained only a single file. Some of these
  files had the same name, making it hard to find
  the source corresponding to a crash based on
  the reported filename/line information.
* Non-trivial portions of the code lacked good testcases,
  and it was not always obvious which parts of the code
  were not well-tested.

SOLUTION:
* Code that should be tested together is now
  in the same directory.
* The hierarchy is now essentially flat, with each
  major service having one directory under src/;
  naming conventions help to make sure that
  files have globally-unique names.
* All code added to the new repository must
  come with testcases with reasonable coverage.


PROBLEM GROUP 5 (core/transports):
* The new DV service requires session-key exchange
  between DV neighbours, but the existing
  session-key code cannot be used to achieve this.
* The core requires certain services
  (such as identity, pingpong, fragmentation,
  transport, traffic, session), which makes it
  meaningless to have these as modules
  (especially since there is really only one
  way to implement them).
* HELLOs are larger than necessary, since we need
  one for each transport (and hence often have
  to pick a subset of our HELLOs to transmit).
* Fragmentation is done at the core level, but is only
  required for a few transports; future versions of
  these transports might want to be aware of fragments
  and do things like retransmission.
* Autoconfiguration is hard, since we have no good
  way to detect (and then securely use) our external
  IP address.
* It is currently not possible for multiple transports
  between the same pair of peers to be used concurrently
  in the same direction(s).
* We're using lots of cron-based jobs to periodically
  try (and fail) to build and transmit.

SOLUTION:
* Rewrite the core to integrate most of these services
  into one "core" service.
* Redesign HELLO to contain the addresses for
  all enabled transports in one message (avoiding
  having to transmit the public key and signature
  many, many times).
* With discovery being part of the transport service,
  it is now also possible to "learn" our external
  IP address from other peers (we just add plausible
  addresses to the list; other peers will discard
  those addresses that don't work for them!).
* The new DV will consist of a "transport" and a
  high-level service (to handle encrypted DV
  control- and data-messages).
* Move expiration from one field per HELLO to one
  per address.
* Require the signature in PONG, not in HELLO (and
  confirm one address at a time).
* Move fragmentation into a helper library linked
  against by UDP (and others that might need it).
* Link-to-link advertising of our HELLO is the
  transport's responsibility; global advertising/
  bootstrap remains the responsibility of higher layers.
* Change APIs to be event-based (transports pull for
  transmission data instead of the core pushing and
  failing); a sketch of the pull style follows below.
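To make the last point concrete, here is a rough sketch of the pull
style; the names are hypothetical and the real transport plugin API
differs in its details:

    /* Sketch of "pull"-style transmission; hypothetical names,
       not the actual GNUnet transport API. */
    #include <stddef.h>

    /* Callback registered by the core; the transport invokes it
       only when it can actually send, and the callback fills the
       buffer with at most 'available' bytes. */
    typedef size_t (*TransmitReadyCallback) (void *cls,
                                             size_t available,
                                             void *buf);

    struct TransmitHandle
    {
      TransmitReadyCallback cb;
      void *cb_cls;
    };

    /* Inside a transport: the socket became writable, so we pull
       data from the core instead of having data pushed at us. */
    static void
    on_socket_writable (struct TransmitHandle *th,
                        void *wbuf, size_t wbuf_size)
    {
      size_t used = th->cb (th->cb_cls, wbuf_size, wbuf);
      if (used > 0)
      {
        /* ... write 'used' bytes from 'wbuf' to the socket ... */
      }
      /* 'used == 0' means "nothing to send right now"; no message
         was built just to fail, unlike the old push model. */
    }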
PROBLEM GROUP 6 (FS-APIs):
* As with gnunetd, the FS-APIs are heavily threaded,
  resulting in hard-to-understand code (slightly
  better than gnunetd, but not much).
* GTK in particular does not like this, resulting
  in complicated code to switch to the GTK event
  thread when needed (which may still be causing
  problems on Gnome, not sure).
* If GUIs die (or are not properly shut down), the state
  of current transactions is lost (FSUI only
  saves to disk on shutdown).
* FILENAME metadata is killed by ECRS/FSUI to avoid
  exposing HOME, but what if the user set it manually?
* The DHT was a generic data structure with no
  support for ECRS-style block validation.

SOLUTION:
* Eliminate threads from the FS-APIs.
* Incrementally store FS state, always also on disk, using
  many small files instead of one big file.
* Have an API to manipulate the sharing tree before
  upload; have auto-construction modify FILENAME
  but allow user modifications afterwards.
* The DHT API was extended with a BLOCK API for content
  validation by block type; validators for FS and
  DHT block types were written; the BLOCK API is also
  used by the gap routing code.


PROBLEM GROUP 7 (User experience):
* Searches often do not return a sufficient / significant
  number of results.
* Sharing a directory with thousands of similar files
  (image/jpeg) creates thousands of search results for the
  mime-type keyword (a problem for DB performance, network
  transmission, caching, end-user display, etc.).
* Users who wanted to share important content had no way to
  tell the system to replicate it more; replication was also
  inefficient (this desired feature was sometimes called
  "power" publishing or content pushing).

SOLUTION:
* Have an option to canonicalize keywords (see the suggestion
  on the mailing list at the end of June 2009: keep consonants
  and sort those alphabetically); not fully implemented yet
  (a sketch appears at the end of this document).
* When sharing directories, extract keywords first and then
  push keywords that are common to all files up to the
  directory level; when processing an AND-ed query and a
  directory is found to match the result, inspect the metadata
  of the files in the directory to possibly produce further
  results (requires downloading the directory in the
  background); needs more testing.
* A desired replication level can now be specified and is
  tracked in the datastore; migration prefers content with a
  high replication level (which decreases as replicas are
  created).
  => The datastore format changed; we also took out a size
     field that was redundant, so the overall overhead remains
     the same.
* Peers with a full disk (or disabled migration) can now notify
  other peers that they are not interested in migration right
  now; as a result, less bandwidth is wasted pushing content
  to these peers (and replication counters are not generally
  decreased based on copies that are just discarded; naturally,
  there is still no guarantee that the replicas will stay
  available).


SUMMARY:
* Features eliminated from util:
  - threading (goal: good riddance!)
  - complex logging features [ectx-passing, target-kinds]
    (goal: good riddance!)
  - complex configuration features [defaults, notifications]
    (goal: good riddance!)
  - network traffic monitors (goal: eliminate)
  - IPC semaphores (goal: d-bus? / eliminate?)
  - second timers
* New features in util:
  - scheduler
  - service and program bootstrap code
  - bandwidth and time APIs
  - buffered IO API
  - HKDF implementation (crypto)
  - load calculation API
  - bandwidth calculation API
* Major changes in util:
  - more expressive server (replaces selector)
  - DNS lookup replaced by async service
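Appendix: a sketch of the keyword canonicalization idea from PROBLEM
GROUP 7 (keep the consonants and sort them alphabetically). This is a
hypothetical helper written for illustration, not code that exists in
GNUnet:

    /* Canonicalize a keyword: keep consonants, sort alphabetically.
       E.g. "freedom" -> "frdm" -> "dfmr". Hypothetical helper. */
    #include <ctype.h>
    #include <stdlib.h>
    #include <string.h>

    static int
    cmp_char (const void *a, const void *b)
    {
      return *(const char *) a - *(const char *) b;
    }

    /* Returns a newly allocated canonical form; caller must free. */
    static char *
    canonicalize_keyword (const char *keyword)
    {
      char *out = malloc (strlen (keyword) + 1);
      size_t n = 0;

      if (NULL == out)
        return NULL;
      for (const char *p = keyword; '\0' != *p; p++)
      {
        char c = (char) tolower ((unsigned char) *p);
        if (isalpha ((unsigned char) c) && (NULL == strchr ("aeiou", c)))
          out[n++] = c; /* keep consonants only */
      }
      qsort (out, n, 1, &cmp_char);
      out[n] = '\0';
      return out;
    }

With this, "color" and "colour" both canonicalize to "clr", which is
the kind of fuzzy keyword matching the proposal appears to be after.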