From b1b3cee831bc8dfcf439ad69f4694d0a8ca3f7e9 Mon Sep 17 00:00:00 2001
From: Rob Landley <rob@landley.net>
Date: Sun, 29 Jan 2006 06:29:01 +0000
Subject: [PATCH] Add explanations of encrypted passwords, and fork vs vfork.

---
 docs/busybox.net/programming.html | 115 ++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
diff --git a/docs/busybox.net/programming.html b/docs/busybox.net/programming.html
index e44f291b3..f77f3c3a6 100644
--- a/docs/busybox.net/programming.html
+++ b/docs/busybox.net/programming.html
@@ -12,6 +12,11 @@
   </ul>
   <li><a href="#adding">Adding an applet to busybox</a></li>
   <li><a href="#standards">What standards does busybox adhere to?</a></li>
+  <li><a href="#tips">Tips and tricks.</a></li>
+  <ul>
+    <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
+    <li><a href="#tips_vfork">Fork and vfork</a></li>
+  </ul>
 </ul>
 
 <h2><b><a name="goals" />What are the goals of busybox?</b></h2>
@@ -172,6 +177,116 @@ applet is otherwise finished.  When polishing and testing a busybox applet,
 we ensure we have at least the option of full standards compliance, or else
 document where we (intentionally) fall short.</p>
 
+<h2><a name="tips">Programming tips and tricks.</a></h2>
+
+<p>Various things busybox uses that aren't particularly well documented
+elsewhere.</p>
+
+<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
+
+<p>Password fields in /etc/passwd and /etc/shadow are in a special format.
+If the first character isn't '$', then it's an old DES style password.  If
+the first character is '$' then the password is actually three fields
+separated by '$' characters:</p>
+<pre>
+  <b>$type$salt$encrypted_password</b>
+</pre>
+
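+<p>The "type" indicates which encryption algorithm to use: 1 for MD5.  Other
+crypt() implementations define further values, such as 2a for Blowfish
+(bcrypt), 5 for SHA-256, and 6 for SHA-512.</p>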
+
+<p>The "salt" is a bunch of random characters (generally 8) the encryption
+algorithm uses to perturb the password in a known and reproducible way (such
+as by appending the random data to the unencrypted password, or combining
+them with exclusive or).  Salt is randomly generated when setting a password,
+and then the same salt value is re-used when checking the password.  (Salt is
+thus stored unencrypted.)</p>
+
+<p>The advantage of using salt is that the same cleartext password encrypted
+with a different salt value produces a different encrypted value.
+If each encrypted password uses a different salt value, an attacker is forced
+to do the cryptographic math all over again for each password they want to
+check.  Without salt, they could simply produce a big dictionary of commonly
+used passwords ahead of time, and look up each password in a stolen password
+file to see if it's a known value.  (Even if there are billions of possible
+passwords in the dictionary, checking each one is just a binary search against
+a file only a few gigabytes long.)  With salt they can't even tell if two
+different users share the same password without guessing what that password
+is and encrypting it.  They also can't precompute the attack dictionary for
+a specific password until they know what the salt value is.</p>
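+<p>As a rough illustration of generating such a salt (a sketch, not busybox's
+own code): the helper name make_salt() is hypothetical, and it assumes that
+/dev/urandom is available and that salt characters are drawn from the
+64-character set "./0-9A-Za-z" conventionally used by crypt().</p>

```c
#include <stdio.h>
#include <string.h>

/* The 64 characters conventionally allowed in a crypt()-style salt. */
static const char salt_chars[] =
    "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/*
 * Hypothetical helper: fill buf with len random salt characters plus a
 * terminating NUL, so buf must hold at least len + 1 bytes.
 * Assumes /dev/urandom exists.  Returns 0 on success, -1 on failure.
 */
static int make_salt(char *buf, size_t len)
{
    unsigned char raw[64];
    size_t i;
    FILE *f;

    if (len > sizeof(raw))
        return -1;
    f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    if (fread(raw, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* 64 choices map evenly onto the low 6 bits of each random byte. */
    for (i = 0; i < len; i++)
        buf[i] = salt_chars[raw[i] & 63];
    buf[len] = '\0';
    return 0;
}
```

+<p>A caller would typically declare "char salt[9];" and call
+make_salt(salt, 8) to get the usual 8 character salt.</p>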
+
+<p>The third field is the result of encrypting the password together with the
+salt.  For MD5 this is 22 characters.</p>
+
+<p>The busybox function to handle all this is pw_encrypt(clear, salt) in
+"libbb/pw_encrypt.c".  The first argument is the clear text password to be
+encrypted, and the second is a string in "$type$salt$password" format, from
+which the "type" and "salt" fields will be extracted to produce an encrypted
+value.  (Only the first two fields are needed; the third $ is equivalent to
+the end of the string.)  The return value is an encrypted password in
+/etc/passwd format, with all three $-separated fields.  It's stored in
+a static buffer, 128 bytes long.</p>
+
+<p>So when checking an existing password, if pw_encrypt(text,
+old_encrypted_password) returns a string that compares identical to
+old_encrypted_password, you've got the right password.  When setting a new
+password, generate a random 8 character salt string, put it in the right
+format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
+second argument to pw_encrypt(text,buffer).</p>
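+<p>The check and set steps might be sketched as follows.  This is an
+illustration under stated assumptions, not busybox code: make_salt_arg() and
+password_matches() are hypothetical helpers, and toy_encrypt() is a
+deliberately non-cryptographic stand-in with the same (clear, salt) calling
+convention described for pw_encrypt(), included only so the sketch is
+self-contained.</p>

```c
#include <stdio.h>
#include <string.h>

/* Same shape as the pw_encrypt(clear, salt) call described above. */
typedef const char *(*encrypt_fn)(const char *clear, const char *salt);

/*
 * TOY STAND-IN, NOT CRYPTOGRAPHY: copies the "$type$salt" prefix and appends
 * a trivial checksum.  Like the real function, it returns a static buffer.
 */
static const char *toy_encrypt(const char *clear, const char *salt)
{
    static char out[128];
    const char *p;
    size_t prefix_len, i;
    unsigned long h = 5381;

    if (salt[0] != '$' || (p = strchr(salt + 1, '$')) == NULL)
        return NULL;
    /* The salt field ends at the third '$' or at the end of the string. */
    prefix_len = (size_t)(p + 1 - salt) + strcspn(p + 1, "$");
    if (prefix_len > 64)
        return NULL;
    for (i = 0; i < prefix_len; i++)
        h = h * 33 + (unsigned char)salt[i];
    for (i = 0; clear[i] != '\0'; i++)
        h = h * 33 + (unsigned char)clear[i];
    snprintf(out, sizeof(out), "%.*s$%08lx",
             (int)prefix_len, salt, h & 0xffffffffUL);
    return out;
}

/* Build the "$type$salt" string passed as the second argument when setting
 * a new password.  Returns 0 on success, -1 if buf is too small. */
static int make_salt_arg(char *buf, size_t size, char type, const char *salt)
{
    int n = snprintf(buf, size, "$%c$%s", type, salt);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Re-encrypt the candidate using the stored entry as the salt argument and
 * compare the result.  Returns 1 on a match, 0 otherwise. */
static int password_matches(encrypt_fn encrypt, const char *clear,
                            const char *stored)
{
    const char *result = encrypt(clear, stored);
    return result != NULL && strcmp(result, stored) == 0;
}
```

+<p>Because the result lives in a static buffer, copy it somewhere else before
+the next call if the value needs to be kept.</p>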
+
+<h2><a name="tips_vfork">Fork and vfork</a></h2>
+
+<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
+expensive to implement, so a less capable function called vfork() is used
+instead.</p>
+
+<p>The reason vfork() exists is that if you haven't got an MMU then you can't
+simply set up a second set of page tables and share the physical memory via
+copy-on-write, which is what fork() normally does.  This means that actually
+forking has to copy all the parent's memory (which could easily be tens of
+megabytes).  And you have to do this even though that memory gets freed again
+as soon as the exec happens, so it's probably all a big waste of time.</p>
+
+<p>This is not only slow and a waste of space, it also causes totally
+unnecessary memory usage spikes based on how big the _parent_ process is (not
+the child), and these spikes are quite likely to trigger an out of memory
+condition on small systems (which is where nommu is common anyway).  So
+although you _can_ emulate a real fork on a nommu system, you really don't
+want to.</p>
+
+<p>In theory, vfork() is just a fork() that writeably shares the heap and stack
+rather than copying it (so what one process writes the other one sees).  In
+practice, vfork() has to suspend the parent process until the child does exec
+(or calls _exit()), at which point the parent wakes up and resumes by returning
+from the call to vfork().  All modern kernel/libc combinations implement
+vfork() this way.  There's just no other way to
+make it work: they're sharing the same stack, so if either one returns from its
+function it stomps on the callstack so that when the other process returns,
+hilarity ensues.  In fact without suspending the parent there's no way to even
+store separate copies of the return value (the pid) from the vfork() call
+itself: both assignments write into the same memory location.</p>
+
+<p>One way to understand (and in fact implement) vfork() is this: imagine
+the parent does a setjmp and then continues on (pretending to be the child)
+until the exec() comes around, then the _exec_ does the actual fork, and the
+parent does a longjmp back to the original vfork call and continues on from
+there.  (It thus becomes obvious why the child can't return, or modify
+local variables it doesn't want the parent to see changed when it resumes.)</p>
+
+<p>Note a common mistake: the need for vfork doesn't mean you can't have two
+processes running at the same time.  It means you can't have two processes
+sharing the same memory without stomping all over each other.  As soon as
+the child calls exec(), the parent resumes.</p>
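+<p>A minimal sketch of the usual calling pattern, assuming a libc that
+declares vfork() in unistd.h.  The helper name spawn_and_wait() is
+hypothetical, and the exit code 127 for a failed exec simply follows shell
+convention.  The child does nothing except exec or _exit(), because
+everything it touches is still the parent's memory.</p>

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with arguments argv and return its exit
 * status, or -1 if it could not be started or did not exit normally.
 */
static int spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: running on the parent's stack and heap.  Do not return
         * from this function or modify the parent's data; exec or _exit. */
        execvp(argv[0], argv);
        _exit(127);         /* exec failed; _exit() skips atexit handlers */
    }

    /* Parent: resumes here once the child has called exec or _exit.  From
     * this point on both processes run at the same time. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

+<p>The waitpid() call is only there to collect the exit status.  A caller that
+wants the child to keep running alongside it can skip the wait and carry on
+as soon as vfork() returns in the parent.</p>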
+
+<p>(Now in theory, a nommu system could just copy the _stack_ when it forks
+(which presumably is much shorter than the heap), and leave the heap shared.
+In practice, you've just wound up in a multi-threaded situation and you can't
+do a malloc() or free() on your heap without changing the other process's heap
+(and if you don't have the proper locking for being threaded, corrupting the
+heap if both of you try to do it at the same time and wind up stomping on
+each other while traversing the free memory lists).  The thing about vfork is
+that it's a big red flag warning "there be dragons here" rather than
+something subtle and thus even more dangerous.)</p>
+
 <br>
 <br>
 <br>
-- 
2.25.1