1 <!--#include file="header.html" -->
3 <h2>Rob's notes on programming busybox.</h2>
6 <li><a href="#goals">What are the goals of busybox?</a></li>
7 <li><a href="#design">What is the design of busybox?</a></li>
8 <li><a href="#source">How is the source code organized?</a></li>
10 <li><a href="#source_applets">The applet directories.</a></li>
11 <li><a href="#source_libbb">The busybox shared library (libbb)</a></li>
13 <li><a href="#adding">Adding an applet to busybox</a></li>
14 <li><a href="#standards">What standards does busybox adhere to?</a></li>
15 <li><a href="#tips">Tips and tricks.</a></li>
17 <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
18 <li><a href="#tips_vfork">Fork and vfork</a></li>
19 <li><a href="#tips_short_read">Short reads and writes</a></li>
21 <li><a href="#who">Who are the BusyBox developers?</a></li>
24 <h2><b><a name="goals" />What are the goals of busybox?</b></h2>
26 <p>Busybox aims to be the smallest and simplest correct implementation of the
27 standard Linux command line tools. First and foremost, this means the
28 smallest executable size we can manage. We also want to have the simplest
29 and cleanest implementation we can manage, be <a href="#standards">standards
30 compliant</a>, minimize run-time memory usage (heap and stack), run fast, and
31 take over the world.</p>
33 <h2><b><a name="design" />What is the design of busybox?</b></h2>
35 <p>Busybox is like a swiss army knife: one thing with many functions.
36 The busybox executable can act like many different programs depending on
37 the name used to invoke it. Normal practice is to create a bunch of symlinks
38 pointing to the busybox binary, each of which triggers a different busybox
39 function. (See <a href="FAQ.html#getting_started">getting started</a> in the
40 FAQ for more information on usage, and <a href="BusyBox.html">the
41 busybox documentation</a> for a list of symlink names and what they do.)
43 <p>The "one binary to rule them all" approach is primarily for size reasons: a
44 single multi-purpose executable is smaller then many small files could be.
45 This way busybox only has one set of ELF headers, it can easily share code
46 between different apps even when statically linked, it has better packing
47 efficiency by avoding gaps between files or compression dictionary resets,
50 <p>Work is underway on new options such as "make standalone" to build separate
51 binaries for each applet, and a "libbb.so" to make the busybox common code
52 available as a shared library. Neither is ready yet at the time of this
57 <h2><a name="source_applets" /><b>The applet directories</b></h2>
59 <p>The directory "applets" contains the busybox startup code (applets.c and
60 busybox.c), and several subdirectories containing the code for the individual
63 <p>Busybox execution starts with the main() function in applets/busybox.c,
64 which sets the global variable bb_applet_name to argv[0] and calls
65 run_applet_by_name() in applets/applets.c. That uses the applets[] array
66 (defined in include/busybox.h and filled out in include/applets.h) to
67 transfer control to the appropriate APPLET_main() function (such as
68 cat_main() or sed_main()). The individual applet takes it from there.</p>
70 <p>This is why calling busybox under a different name triggers different
71 functionality: main() looks up argv[0] in applets[] to get a function pointer
74 <p>Busybox applets may also be invoked through the multiplexor applet
75 "busybox" (see busybox_main() in applets/busybox.c), and through the
76 standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c).
77 See <a href="FAQ.html#getting_started">getting started</a> in the
78 FAQ for more information on these alternate usage mechanisms, which are
79 just different ways to reach the relevant APPLET_main() function.</p>
81 <p>The applet subdirectories (archival, console-tools, coreutils,
82 debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils,
83 modutils, networking, procps, shell, sysklogd, and util-linux) correspond
84 to the configuration sub-menus in menuconfig. Each subdirectory contains the
85 code to implement the applets in that sub-menu, as well as a Config.in
86 file defining that configuration sub-menu (with dependencies and help text
87 for each applet), and the makefile segment (Makefile.in) for that
90 <p>The run-time --help is stored in usage_messages[], which is initialized at
91 the start of applets/applets.c and gets its help text from usage.h. During the
92 build this help text is also used to generate the BusyBox documentation (in
93 html, txt, and man page formats) in the docs directory. See
94 <a href="#adding">adding an applet to busybox</a> for more
97 <h2><a name="source_libbb" /><b>libbb</b></h2>
99 <p>Most non-setup code shared between busybox applets lives in the libbb
100 directory. It's a mess that evolved over the years without much auditing
101 or cleanup. For anybody looking for a great project to break into busybox
102 development with, documenting libbb would be both incredibly useful and good
105 <p>Common themes in libbb include allocation functions that test
106 for failure and abort the program with an error message so the caller doesn't
107 have to test the return value (xmalloc(), xstrdup(), etc), wrapped versions
108 of open(), close(), read(), and write() that test for their own failures
109 and/or retry automatically, linked list management functions (llist.c),
110 command line argument parsing (getopt_ulflags.c), and a whole lot more.</p>
112 <h2><a name="adding" /><b>Adding an applet to busybox</b></h2>
114 <p>To add a new applet to busybox, first pick a name for the applet and
115 a corresponding CONFIG_NAME. Then do this:</p>
118 <li>Figure out where in the busybox source tree your applet best fits,
119 and put your source code there. Be sure to use APPLET_main() instead
120 of main(), where APPLET is the name of your applet.</li>
122 <li>Add your applet to the relevant Config.in file (which file you add
123 it to determines where it shows up in "make menuconfig"). This uses
124 the same general format as the linux kernel's configuration system.</li>
126 <li>Add your applet to the relevant Makefile.in file (in the same
127 directory as the Config.in you chose), using the existing entries as a
128 template and the same CONFIG symbol as you used for Config.in. (Don't
129 forget "needlibm" or "needcrypt" if your applet needs libm or
132 <li>Add your applet to "include/applets.h", using one of the existing
133 entries as a template. (Note: this is in alphabetical order. Applets
134 are found via binary search, and if you add an applet out of order it
137 <li>Add your applet's runtime help text to "include/usage.h". You need
138 at least appname_trivial_usage (the minimal help text, always included
139 in the busybox binary when this applet is enabled) and appname_full_usage
140 (extra help text included in the busybox binary with
141 CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile.
142 The other two help entry types (appname_example_usage and
143 appname_notes_usage) are optional. They don't take up space in the binary,
144 but instead show up in the generated documentation (BusyBox.html,
145 BusyBox.txt, and the man page BusyBox.1).</li>
147 <li>Run menuconfig, switch your applet on, compile, test, and fix the
148 bugs. Be sure to try both "allyesconfig" and "allnoconfig" (and
149 "allbareconfig" if relevant).</li>
153 <h2><a name="standards" />What standards does busybox adhere to?</a></h2>
155 <p>The standard we're paying attention to is the "Shell and Utilities"
156 portion of the <a href=http://www.opengroup.org/onlinepubs/009695399/>Open
157 Group Base Standards</a> (also known as the Single Unix Specification version
158 3 or SUSv3). Note that paying attention isn't necessarily the same thing as
161 <p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor
162 commonly used options like echo's '-e' and '-n', or sed's '-i'. Busybox is
163 driven by what real users actually need, not the fact the standard believes
164 we should implement ed or sccs. For size reasons, we're unlikely to include
165 much internationalization support beyond UTF-8, and on top of all that, our
166 configuration menu lets developers chop out features to produce smaller but
167 very non-standard utilities.</p>
169 <p>Also, Busybox is aimed primarily at Linux. Unix standards are interesting
170 because Linux tries to adhere to them, but portability to dozens of platforms
171 is only interesting in terms of offering a restricted feature set that works
172 everywhere, not growing dozens of platform-specific extensions. Busybox
173 should be portable to all hardware platforms Linux supports, and any other
174 similar operating systems that are easy to do and won't require much
177 <p>In practice, standards compliance tends to be a clean-up step once an
178 applet is otherwise finished. When polishing and testing a busybox applet,
179 we ensure we have at least the option of full standards compliance, or else
180 document where we (intentionally) fall short.</p>
182 <h2><a name="tips" />Programming tips and tricks.</a></h2>
184 <p>Various things busybox uses that aren't particularly well documented
187 <h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
189 <p>Password fields in /etc/passwd and /etc/shadow are in a special format.
190 If the first character isn't '$', then it's an old DES style password. If
191 the first character is '$' then the password is actually three fields
192 separated by '$' characters:</p>
194 <b>$type$salt$encrypted_password</b>
197 <p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
199 <p>The "salt" is a bunch of ramdom characters (generally 8) the encryption
200 algorithm uses to perturb the password in a known and reproducible way (such
201 as by appending the random data to the unencrypted password, or combining
202 them with exclusive or). Salt is randomly generated when setting a password,
203 and then the same salt value is re-used when checking the password. (Salt is
204 thus stored unencrypted.)</p>
206 <p>The advantage of using salt is that the same cleartext password encrypted
207 with a different salt value produces a different encrypted value.
208 If each encrypted password uses a different salt value, an attacker is forced
209 to do the cryptographic math all over again for each password they want to
210 check. Without salt, they could simply produce a big dictionary of commonly
211 used passwords ahead of time, and look up each password in a stolen password
212 file to see if it's a known value. (Even if there are billions of possible
213 passwords in the dictionary, checking each one is just a binary search against
214 a file only a few gigabytes long.) With salt they can't even tell if two
215 different users share the same password without guessing what that password
216 is and decrypting it. They also can't precompute the attack dictionary for
217 a specific password until they know what the salt value is.</p>
219 <p>The third field is the encrypted password (plus the salt). For md5 this
222 <p>The busybox function to handle all this is pw_encrypt(clear, salt) in
223 "libbb/pw_encrypt.c". The first argument is the clear text password to be
224 encrypted, and the second is a string in "$type$salt$password" format, from
225 which the "type" and "salt" fields will be extracted to produce an encrypted
226 value. (Only the first two fields are needed, the third $ is equivalent to
227 the end of the string.) The return value is an encrypted password in
228 /etc/passwd format, with all three $ separated fields. It's stored in
229 a static buffer, 128 bytes long.</p>
231 <p>So when checking an existing password, if pw_encrypt(text,
232 old_encrypted_password) returns a string that compares identical to
233 old_encrypted_password, you've got the right password. When setting a new
234 password, generate a random 8 character salt string, put it in the right
235 format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
236 second argument to pw_encrypt(text,buffer).</p>
238 <h2><a name="tips_vfork">Fork and vfork</a></h2>
240 <p>Busybox hides the difference between fork() and vfork() in
241 libbb/bb_fork_exec.c. If you ever want to fork and exec, use bb_fork_exec()
242 (which returns a pid and takes the same arguments as execve(), although in
243 this case envp can be NULL) and don't worry about it. This description is
244 here in case you want to know why that does what it does.</p>
246 <p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
247 expensive to implement, so a less capable function called vfork() is used
250 <p>The reason vfork() exists is that if you haven't got an MMU then you can't
251 simply set up a second set of page tables and share the physical memory via
252 copy-on-write, which is what fork() normally does. This means that actually
253 forking has to copy all the parent's memory (which could easily be tens of
254 megabytes). And you have to do this even though that memory gets freed again
255 as soon as the exec happens, so it's probably all a big waste of time.</p>
257 <p>This is not only slow and a waste of space, it also causes totally
258 unnecessary memory usage spikes based on how big the _parent_ process is (not
259 the child), and these spikes are quite likely to trigger an out of memory
260 condition on small systems (which is where nommu is common anyway). So
261 although you _can_ emulate a real fork on a nommu system, you really don't
264 <p>In theory, vfork() is just a fork() that writeably shares the heap and stack
265 rather than copying it (so what one process writes the other one sees). In
266 practice, vfork() has to suspend the parent process until the child does exec,
267 at which point the parent wakes up and resumes by returning from the call to
268 vfork(). All modern kernel/libc combinations implement vfork() to put the
269 parent to sleep until the child does its exec. There's just no other way to
270 make it work: they're sharing the same stack, so if either one returns from its
271 function it stomps on the callstack so that when the other process returns,
272 hilarity ensues. In fact without suspending the parent there's no way to even
273 store separate copies of the return value (the pid) from the vfork() call
274 itself: both assignments write into the same memory location.</p>
276 <p>One way to understand (and in fact implement) vfork() is this: imagine
277 the parent does a setjmp and then continues on (pretending to be the child)
278 until the exec() comes around, then the _exec_ does the actual fork, and the
279 parent does a longjmp back to the original vfork call and continues on from
280 there. (It thus becomes obvious why the child can't return, or modify
281 local variables it doesn't want the parent to see changed when it resumes.)
283 <p>Note a common mistake: the need for vfork doesn't mean you can't have two
284 processes running at the same time. It means you can't have two processes
285 sharing the same memory without stomping all over each other. As soon as
286 the child calls exec(), the parent resumes.</p>
288 <p>If the child's attempt to call exec() fails, the child should call _exit()
289 rather than a normal exit(). This avoids any atexit() code that might confuse
290 the parent. (The parent should never call _exit(), only a vforked child that
293 <p>(Now in theory, a nommu system could just copy the _stack_ when it forks
294 (which presumably is much shorter than the heap), and leave the heap shared.
295 In practice, you've just wound up in a multi-threaded situation and you can't
296 do a malloc() or free() on your heap without freeing the other process's memory
297 (and if you don't have the proper locking for being threaded, corrupting the
298 heap if both of you try to do it at the same time and wind up stomping on
299 each other while traversing the free memory lists). The thing about vfork is
300 that it's a big red flag warning "there be dragons here" rather than
301 something subtle and thus even more dangerous.)</p>
303 <h2><a name="tips_sort_read">Short reads and writes</a></h2>
305 <p>Busybox has special functions, bb_full_read() and bb_full_write(), to
306 check that all the data we asked for got read or written. Is this a real
307 world consideration? Try the following:</p>
309 <pre>while true; do echo hello; sleep 1; done | tee out.txt</pre>
311 <p>If tee is implemented with bb_full_read(), tee doesn't display output
312 in real time but blocks until its entire input buffer (generally a couple
313 kilobytes) is read, then displays it all at once. In that case, we _want_
314 the short read, for user interface reasons. (Note that read() should never
315 return 0 unless it has hit the end of input, and an attempt to write 0
316 bytes should be ignored by the OS.)</p>
318 <p>As for short writes, play around with two processes piping data to each
319 other on the command line (cat bigfile | gzip > out.gz) and suspend and
320 resume a few times (ctrl-z to suspend, "fg" to resume). The writer can
321 experience short writes, which are especially dangerous because if you don't
322 notice them you'll discard data. They can also happen when a system is under
323 load and a fast process is piping to a slower one. (Such as an xterm waiting
324 on x11 when the scheduler decides X is being a CPU hog with all that
325 text console scrolling...)</p>
327 <p>So will data always be read from the far end of a pipe at the
328 same chunk sizes it was written in? Nope. Don't rely on that. For one
329 counterexample, see <a href="http://www.faqs.org/rfcs/rfc896.html">rfc 896</p>
330 for Nagle's algorithm</a>, which waits a fraction of a second or so before
331 sending out small amounts of data through a TCP/IP connection in case more
332 data comes in that can be merged into the same packet. (In case you were
333 wondering why action games that use TCP/IP set TCP_NODELAY to lower the latency
334 on their their sockets, now you know.)</p>
336 <h2><a name="who">Who are the BusyBox developers?</a></h2>
338 <p>The following login accounts currently exist on busybox.net. (I.E. these
339 people can commit <a href="http://busybox.net/downloads/patches">patches</a>
340 into subversion.)</p>
345 andersen: Erik Andersen <- Previous maintainer (busy with uClibc and buildroot)
349 erik : Also Erik Andersen?
353 landley : Rob Landley <- Current maintainer
364 sjhill : Steven J. Hill
369 vodz : Vladimir Oleynik
377 <!--#include file="footer.html" -->