docs/busybox.net/programming.html

   1 <!--#include file="header.html" -->
   2
   3 <h2>Rob's notes on programming busybox.</h2>
   4
   5 <ul>
   6   <li><a href="#goals">What are the goals of busybox?</a></li>
   7   <li><a href="#design">What is the design of busybox?</a></li>
   8   <li><a href="#source">How is the source code organized?</a></li>
   9   <ul>
  10     <li><a href="#source_applets">The applet directories.</a></li>
  11     <li><a href="#source_libbb">The busybox shared library (libbb)</a></li>
  12   </ul>
  13   <li><a href="#adding">Adding an applet to busybox</a></li>
  14   <li><a href="#standards">What standards does busybox adhere to?</a></li>
  15   <li><a href="#tips">Tips and tricks.</a></li>
  16   <ul>
  17     <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li>
  18     <li><a href="#tips_vfork">Fork and vfork</a></li>
  19     <li><a href="#tips_short_read">Short reads and writes</a></li>
  20     <li><a href="#tips_memory">Memory used by relocatable code, PIC, and static linking.</a></li>
  21   </ul>
  22   <li><a href="#who">Who are the BusyBox developers?</a></li>
  23 </ul>
  24
  25 <h2><b><a name="goals">What are the goals of busybox?</a></b></h2>
  26
  27 <p>Busybox aims to be the smallest and simplest correct implementation of the
  28 standard Linux command line tools.  First and foremost, this means the
  29 smallest executable size we can manage.  We also want to have the simplest
  30 and cleanest implementation we can manage, be <a href="#standards">standards
  31 compliant</a>, minimize run-time memory usage (heap and stack), run fast, and
  32 take over the world.</p>
  33
  34 <h2><b><a name="design">What is the design of busybox?</a></b></h2>
  35
  36 <p>Busybox is like a swiss army knife: one thing with many functions.
  37 The busybox executable can act like many different programs depending on
  38 the name used to invoke it.  Normal practice is to create a bunch of symlinks
  39 pointing to the busybox binary, each of which triggers a different busybox
  40 function.  (See <a href="FAQ.html#getting_started">getting started</a> in the
  41 FAQ for more information on usage, and <a href="BusyBox.html">the
  42 busybox documentation</a> for a list of symlink names and what they do.)</p>
  43
  44 <p>The "one binary to rule them all" approach is primarily for size reasons: a
  45 single multi-purpose executable is smaller than many small files could be.
  46 This way busybox only has one set of ELF headers, it can easily share code
  47 between different apps even when statically linked, it has better packing
  48 efficiency by avoiding gaps between files or compression dictionary resets,
  49 and so on.</p>
  50
  51 <p>Work is underway on new options such as "make standalone" to build separate
  52 binaries for each applet, and a "libbb.so" to make the busybox common code
  53 available as a shared library.  Neither is ready yet at the time of this
  54 writing.</p>
  55
  56 <a name="source"></a>
  57
  58 <h2><a name="source_applets"><b>The applet directories</b></a></h2>
  59
  60 <p>The directory "applets" contains the busybox startup code (applets.c and
  61 busybox.c), and several subdirectories containing the code for the individual
  62 applets.</p>
  63
  64 <p>Busybox execution starts with the main() function in applets/busybox.c,
  65 which sets the global variable bb_applet_name to argv[0] and calls
  66 run_applet_by_name() in applets/applets.c.  That uses the applets[] array
  67 (defined in include/busybox.h and filled out in include/applets.h) to
  68 transfer control to the appropriate APPLET_main() function (such as
  69 cat_main() or sed_main()).  The individual applet takes it from there.</p>
  70
  71 <p>This is why calling busybox under a different name triggers different
  72 functionality: main() looks up argv[0] in applets[] to get a function pointer
  73 to APPLET_main().</p>
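
<p>Here's a minimal sketch of that lookup, not the actual busybox code.  The
struct layout, the demo_* names, and the stripping of the leading directory
from argv[0] are all assumptions made for illustration; the real table lives
in include/applets.h.  It does show why the table has to stay sorted: the
lookup is a binary search.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for real applets; each returns a distinct code. */
static int demo_cat_main(int argc, char **argv) { (void)argc; (void)argv; return 10; }
static int demo_ls_main(int argc, char **argv)  { (void)argc; (void)argv; return 20; }
static int demo_sed_main(int argc, char **argv) { (void)argc; (void)argv; return 30; }

/* Hypothetical table entry: applet name plus its APPLET_main() function. */
struct demo_applet {
    const char *name;
    int (*main_fn)(int argc, char **argv);
};

/* Must stay sorted by name, because the lookup is a binary search. */
static const struct demo_applet demo_applets[] = {
    { "cat", demo_cat_main },
    { "ls",  demo_ls_main  },
    { "sed", demo_sed_main },
};

/* bsearch() comparator: the key is a name, the entry is a table element. */
static int demo_compare(const void *key, const void *entry)
{
    return strcmp((const char *)key, ((const struct demo_applet *)entry)->name);
}

/* Strip any leading directory, so "/bin/cat" looks up "cat". */
static const char *demo_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Returns the applet's return value, or -1 if no applet has that name. */
static int demo_run_applet_by_name(int argc, char **argv)
{
    const struct demo_applet *applet;

    assert(argv != NULL && argv[0] != NULL);
    applet = bsearch(demo_basename(argv[0]), demo_applets,
                     sizeof(demo_applets) / sizeof(demo_applets[0]),
                     sizeof(demo_applets[0]), demo_compare);
    return applet ? applet->main_fn(argc, argv) : -1;
}
```

<p>Calling demo_run_applet_by_name() with argv[0] set to "/bin/cat" ends up in
demo_cat_main(), and an unknown name falls through to the -1 case.</p>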
  74
  75 <p>Busybox applets may also be invoked through the multiplexor applet
  76 "busybox" (see busybox_main() in applets/busybox.c), and through the
  77 standalone shell (grep for STANDALONE_SHELL in applets/shell/*.c).
  78 See <a href="FAQ.html#getting_started">getting started</a> in the
  79 FAQ for more information on these alternate usage mechanisms, which are
  80 just different ways to reach the relevant APPLET_main() function.</p>
  81
  82 <p>The applet subdirectories (archival, console-tools, coreutils,
  83 debianutils, e2fsprogs, editors, findutils, init, loginutils, miscutils,
  84 modutils, networking, procps, shell, sysklogd, and util-linux) correspond
  85 to the configuration sub-menus in menuconfig.  Each subdirectory contains the
  86 code to implement the applets in that sub-menu, as well as a Config.in
  87 file defining that configuration sub-menu (with dependencies and help text
  88 for each applet), and the makefile segment (Makefile.in) for that
  89 subdirectory.</p>
  90
  91 <p>The run-time --help is stored in usage_messages[], which is initialized at
  92 the start of applets/applets.c and gets its help text from usage.h.  During the
  93 build this help text is also used to generate the BusyBox documentation (in
  94 html, txt, and man page formats) in the docs directory.  See
  95 <a href="#adding">adding an applet to busybox</a> for more
  96 information.</p>
  97
  98 <h2><a name="source_libbb"><b>libbb</b></a></h2>
  99
 100 <p>Most non-setup code shared between busybox applets lives in the libbb
 101 directory.  It's a mess that evolved over the years without much auditing
 102 or cleanup.  For anybody looking for a great project to break into busybox
 103 development with, documenting libbb would be both incredibly useful and good
 104 experience.</p>
 105
 106 <p>Common themes in libbb include allocation functions that test
 107 for failure and abort the program with an error message so the caller doesn't
 108 have to test the return value (xmalloc(), xstrdup(), etc), wrapped versions
 109 of open(), close(), read(), and write() that test for their own failures
 110 and/or retry automatically, linked list management functions (llist.c),
 111 command line argument parsing (getopt_ulflags.c), and a whole lot more.</p>
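
<p>To give a feel for the "test for failure and abort" theme, here's a rough
sketch of what such allocation wrappers can look like.  The demo_* names and
the error message are made up for this example; the real xmalloc() and
xstrdup() in libbb use busybox's own error reporting and may differ in
detail.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate memory or die trying, so callers never see NULL for size > 0. */
static void *demo_xmalloc(size_t size)
{
    void *ptr = malloc(size);

    /* malloc(0) may legitimately return NULL, so only treat NULL as a
     * failure when a nonzero size was requested. */
    if (ptr == NULL && size != 0) {
        fputs("memory exhausted\n", stderr);
        exit(EXIT_FAILURE);
    }
    return ptr;
}

/* Duplicate a string using the aborting allocator above. */
static char *demo_xstrdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;            /* include the terminating NUL */
    copy = demo_xmalloc(len);
    memcpy(copy, s, len);
    return copy;
}
```

<p>The payoff is at the call site: something like
<b>char *copy = demo_xstrdup(name);</b> needs no error check after it.</p>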
 112
 113 <h2><a name="adding"><b>Adding an applet to busybox</b></a></h2>
 114
 115 <p>To add a new applet to busybox, first pick a name for the applet and
 116 a corresponding CONFIG_NAME.  Then do this:</p>
 117
 118 <ul>
 119 <li>Figure out where in the busybox source tree your applet best fits,
 120 and put your source code there.  Be sure to use APPLET_main() instead
 121 of main(), where APPLET is the name of your applet.</li>
 122
 123 <li>Add your applet to the relevant Config.in file (which file you add
 124 it to determines where it shows up in "make menuconfig").  This uses
 125 the same general format as the linux kernel's configuration system.</li>
 126
 127 <li>Add your applet to the relevant Makefile.in file (in the same
 128 directory as the Config.in you chose), using the existing entries as a
 129 template and the same CONFIG symbol as you used for Config.in.  (Don't
 130 forget "needlibm" or "needcrypt" if your applet needs libm or
 131 libcrypt.)</li>
 132
 133 <li>Add your applet to "include/applets.h", using one of the existing
 134 entries as a template.  (Note: this is in alphabetical order.  Applets
 135 are found via binary search, and if you add an applet out of order it
 136 won't work.)</li>
 137
 138 <li>Add your applet's runtime help text to "include/usage.h".  You need
 139 at least appname_trivial_usage (the minimal help text, always included
 140 in the busybox binary when this applet is enabled) and appname_full_usage
 141 (extra help text included in the busybox binary when
 142 CONFIG_FEATURE_VERBOSE_USAGE is enabled), or it won't compile.
 143 The other two help entry types (appname_example_usage and
 144 appname_notes_usage) are optional.  They don't take up space in the binary,
 145 but instead show up in the generated documentation (BusyBox.html,
 146 BusyBox.txt, and the man page BusyBox.1).</li>
 147
 148 <li>Run menuconfig, switch your applet on, compile, test, and fix the
 149 bugs.  Be sure to try both "allyesconfig" and "allnoconfig" (and
 150 "allbareconfig" if relevant).</li>
 151
 152 </ul>
 153
 154 <h2><a name="standards">What standards does busybox adhere to?</a></h2>
 155
 156 <p>The standard we're paying attention to is the "Shell and Utilities"
 157 portion of the <a href="http://www.opengroup.org/onlinepubs/009695399/">Open
 158 Group Base Standards</a> (also known as the Single Unix Specification version
 159 3 or SUSv3).  Note that paying attention isn't necessarily the same thing as
 160 following it.</p>
 161
 162 <p>SUSv3 doesn't even mention things like init, mount, tar, or losetup, nor
 163 commonly used options like echo's '-e' and '-n', or sed's '-i'.  Busybox is
 164 driven by what real users actually need, not by the fact that the standard believes
 165 we should implement ed or sccs.  For size reasons, we're unlikely to include
 166 much internationalization support beyond UTF-8, and on top of all that, our
 167 configuration menu lets developers chop out features to produce smaller but
 168 very non-standard utilities.</p>
 169
 170 <p>Also, Busybox is aimed primarily at Linux.  Unix standards are interesting
 171 because Linux tries to adhere to them, but portability to dozens of platforms
 172 is only interesting in terms of offering a restricted feature set that works
 173 everywhere, not growing dozens of platform-specific extensions.  Busybox
 174 should be portable to all hardware platforms Linux supports, and any other
 175 similar operating systems that are easy to support and won't require much
 176 maintenance.</p>
 177
 178 <p>In practice, standards compliance tends to be a clean-up step once an
 179 applet is otherwise finished.  When polishing and testing a busybox applet,
 180 we ensure we have at least the option of full standards compliance, or else
 181 document where we (intentionally) fall short.</p>
 182
 183 <h2><a name="tips">Programming tips and tricks.</a></h2>
 184
 185 <p>Various things busybox uses that aren't particularly well documented
 186 elsewhere.</p>
 187
 188 <h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2>
 189
 190 <p>Password fields in /etc/passwd and /etc/shadow are in a special format.
 191 If the first character isn't '$', then it's an old DES style password.  If
 192 the first character is '$' then the password is actually three fields
 193 separated by '$' characters:</p>
 194 <pre>
 195   <b>$type$salt$encrypted_password</b>
 196 </pre>
 197
 198 <p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p>
 199
 200 <p>The "salt" is a bunch of random characters (generally 8) the encryption
 201 algorithm uses to perturb the password in a known and reproducible way (such
 202 as by appending the random data to the unencrypted password, or combining
 203 them with exclusive or).  Salt is randomly generated when setting a password,
 204 and then the same salt value is re-used when checking the password.  (Salt is
 205 thus stored unencrypted.)</p>
 206
 207 <p>The advantage of using salt is that the same cleartext password encrypted
 208 with a different salt value produces a different encrypted value.
 209 If each encrypted password uses a different salt value, an attacker is forced
 210 to do the cryptographic math all over again for each password they want to
 211 check.  Without salt, they could simply produce a big dictionary of commonly
 212 used passwords ahead of time, and look up each password in a stolen password
 213 file to see if it's a known value.  (Even if there are billions of possible
 214 passwords in the dictionary, checking each one is just a binary search against
 215 a file only a few gigabytes long.)  With salt they can't even tell if two
 216 different users share the same password without guessing what that password
 217 is and decrypting it.  They also can't precompute the attack dictionary for
 218 a specific password until they know what the salt value is.</p>
 219
 220 <p>The third field is the encrypted password (plus the salt).  For md5 this
 221 is 22 bytes.</p>
 222
 223 <p>The busybox function to handle all this is pw_encrypt(clear, salt) in
 224 "libbb/pw_encrypt.c".  The first argument is the clear text password to be
 225 encrypted, and the second is a string in "$type$salt$password" format, from
 226 which the "type" and "salt" fields will be extracted to produce an encrypted
 227 value.  (Only the first two fields are needed; the third $ is equivalent to
 228 the end of the string.)  The return value is an encrypted password in
 229 /etc/passwd format, with all three $ separated fields.  It's stored in
 230 a static buffer, 128 bytes long.</p>
 231
 232 <p>So when checking an existing password, if pw_encrypt(text,
 233 old_encrypted_password) returns a string that compares identical to
 234 old_encrypted_password, you've got the right password.  When setting a new
 235 password, generate a random 8 character salt string, put it in the right
 236 format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the
 237 second argument to pw_encrypt(text,buffer).</p>
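
<p>The string handling around pw_encrypt() can be sketched like this.  Both
demo_* helpers are hypothetical (they are not part of libbb), and the sketch
uses snprintf() rather than sprintf() so an oversized salt can't overflow the
buffer.  It only builds and measures the "$type$salt" prefix; it doesn't do
any encryption itself.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "$type$salt" in buffer.  Returns 0 on success, -1 if it won't fit. */
static int demo_format_salt(char *buffer, size_t size, char type, const char *salt)
{
    int n;

    assert(buffer != NULL && salt != NULL);
    n = snprintf(buffer, size, "$%c$%s", type, salt);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Length of the "$type$salt" part of a password field: everything up to
 * (but not including) the third '$', or the whole string if there is no
 * third '$'.  Returns 0 for old DES style fields that don't start with '$'. */
static size_t demo_salt_prefix_len(const char *field)
{
    const char *second, *third;

    assert(field != NULL);
    if (field[0] != '$')
        return 0;
    second = strchr(field + 1, '$');
    if (second == NULL)
        return 0;
    third = strchr(second + 1, '$');
    return third ? (size_t)(third - field) : strlen(field);
}
```

<p>With type '1' and the salt "abcdefgh", demo_format_salt() produces
"$1$abcdefgh", which is the part of an existing password field that
demo_salt_prefix_len() measures.</p>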
 238
 239 <h2><a name="tips_vfork">Fork and vfork</a></h2>
 240
 241 <p>On systems that haven't got a Memory Management Unit, fork() is unreasonably
 242 expensive to implement (and sometimes even impossible), so a less capable
 243 function called vfork() is used instead.  (Using vfork() on a system with an
 244 MMU is like pounding a nail with a wrench.  Not the best tool for the job, but
 245 it works.)</p>
 246
 247 <p>Busybox hides the difference between fork() and vfork() in
 248 libbb/bb_fork_exec.c.  If you ever want to fork and exec, use bb_fork_exec()
 249 (which returns a pid and takes the same arguments as execve(), although in
 250 this case envp can be NULL) and don't worry about it.  This description is
 251 here in case you want to know why that does what it does.</p>
 252
 253 <p>Implementing fork() depends on having a Memory Management Unit.  With an
 254 MMU you can simply set up a second set of page tables and share the
 255 physical memory via copy-on-write.  So a fork() followed quickly by exec()
 256 only copies a few pages of the parent's memory, just the ones it changes
 257 before freeing them.</p>
 258
 259 <p>With a very primitive MMU (using a base pointer plus length instead of page
 260 tables, which can provide virtual addresses and protect processes from each
 261 other, but no copy on write) you can still implement fork.  But it's
 262 unreasonably expensive, because you have to copy all the parent process'
 263 memory into the new process (which could easily be several megabytes per fork).
 264 And you have to do this even though that memory gets freed again as soon as the
 265 exec happens.  (This is not just slow and a waste of space but causes memory
 266 usage spikes that can easily cause the system to run out of memory.)</p>
 267
 268 <p>Without even a primitive MMU, you have no virtual addresses.  Every process
 269 can reach out and touch any other process' memory, because all pointers are to
 270 physical addresses with no protection.  Even if you copy a process' memory to
 271 new physical addresses, all of its pointers point to the old objects in the
 272 old process.  (Searching through the new copy's memory for pointers and
 273 redirecting them to the new locations is not an easy problem.)</p>
 274
 275 <p>So with a primitive or missing MMU, fork() is just not a good idea.</p>
 276
 277 <p>In theory, vfork() is just a fork() that writeably shares the heap and stack
 278 rather than copying it (so what one process writes the other one sees).  In
 279 practice, vfork() has to suspend the parent process until the child does exec,
 280 at which point the parent wakes up and resumes by returning from the call to
 281 vfork().  All modern kernel/libc combinations implement vfork() to put the
 282 parent to sleep until the child does its exec.  There's just no other way to
 283 make it work: the parent has to know the child has done its exec() or exit()
 284 before it's safe to return from the function it's in, so it has to block
 285 until that happens.  In fact without suspending the parent there's no way to
 286 even store separate copies of the return value (the pid) from the vfork() call
 287 itself: both assignments write into the same memory location.</p>
 288
 289 <p>One way to understand (and in fact implement) vfork() is this: imagine
 290 the parent does a setjmp and then continues on (pretending to be the child)
 291 until the exec() comes around, then the _exec_ does the actual fork, and the
 292 parent does a longjmp back to the original vfork call and continues on from
 293 there.  (It thus becomes obvious why the child can't return, or modify
 294 local variables it doesn't want the parent to see changed when it resumes.)</p>
 295
 296 <p>Note a common mistake: the need for vfork doesn't mean you can't have two
 297 processes running at the same time.  It means you can't have two processes
 298 sharing the same memory without stomping all over each other.  As soon as
 299 the child calls exec(), the parent resumes.</p>
 300
 301 <p>If the child's attempt to call exec() fails, the child should call _exit()
 302 rather than a normal exit().  This avoids any atexit() code that might confuse
 303 the parent.  (The parent should never call _exit(), only a vforked child that
 304 failed to exec.)</p>
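
<p>Put together, the vfork() rules look roughly like this.  This is a sketch
of the general pattern, not the contents of libbb/bb_fork_exec.c;
demo_spawn_and_wait() is a made-up helper, and the exit code 127 for a failed
exec is borrowed from shell convention.</p>

```c
#define _DEFAULT_SOURCE             /* make sure glibc declares vfork() */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv, wait for it, and
 * return its exit status (or -1 if it could not be started or waited for). */
static int demo_spawn_and_wait(char *const argv[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: still running in the parent's memory, so do nothing
         * here except exec. */
        execvp(argv[0], argv);
        /* exec failed: leave with _exit() so no atexit() handlers run. */
        _exit(127);
    }
    /* Parent: only resumes once the child has called exec or _exit. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<p>Note that the child never returns from the function and never touches any
variable the parent cares about, which is exactly the restriction the
setjmp/longjmp picture above predicts.</p>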
 305
 306 <p>(Now in theory, a nommu system could just copy the _stack_ when it forks
 307 (which presumably is much shorter than the heap), and leave the heap shared,
 308 even with no MMU at all.
 309 In practice, you've just wound up in a multi-threaded situation and you can't
 310 do a malloc() or free() on your heap without affecting the other process' memory
 311 (and if you don't have the proper locking for being threaded, corrupting the
 312 heap if both of you try to do it at the same time and wind up stomping on
 313 each other while traversing the free memory lists).  The thing about vfork is
 314 that it's a big red flag warning "there be dragons here" rather than
 315 something subtle and thus even more dangerous.)</p>
 316
 317 <h2><a name="tips_short_read">Short reads and writes</a></h2>
 318
 319 <p>Busybox has special functions, bb_full_read() and bb_full_write(), to
 320 check that all the data we asked for got read or written.  Is this a real
 321 world consideration?  Try the following:</p>
 322
 323 <pre>while true; do echo hello; sleep 1; done | tee out.txt</pre>
 324
 325 <p>If tee is implemented with bb_full_read(), tee doesn't display output
 326 in real time but blocks until its entire input buffer (generally a couple
 327 kilobytes) is read, then displays it all at once.  In that case, we _want_
 328 the short read, for user interface reasons.  (Note that read() should never
 329 return 0 unless it has hit the end of input, and an attempt to write 0
 330 bytes should be ignored by the OS.)</p>
 331
 332 <p>As for short writes, play around with two processes piping data to each
 333 other on the command line (cat bigfile | gzip &gt; out.gz) and suspend and
 334 resume a few times (ctrl-z to suspend, "fg" to resume).  The writer can
 335 experience short writes, which are especially dangerous because if you don't
 336 notice them you'll discard data.  They can also happen when a system is under
 337 load and a fast process is piping to a slower one.  (Such as an xterm waiting
 338 on x11 when the scheduler decides X is being a CPU hog with all that
 339 text console scrolling...)</p>
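
<p>Coping with short writes boils down to a loop like the one below.  This is
a sketch of the idea rather than the actual bb_full_write() source;
demo_full_write() is a hypothetical name, and retrying on EINTR is an
assumption about what a caller would want.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write() until all len bytes are written or a real error
 * occurs.  Returns len on success, -1 on error. */
static ssize_t demo_full_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t total = 0;

    assert(buf != NULL || len == 0);
    while (total < len) {
        ssize_t n = write(fd, p + total, len - total);

        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: try again */
            return -1;
        }
        total += (size_t)n;         /* short write: loop for the rest */
    }
    return (ssize_t)total;
}
```

<p>The important part is that a short write just advances the position and
goes around again, instead of silently dropping the rest of the buffer.</p>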
 340
 341 <p>So will data always be read from the far end of a pipe at the
 342 same chunk sizes it was written in?  Nope.  Don't rely on that.  For one
 343 counterexample, see <a href="http://www.faqs.org/rfcs/rfc896.html">RFC 896
 344 for Nagle's algorithm</a>, which waits a fraction of a second or so before
 345 sending out small amounts of data through a TCP/IP connection in case more
 346 data comes in that can be merged into the same packet.  (In case you were
 347 wondering why action games that use TCP/IP set TCP_NODELAY to lower the latency
 348 on their sockets, now you know.)</p>
 349
 350 <h2><a name="tips_memory">Memory used by relocatable code, PIC, and static linking.</a></h2>
 351
 352 <p>The downside of standard dynamic linking is that it results in self-modifying
 353 code.  Although each executable's pages are mmap()ed into a process' address
 354 space from the executable file and are thus naturally shared between processes
 355 out of the page cache, the library loader (ld-linux.so.2 or ld-uClibc.so.0)
 356 writes to these pages to supply addresses for relocatable symbols.  This
 357 dirties the pages, triggering copy-on-write allocation of new memory for each
 358 process' dirtied pages.</p>
 359
 360 <p>One solution to this is Position Independent Code (PIC), a way of linking
 361 a file so all the relocations are grouped together.  This dirties fewer
 362 pages (often just a single page) for each process' relocations.  The downside
 363 is that this results in larger executables, which take up more space on disk
 364 (and a correspondingly larger space in memory).  But when many copies of the
 365 same program are running, PIC dynamic linking trades a larger disk footprint
 366 for a smaller memory footprint, by sharing more pages.</p>
 367
 368 <p>Another solution is static linking.  A statically linked program has no
 369 relocations, and thus the entire executable is shared between all running
 370 instances.  This tends to have a significantly larger disk footprint, but
 371 on a system with only one or two executables, shared libraries aren't much
 372 of a win anyway.</p>
 373
 374 <p>You can tell the glibc linker to display debugging information about its
 375 relocations with the environment variable "LD_DEBUG".  Try
 376 "LD_DEBUG=help /bin/true" for a list of commands.  Learning to interpret
 377 "LD_DEBUG=statistics cat /proc/self/statm" could be interesting.</p>
 378
 379 <p>For more on this topic, here's Rich Felker:</p>
 380 <blockquote>
 381 <p>Dynamic linking (without fixed load addresses) fundamentally requires
 382 at least one dirty page per dso that uses symbols. Making calls (but
 383 never taking the address explicitly) to functions within the same dso
 384 does not require a dirty page by itself, but will with ELF unless you
 385 use -Bsymbolic or hidden symbols when linking.</p>
 386
 387 <p>ELF uses significant additional stack space for the kernel to pass all
 388 the ELF data structures to the newly created process image. These are
 389 located above the argument list and environment. This normally adds 1
 390 dirty page to the process size.</p>
 391
 392 <p>The ELF dynamic linker has its own data segment, adding one or more
 393 dirty pages. I believe it also performs relocations on itself.</p>
 394
 395 <p>The ELF dynamic linker makes significant dynamic allocations to manage
 396 the global symbol table and the loaded dso's. This data is never
 397 freed. It will be needed again if libdl is used, so unconditionally
 398 freeing it is not possible, but normal programs do not use libdl. Of
 399 course with glibc all programs use libdl (due to nsswitch) so the
 400 issue was never addressed.</p>
 401
 402 <p>ELF also has the issue that segments are not page-aligned on disk.
 403 This saves up to 4k on disk, but at the expense of using an additional
 404 dirty page in most cases, due to a large portion of the first data
 405 page being filled with a duplicate copy of the last text page.</p>
 406
 407 <p>The above is just a partial list of the tiny memory penalties of ELF
 408 dynamic linking, which eventually add up to quite a bit. The smallest
 409 I've been able to get a process down to is 8 dirty pages, and the
 410 above factors seem to mostly account for it (but some were difficult
 411 to measure).</p>
 412 </blockquote>
 413
 414 <h2><a name="who">Who are the BusyBox developers?</a></h2>
 415
 416 <p>The following login accounts currently exist on busybox.net.  (I.E. these
 417 people can commit <a href="http://busybox.net/downloads/patches">patches</a>
 418 into subversion for the BusyBox, uClibc, and buildroot projects.)</p>
 419
 420 <pre>
 421 aldot     :Bernhard Fischer
 422 andersen  :Erik Andersen      &lt;- uClibc and BuildRoot maintainer
 423 bug1      :Glenn McGrath
 424 davidm    :David McCullough
 425 gkajmowi  :Garrett Kajmowicz  &lt;- uClibc++ maintainer
 426 jbglaw    :Jan-Benedict Glaw
 427 jocke     :Joakim Tjernlund
 428 landley   :Rob Landley        &lt;- BusyBox maintainer
 429 lethal    :Paul Mundt
 430 mjn3      :Manuel Novoa III
 431 osuadmin  :osuadmin
 432 pgf       :Paul Fox
 433 pkj       :Peter Kjellerstedt
 434 prpplague :David Anders
 435 psm       :Peter S. Mazinger
 436 russ      :Russ Dill
 437 sandman   :Robert Griebl
 438 sjhill    :Steven J. Hill
 439 solar     :Ned Ludd
 440 timr      :Tim Riker
 441 tobiasa   :Tobias Anderberg
 442 vapier    :Mike Frysinger
 443 </pre>
 444
 445 <p>The following accounts used to exist on busybox.net, but don't anymore so
 446 I can't ask /etc/passwd for their names.  (If anybody would like to make
 447 a stab at it...)</p>
 448
 449 <pre>
 450 aaronl
 451 beppu
 452 dwhedon
 453 erik    : Also Erik Andersen?
 454 gfeldman
 455 jimg
 456 kraai
 457 markw
 458 miles
 459 proski
 460 rjune
 461 tausq
 462 vodz      :Vladimir N. Oleynik
 463 </pre>
 464
 465
 466 <br>
 467 <br>
 468 <br>
 469
 470 <!--#include file="footer.html" -->