        Unicode support in busybox

There are several scenarios where we need to handle unicode
correctly.

        Shell input

We want to handle input of unicode characters correctly, and
there are several problems with it. Treating input as a plain
sequence of bytes breaks line editing: for example, backspace
would delete only the last byte of a multi-byte character.
This has been fixed: lineedit now operates on an array of
wchar_t. But we still need to handle the following problems:

* It is unreasonable to expect that the output device supports
  _every_ unicode char. We should probably avoid printing
  chars which the output device does not support.
  Examples: chars which are not present in the font,
  chars which are not assigned in unicode,
  combining chars (especially bad pairs such as
  a Chinese ideograph + "combining grave accent" = ??!)

* We need to account for the fact that unicode chars have
  different display widths: 0 for combining chars, 1 for most
  chars, 2 for ideographs (are there 3+ wide chars?).

* Bidirectional text. Hebrew is written right to left, so if
  the user wants to echo a Hebrew phrase, the command line
  looks like: echo "srettel werbeH" (reversed Latin letters
  imitate the Hebrew text).
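
A minimal C sketch of the points above, under stated assumptions: the
names utf8_decode(), sketch_wcwidth() and sanitize_and_measure() are
hypothetical (this is not busybox's lineedit code), the width table is
a tiny hand-picked subset (U+0300..U+036F as combining, a few CJK and
Hangul blocks as wide, nothing above U+FFFF), and wchar_t is assumed to
hold any code point, as it does on glibc. A real implementation would
more likely rely on the C library's wcwidth() or on tables generated
from the Unicode data files. The sanitizer shows just one possible
policy for bad pairs: a combining char is kept only after a width-1
base char, otherwise it is printed as '?'.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into out[0..out_cap).
 * Invalid or truncated sequences become U+FFFD (REPLACEMENT CHARACTER).
 * Overlong 3/4-byte forms, surrogates and values above U+10FFFF are not
 * rejected, to keep the sketch short. Returns the number of chars stored. */
size_t utf8_decode(const char *s, wchar_t *out, size_t out_cap)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t n = 0;

	assert(out != NULL || out_cap == 0);
	while (*p && n < out_cap) {
		unsigned long wc;
		int len, i;

		if (*p < 0x80)                     { len = 1; wc = *p; }
		else if (*p >= 0xC2 && *p <= 0xDF) { len = 2; wc = *p & 0x1F; }
		else if (*p >= 0xE0 && *p <= 0xEF) { len = 3; wc = *p & 0x0F; }
		else if (*p >= 0xF0 && *p <= 0xF4) { len = 4; wc = *p & 0x07; }
		else { out[n++] = 0xFFFD; p++; continue; } /* bad lead byte */

		for (i = 1; i < len; i++) {
			if ((p[i] & 0xC0) != 0x80)
				break; /* truncated; also stops at the terminating NUL */
			wc = (wc << 6) | (p[i] & 0x3F);
		}
		if (i < len) {
			out[n++] = 0xFFFD;
			p += i; /* resume at the byte that broke the sequence */
			continue;
		}
		out[n++] = (wchar_t)wc;
		p += len;
	}
	return n;
}

/* Hypothetical, deliberately tiny width table: 0 = combining, 2 = wide. */
int sketch_wcwidth(wchar_t wc)
{
	if (wc >= 0x0300 && wc <= 0x036F)   /* Combining Diacritical Marks */
		return 0;
	if ((wc >= 0x1100 && wc <= 0x115F)  /* Hangul Jamo leading consonants */
	 || (wc >= 0x2E80 && wc <= 0xA4CF)  /* CJK radicals .. Yi (approximate) */
	 || (wc >= 0xAC00 && wc <= 0xD7A3)  /* Hangul syllables */
	 || (wc >= 0xF900 && wc <= 0xFAFF)  /* CJK compatibility ideographs */
	 || (wc >= 0xFF01 && wc <= 0xFF60)  /* fullwidth forms */
	 || (wc >= 0xFFE0 && wc <= 0xFFE6)) /* fullwidth signs */
		return 2;
	return 1;
}

/* Replace combining chars that lack a width-1 base (at the start, or after
 * a wide char) with '?', then return the width of s[0..n) in columns. */
int sanitize_and_measure(wchar_t *s, size_t n)
{
	int width = 0;
	int base_w = -1; /* width of the last non-combining char; -1 = none yet */
	size_t i;

	assert(s != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int w = sketch_wcwidth(s[i]);

		if (w == 0 && base_w != 1) {
			s[i] = L'?';
			w = 1;
		}
		if (w != 0)
			base_w = w;
		width += w;
	}
	return width;
}
```

For example, utf8_decode("caf\xc3\xa9", buf, 16) stores 4 wide chars,
so dropping the last array element deletes the whole "é"; a byte-based
editor would have removed only 0xA9 and left a dangling 0xC3 lead byte.
Likewise, a CJK ideograph followed by U+0300 measures 3 columns once the
stray combining mark has been replaced by '?'.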

        Editors

This case is somewhat similar to "shell input", but unlike the
shell, editors may encounter many more unexpected unicode
sequences (try loading a random binary file...), and they need
to preserve them, whereas the shell can afford to drop bogus
input.
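
One way an editor could meet the preservation requirement, sketched
under assumptions: decode_lossless() and encode_lossless() are
hypothetical helpers, and the escaping trick is borrowed from Python's
"surrogateescape" error handler (PEP 383), not taken from busybox. Each
byte that is not part of a well-formed UTF-8 sequence is stored as a
lone surrogate in U+DC80..U+DCFF. Since the decoder rejects encoded
surrogates as invalid, a real decoded char can never land in that range,
so re-encoding restores the original file byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: if p[0..n) starts with a well-formed UTF-8
 * sequence, store its code point in *cp and return its length, else 0. */
static size_t utf8_seq_len(const unsigned char *p, size_t n, uint32_t *cp)
{
	size_t len, i;
	uint32_t c;

	if (p[0] < 0x80) { *cp = p[0]; return 1; }
	if (p[0] >= 0xC2 && p[0] <= 0xDF)      { len = 2; c = p[0] & 0x1F; }
	else if (p[0] >= 0xE0 && p[0] <= 0xEF) { len = 3; c = p[0] & 0x0F; }
	else if (p[0] >= 0xF0 && p[0] <= 0xF4) { len = 4; c = p[0] & 0x07; }
	else return 0;
	if (len > n)
		return 0;
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (p[i] & 0x3F);
	}
	/* reject overlong forms, surrogates and values above U+10FFFF */
	if ((len == 3 && c < 0x800) || (len == 4 && c < 0x10000)
	 || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
		return 0;
	*cp = c;
	return len;
}

/* Decode n bytes into out[] (room for n code points); never loses a byte. */
size_t decode_lossless(const unsigned char *in, size_t n, uint32_t *out)
{
	size_t i = 0, k = 0;

	assert(out != NULL || n == 0);
	while (i < n) {
		uint32_t cp;
		size_t len = utf8_seq_len(in + i, n - i, &cp);

		if (len == 0) {
			out[k++] = 0xDC00 + in[i]; /* escape one raw byte (0x80..0xFF) */
			i++;
		} else {
			out[k++] = cp;
			i += len;
		}
	}
	return k;
}

/* Encode code points back to bytes; out needs room for 4 bytes each. */
size_t encode_lossless(const uint32_t *in, size_t n, unsigned char *out)
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		uint32_t c = in[i];

		if (c >= 0xDC80 && c <= 0xDCFF) {
			out[k++] = (unsigned char)(c - 0xDC00); /* restore raw byte */
		} else if (c < 0x80) {
			out[k++] = (unsigned char)c;
		} else if (c < 0x800) {
			out[k++] = (unsigned char)(0xC0 | (c >> 6));
			out[k++] = (unsigned char)(0x80 | (c & 0x3F));
		} else if (c < 0x10000) {
			out[k++] = (unsigned char)(0xE0 | (c >> 12));
			out[k++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[k++] = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			out[k++] = (unsigned char)(0xF0 | (c >> 18));
			out[k++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
			out[k++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[k++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	return k;
}
```

For example, the 12 bytes h, C3 A9, FF, C3, x, ED B2 80, E4 B8 AD decode
to 9 code points (the stray FF, the truncated C3 and the three bytes of
the encoded surrogate each become an escape) and re-encode to the same
12 bytes. Unlike the shell, the editor keeps the bogus bytes instead of
replacing them with U+FFFD.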


        more, less

.

        ls (multi-column display)

.

        top, ps

.

        Filename display (in error messages and elsewhere)

.


TODO: write an email to Asmus Freytag (asmus@unicode.org),
author of http://unicode.org/reports/tr11/
(Unicode Standard Annex #11, "East Asian Width")