add unicode.txt

author Denys Vlasenko <vda.linux@googlemail.com>

Mon, 1 Feb 2010 14:58:08 +0000 (15:58 +0100)

committer Denys Vlasenko <vda.linux@googlemail.com>

Mon, 1 Feb 2010 14:58:08 +0000 (15:58 +0100)
author Denys Vlasenko <vda.linux@googlemail.com>
Mon, 1 Feb 2010 14:58:08 +0000 (15:58 +0100)
committer Denys Vlasenko <vda.linux@googlemail.com>
Mon, 1 Feb 2010 14:58:08 +0000 (15:58 +0100)
diff --git a/docs/unicode.txt b/docs/unicode.txt

new file mode 100644 (file)

index 0000000..019d12f
--- /dev/null
+++ b/docs/unicode.txt
@@ -0,0 +1,56 @@
+       Unicode support in busybox
+
+There are several scenarios where we need to handle unicode
+correctly.
+
+       Shell input
+
+We want to correctly handle input of unicode characters.
+There are several problems with it. Just handling input
+as sequence of bytes would break any editing. This was fixed
+and now lineedit operates on the array of wchar_t's.
+But we also need to handle the following problematic moments:
+
+* It is unreasonable to expect that output device supports
+  _any_ unicode chars. Perhaps we need to avoid printing
+  those chars which are not supported by output device.
+  Examples: chars which are not present in the font,
+  chars which are not assigned in unicode,
+  combining chars (especially trying to combine bad pairs:
+  a_chinese_symbol + "combining grave accent" = ??!)
+
+* We need to account for the fact that unicode chars have
+  different widths: 0 for combining chars, 1 for usual,
+  2 for ideograms (are there 3+ wide chars?).
+
+* Bidirectional handling. If user wants to echo a phrase
+  in Hebrew, he types: echo "srettel werbeH"
+
+       Editors
+
+This case is a bit similar to "shell input", but unlike shell,
+editors may encounder many more unexpected unicode sequences
+(try to load a random binry file...), and they need to preserve
+them, unlike shell which can afford to drop bogus input.
+
+
+       more, less
+
+.
+
+       ls (multi-column display)
+
+.
+
+       top, ps
+
+.
+
+       Filename display (in error messages and elsewhere)
+
+.
+
+
+
+TODO: write an email to Asmus Freytag (asmus@unicode.org),
+author of http://unicode.org/reports/tr11/
author	Denys Vlasenko <vda.linux@googlemail.com>
	Mon, 1 Feb 2010 14:58:08 +0000 (15:58 +0100)
committer	Denys Vlasenko <vda.linux@googlemail.com>
	Mon, 1 Feb 2010 14:58:08 +0000 (15:58 +0100)