MH 6.8 and later support "international" characters -- that is, non-English characters. This support is separate from the MIME support. It is enabled by the [LOCALE] configuration option (see the Section The -help Switches).
For C programmers, here's a typical change that [LOCALE] makes. This is from the file sbr/gans.c:
    #ifdef LOCALE
        i = (isalpha(i) && isupper(i)) ? tolower(i) : i;
    #else
        if (i >= 'A' && i <= 'Z')
            i += 'a' - 'A';
    #endif
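To see the two branches side by side, here is a minimal sketch that wraps each one in a function. The names fold_ascii and fold_locale are hypothetical and do not come from the MH source. The locale-aware version also narrows its argument to unsigned char and tests only isupper(), which is a simplification made for this sketch, not what sbr/gans.c does.

```c
#include <ctype.h>

/* ASCII-only folding, like the #else branch: only 'A'..'Z' change. */
int fold_ascii(int c)
{
    if (c >= 'A' && c <= 'Z')
        c += 'a' - 'A';
    return c;
}

/* Locale-aware folding, like the LOCALE branch (hypothetical wrapper).
 * The <ctype.h> functions are only defined for EOF and for values that
 * fit in an unsigned char, so the argument is narrowed first.  Which
 * characters count as uppercase depends on the LC_CTYPE category that
 * the program selected earlier with setlocale(). */
int fold_locale(int c)
{
    unsigned char uc = (unsigned char)c;

    return isupper(uc) ? tolower(uc) : uc;
}
```

In the default "C" locale the two functions should agree on ASCII letters. They can differ only after the program has selected a locale whose LC_CTYPE category knows about uppercase letters outside 'A'..'Z'.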
Once you get your POSIX-compliant system set up correctly, MH and programs it calls will behave more naturally. For example, when you use the vi editor command for "next word," the cursor won't stop in the middle of a word at a non-ASCII character. Programs like grep(1) and sort(1) should understand how to handle the characters in your language.
As with all POSIX internationalization, though, the character support in MH is system-dependent. Don't expect everything to work perfectly. And the setup varies from system to system; check your documentation or ask a local expert. Various manual pages to try on HP-UX are: environ(5), setlocale(3), and hpnls(5). Read locale(5) and setlocale(3) on SunOS.
For the most complete setup, you should know about the LANG environment variable. The full syntax for the value of LANG is:
    language[_territory[.codeset]][@modifier]
The brackets ([ ]) mark optional parts; don't include the brackets when you set the variable. An example setting of LANG is:
    french_canadian.iso88591@nofold
Language is the only parameter that is (almost) consistent across platforms; it is used to find the databases for all the locale categories. HP-UX uses the full syntax (the "modifier" might even be an HP-UX addition). SunOS uses the language as others use the codeset. SCO always uses the territory as well.
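For C programmers, here is a rough sketch of how a LANG value could be taken apart. It assumes the value is a language followed by an optional _territory, an optional .codeset, and an optional @modifier, in that order. The struct lang_parts type, the PART_MAX limit, and the split_lang function are all hypothetical names invented for this illustration; real systems do this inside their own locale libraries.

```c
#include <string.h>

#define PART_MAX 64   /* arbitrary limit chosen for this sketch */

struct lang_parts {
    char language[PART_MAX];
    char territory[PART_MAX];
    char codeset[PART_MAX];
    char modifier[PART_MAX];
};

/* Copy the bytes in [start, end) into dst, truncating to PART_MAX - 1. */
static void copy_part(char *dst, const char *start, const char *end)
{
    size_t n = (size_t)(end - start);

    if (n >= PART_MAX)
        n = PART_MAX - 1;
    memcpy(dst, start, n);
    dst[n] = '\0';
}

/* Split a LANG value into its parts; missing parts become "". */
void split_lang(const char *value, struct lang_parts *out)
{
    const char *end = value + strlen(value);
    const char *at, *dot, *us;

    memset(out, 0, sizeof *out);

    /* Peel the optional parts off the right-hand end first. */
    at = strchr(value, '@');
    if (at != NULL) {
        copy_part(out->modifier, at + 1, end);
        end = at;
    }
    dot = memchr(value, '.', (size_t)(end - value));
    if (dot != NULL) {
        copy_part(out->codeset, dot + 1, end);
        end = dot;
    }
    us = memchr(value, '_', (size_t)(end - value));
    if (us != NULL) {
        copy_part(out->territory, us + 1, end);
        end = us;
    }
    copy_part(out->language, value, end);
}
```

Given french_canadian.iso88591@nofold, this sketch yields the language french, the territory canadian, the codeset iso88591, and the modifier nofold. It also shows why the platform differences matter: a SunOS-style value such as iso_8859_1 would be split at its first underscore, which is not what that platform means by it.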
The locale categories are set by the environment variables LC_COLLATE (string collation and sorting), LC_CTYPE (character classification and conversion, such as "is this character `printable'?"), LC_MONETARY (monetary formatting), LC_NUMERIC (for input and output of numbers), LC_TIME (time conversion), and LC_MESSAGES (messages to the user -- this isn't on all platforms).
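If you want to see what a C program thinks each category is currently set to, something like the following sketch may help. It queries each category by passing a null pointer to setlocale(), which returns the current setting without changing it. The table and the helper names (category_count, category_name, category_locale, print_categories) are hypothetical. LC_MESSAGES is wrapped in #ifdef because it comes from POSIX rather than ISO C and may be missing.

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

struct category {
    int id;
    const char *name;
};

static const struct category categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
#ifdef LC_MESSAGES
    { LC_MESSAGES, "LC_MESSAGES" },   /* not on all platforms */
#endif
};

/* Number of categories this build knows about. */
size_t category_count(void)
{
    return sizeof categories / sizeof categories[0];
}

/* Name of the i-th category, for display. */
const char *category_name(size_t i)
{
    return categories[i].name;
}

/* Current setting of the i-th category; a null second argument to
 * setlocale() asks for the value without changing it. */
const char *category_locale(size_t i)
{
    return setlocale(categories[i].id, NULL);
}

/* Print one "NAME=value" line per category. */
void print_categories(void)
{
    size_t i;

    for (i = 0; i < category_count(); i++) {
        const char *value = category_locale(i);

        printf("%s=%s\n", category_name(i), value ? value : "(unknown)");
    }
}
```

A program that has not called setlocale() yet reports the default "C" locale for every category, whatever the environment variables say; the environment is only consulted when the program asks for it.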
For much of your email-related work, you may choose to set only LC_CTYPE. This won't change the way most tools behave in ways other than handling characters. Another advantage of not setting all categories is that incomplete implementations won't give warning messages when they don't support a particular setting.
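The same idea, seen from inside a C program, might look like the sketch below. It adopts only the LC_CTYPE category from the environment and leaves the others alone. The function names set_ctype_only and category_is_default are hypothetical, and treating both "C" and "POSIX" as names for the default locale is an assumption about how a platform reports it.

```c
#include <locale.h>
#include <string.h>

/* Adopt only the character-handling category from the environment.
 * With "" as the locale name, setlocale() picks the value from the
 * LC_ALL, LC_CTYPE, or LANG environment variables.  Returns 1 on
 * success, 0 if the requested locale is not available (in which case
 * the previous setting stays in effect). */
int set_ctype_only(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Report whether a category is still in the default locale. */
int category_is_default(int category)
{
    const char *name = setlocale(category, NULL);

    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}
```

After a call to set_ctype_only(), functions such as isalpha() and tolower() follow the user's character set, while number formatting, collation, and time conversion keep their default behavior.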
If you're trying to choose a good value for those environment variables and no one else in your organization has already found good settings, look in the databases. The databases are usually located under /usr/lib/locale, /usr/share/lib/locale or (in the case of SunOS) /etc/locale.
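A quick way to find out which of those directories exists on a given system is sketched below, using the POSIX stat() call. The helper names is_directory and first_locale_dir are hypothetical; the list of paths is just the three locations named above, and a system may well keep its databases somewhere else.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* The locations mentioned in the text, in the order given there. */
static const char *const locale_dirs[] = {
    "/usr/lib/locale",
    "/usr/share/lib/locale",
    "/etc/locale",
};

/* Return 1 if path exists and is a directory, 0 otherwise. */
int is_directory(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Return the first listed directory that exists, or NULL if none do. */
const char *first_locale_dir(void)
{
    size_t i;

    for (i = 0; i < sizeof locale_dirs / sizeof locale_dirs[0]; i++) {
        if (is_directory(locale_dirs[i]))
            return locale_dirs[i];
    }
    return NULL;
}
```

Once you know the directory, listing its contents shows the names that the system will accept as locale settings.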
As an example, the Table below has the settings that Kimmo Suominen (from Finland, working in New York) uses on different platforms:
Table: Sample LANG settings
Platform  Setting            Comment
SVR4      finnish            --
HP-UX     american.iso88591  okay, finnish.iso88591
SunOS     iso_8859_1         yes, note the underscores
SCO       english_us.88591   has the territory
This file is from the third edition of the book MH & xmh: Email for Users & Programmers, ISBN 1-56592-093-7, by Jerry Peek. It is freely available; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For more information, see COPYING.
Copyright © 1991, 1992, 1995 O'Reilly Media, Inc.
Copyright © 1996, 1997, 1999, 2000, 2002, 2004 Jerry Peek
Last modified: 2006-05-31 15:13:43 -0700