TTY Environment Setup
Depending on the terminal and terminal emulator that you are using, you might need to push certain codeset-specific STREAMS modules onto your streams.
For more information on STREAMS modules and streams in general, see the STREAMS Programming Guide.
The following table shows STREAMS modules supported by the en_US.UTF-8 locale in the terminal environment.
Table 5-7 32-bit STREAMS Modules Supported by en_US.UTF-8
The following table lists the 64-bit STREAMS modules supported by en_US.UTF-8.
Table 5-8 64-bit STREAMS Modules Supported by en_US.UTF-8
64-bit STREAMS module | Description |
---|---|
/usr/kernel/strmod/sparcv9/u8lat1 | Code conversion STREAMS module between UTF-8 and ISO8859-1 (Western European) |
/usr/kernel/strmod/sparcv9/u8lat2 | Code conversion STREAMS module between UTF-8 and ISO8859-2 (Eastern European) |
/usr/kernel/strmod/sparcv9/u8koi8 | Code conversion STREAMS module between UTF-8 and KOI8-R (Cyrillic) |
Loading a STREAMS Module at Kernel
To load a STREAMS module into the kernel, first become root.
To determine whether you are running a 64-bit Solaris or 32-bit Solaris system, use the isainfo(1) utility as follows:
    system# isainfo -v
    64-bit sparcv9 applications
    32-bit sparc applications
If the command returns this information, you are running the 64-bit Solaris system. If you are running the 32-bit Solaris system, the utility shows the following:
    system# isainfo -v
    32-bit sparc applications
Use modinfo(1M) to be certain that your system has not already loaded the STREAMS module:
    system# modinfo | grep modulename
If the STREAMS module, such as u8lat1, is already installed, the output looks as follows:
    system# modinfo | grep u8lat1
    89 ff798000 4b13 18 1 u8lat1 (UTF-8 <--> ISO 8859-1 module)
If the module is already installed, you do not need to load it. However, if the module has not yet been loaded, use modload(1M) as follows:
    system# modload /usr/kernel/strmod/u8lat1
This command loads the 32-bit u8lat1 STREAMS module into the kernel so that you can push it onto a stream. If you are running the 64-bit Solaris product, use modload(1M) as follows:
    system# modload /usr/kernel/strmod/sparcv9/u8lat1
The STREAMS module is now loaded into the kernel, and you can push it onto a stream.
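The choice between the two modload(1M) paths can be scripted. The following is a minimal sketch rather than part of the Solaris tooling. The function name strmod_path is hypothetical. The sketch assumes that the text "64-bit" appears in the isainfo -v output only on a 64-bit system, as in the sample output above, and that the 64-bit modules are in the sparcv9 subdirectory, as listed in Table 5-8.

```shell
# strmod_path ISAINFO_OUTPUT MODULE
# Hypothetical helper: print the path of a STREAMS module, chosen from
# the text that "isainfo -v" printed (passed in as the first argument).
strmod_path() {
    case "$1" in
        *64-bit*) printf '%s\n' "/usr/kernel/strmod/sparcv9/$2" ;;
        *)        printf '%s\n' "/usr/kernel/strmod/$2" ;;
    esac
}

# Intended use on a live system, as root (Bourne-compatible shell):
#   modload "$(strmod_path "$(isainfo -v)" u8lat1)"
```

Passing the isainfo output as an argument keeps the path selection separate from the command that needs root privileges.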
To unload a module from the kernel, use modunload(1M), as shown below. In this example, the u8lat1 module is being unloaded.
    system# modinfo | grep u8lat1
    89 ff798000 4b13 18 1 u8lat1 (UTF-8 <--> ISO 8859-1 module)
    system# modunload -i 89
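The module ID passed to modunload -i is the first column of the modinfo output. As a minimal sketch, the lookup could be done as follows. The function name module_id is hypothetical, and the sketch assumes that the module name is always the sixth whitespace-separated column, as in the sample output above.

```shell
# module_id MODULE
# Hypothetical helper: read "modinfo" output on standard input and print
# the ID (first column) of the row whose module name (sixth column)
# equals MODULE. Prints nothing if the module is not listed.
# Assumption: the awk in use accepts -v; on Solaris this might mean
# nawk or /usr/xpg4/bin/awk rather than the default awk.
module_id() {
    awk -v mod="$1" '$6 == mod { print $1 }'
}

# Intended use on a live system, as root (Bourne-compatible shell):
#   id=$(modinfo | module_id u8lat1)
#   [ -n "$id" ] && modunload -i "$id"
```

An empty result also serves as a check that the module is not currently loaded.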
dtterm and Terminals Capable of Input and Output of UTF-8 Characters
Unlike in previous releases of the Solaris operating environment, dtterm(1) and any other terminal that supports input and output of the UTF-8 codeset does not need any additional STREAMS modules in its stream. The ldterm(7M) module is now codeset independent and supports Unicode/UTF-8 as well.
To set up the proper terminal environment for the Unicode locales, use the stty(1) utility. To query the current settings, use the -a option of stty(1), as shown below:
    system% /bin/stty -a
Note - Because /usr/ucb/stty is not internationalized, use /bin/stty instead.
Terminal Support for Latin-1, Latin-2, or KOI8-R
For terminals that support only Latin-1 (ISO8859-1), Latin-2 (ISO8859-2), or KOI8-R, you should have the following STREAMS configuration:
    head <-> ttcompat <-> ldterm <-> u8lat1 <-> TTY
This configuration is only for terminals that support Latin-1. For Latin-2 terminals, replace the STREAMS module u8lat1 with u8lat2. For KOI8-R terminals, replace the module with u8koi8.
Make sure you already have the STREAMS module loaded into the kernel.
To set up the STREAMS configuration shown above, use strchg(1), as shown in the second command of the following example:
    system% cat > /tmp/mystreams
    ttcompat
    ldterm
    u8lat1
    ptem
    ^D
    system% strchg -f /tmp/mystreams
Be sure that you are either root or the owner of the device when you use strchg(1). To see the current configuration, use strconf(1) as follows:
    system% strconf
    ttcompat
    ldterm
    u8lat1
    ptem
    pts
    system%
To reset the original configuration, set the STREAMS configuration as follows:
    system% cat > /tmp/orgstreams
    ttcompat
    ldterm
    ptem
    ^D
    system% strchg -f /tmp/orgstreams
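The module list files used with strchg -f differ only in the code conversion module. As a minimal sketch, a file for any of the three terminal types could be generated as follows. The function name streams_list and the keywords latin1, latin2, and koi8-r are hypothetical. The module names and their order come from the examples in this section.

```shell
# streams_list CODESET
# Hypothetical helper: print the module list for "strchg -f", one module
# per line, for a terminal that supports only the given codeset.
streams_list() {
    case "$1" in
        latin1) conv=u8lat1 ;;
        latin2) conv=u8lat2 ;;
        koi8-r) conv=u8koi8 ;;
        *) echo "streams_list: unknown codeset: $1" >&2; return 1 ;;
    esac
    printf '%s\n' ttcompat ldterm "$conv" ptem
}

# Intended use (Bourne-compatible shell), as root or the device owner:
#   streams_list latin2 > /tmp/mystreams && strchg -f /tmp/mystreams
```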
Saving the Settings in ~/.cshrc
Assuming the necessary STREAMS modules are already loaded into the kernel, you can save the following lines in your .cshrc file (C shell example) for convenience:
    setenv LANG en_US.UTF-8
    if ($?USER != 0 && $?prompt != 0) then
        cat >! /tmp/mystreams$$ << _EOF
    ttcompat
    ldterm
    u8lat1
    ptem
    _EOF
        /bin/strchg -f /tmp/mystreams$$
        /bin/rm -f /tmp/mystreams$$
        /bin/stty cs8 -istrip defeucw
    endif
With these lines in your .cshrc file, you do not have to type all of the commands each time you use the STREAMS module. Note that the second _EOF must start in the first column of the file.
Code Conversions
Unicode locale support adds various code conversions among major codesets of many countries through iconv(1), iconv(3C), and sdtconvtool(1).
In the Solaris 9 environment, the geniconvtbl utility enables user-defined code conversions. User-defined code conversions created with geniconvtbl can be used with both iconv(1) and iconv(3C). For more detail on this utility, refer to the geniconvtbl(1) and geniconvtbl(4) man pages.
The available fromcode and tocode names that can be applied to iconv(1), iconv_open(3C), and sdtconvtool(1) are shown in the tables in Appendix A, iconv Code Conversions. For more details on iconv code conversion, see the iconv(1), iconv_open(3C), iconv(3C), iconv_close(3C), geniconvtbl(1), geniconvtbl(4), and sdtconvtool(1) man pages. For more information on available code conversions, see the iconv_en_US.UTF-8(5), iconv(5), iconv_ja(5), iconv_ko(5), iconv_zh(5), and iconv_zh_TW(5) man pages.
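As an illustration of iconv(1) with a fromcode and a tocode, the following minimal sketch converts Latin-1 text to UTF-8. The function name latin1_to_utf8 is hypothetical, and the sketch assumes that ISO8859-1 and UTF-8 are accepted code names on your system. The tables in Appendix A remain the authoritative list.

```shell
# latin1_to_utf8
# Hypothetical helper: convert ISO8859-1 text on standard input to UTF-8
# on standard output, using the -f (fromcode) and -t (tocode) options.
latin1_to_utf8() {
    iconv -f ISO8859-1 -t UTF-8
}

# Example: the single byte 0xE9 (e with acute accent in ISO8859-1)
# becomes the two UTF-8 bytes 0xC3 0xA9.
#   printf 'caf\351\n' | latin1_to_utf8
```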
Note - UCS-2, UCS-4, UTF-16, and UTF-32 are all Unicode/ISO/IEC 10646 representation forms that recognize the Byte Order Mark (BOM) character defined in the Unicode 3.1 and ISO/IEC 10646-1:2000 standards if the character appears at the beginning of the character stream. Other forms, like UCS-2BE, UCS-4BE, UTF-16BE, and UTF-32BE, are fixed-byte-order Unicode/ISO/IEC 10646 representation forms that do not recognize the BOM character and assume big-endian byte ordering. Representation forms like UCS-2LE, UCS-4LE, UTF-16LE, and UTF-32LE, on the other hand, assume little-endian byte ordering. They also do not recognize the BOM character.
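To see which byte order a BOM at the beginning of a file indicates, you can inspect the first four bytes. The following is a minimal sketch. The function name detect_bom is hypothetical, and the sketch assumes that the od utility supports the -A, -t, and -N options. The same marks apply to UCS-2 and UCS-4 as to UTF-16 and UTF-32. Note that the bytes FF FE 00 00 are ambiguous in principle (a UTF-16 little-endian BOM followed by a NUL character). The sketch reports them as 32-bit.

```shell
# detect_bom FILE
# Hypothetical helper: print the byte order implied by a leading BOM
# (U+FEFF), or "none" if the file does not start with one.
detect_bom() {
    # First four bytes of the file as lowercase hex, whitespace removed.
    head4=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    case "$head4" in
        fffe0000) echo UTF-32LE ;;   # must be tested before UTF-16LE
        0000feff) echo UTF-32BE ;;
        fffe*)    echo UTF-16LE ;;
        feff*)    echo UTF-16BE ;;
        *)        echo none ;;
    esac
}

# Intended use:
#   detect_bom somefile.txt
```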
For the scripts and languages associated with ISO8859-* and KOI8-*, see http://czyborra.com/charsets/iso8859.html.