Each iconv code conversion definition starts with CONVERSION_NAME, followed by one or more semicolon-separated code conversion
definition elements enclosed in braces:
|
// a US-ASCII to ISO8859-1 iconv code conversion example:
US-ASCII%ISO8859-1 {
// one or more code conversion definition elements here.
:
:
}
|
Each code conversion definition element can be any one of the following
elements:
|
direction
condition
operation
map
|
To have a meaningful code conversion, there should be at least one
direction, operation, or map element in the iconv code conversion definition.
The direction element contains one or more semicolon-separated condition-action
pairs that direct the code conversion:
|
direction For_US-ASCII_2_ISO8859-1 {
// one or more condition-action pairs here.
:
:
}
|
Each condition-action pair defines a conditional code conversion
that consists of a condition element followed by an action element.
condition action
If the condition is met, the corresponding action is executed.
If no condition is met, iconv(3C) returns (size_t)-1 with errno set to EILSEQ. The condition
can be a condition element, the name of a pre-defined condition element, or
the condition literal value true. The true condition literal value always
succeeds, so the corresponding action is always executed. The
action can likewise be an action element or the name of a pre-defined action element.
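The way a direction selects an action can be modeled outside of geniconvtbl. The following Python sketch is an illustration only, not the geniconvtbl implementation: the names run_direction, is_ascii, copy_byte, and substitute are hypothetical, and trying the pairs in the order they are written is an assumption of the sketch.

```python
import errno


def run_direction(pairs, data):
    """Run the action of the first pair whose condition accepts `data`.

    `pairs` is a list of (condition, action) callables. Trying them in
    written order is an assumption of this sketch. If no condition is
    met, the failure is modeled as OSError with errno EILSEQ, standing
    in for iconv(3C) returning (size_t)-1.
    """
    for condition, action in pairs:
        if condition(data):
            return action(data)
    raise OSError(errno.EILSEQ, "no condition met")


def is_ascii(data):
    return data[0] < 0x80


def always_true(data):
    # Models the condition literal value 'true'.
    return True


def copy_byte(data):
    return data[:1]


def substitute(data):
    return b"?"


pairs = [(is_ascii, copy_byte), (always_true, substitute)]
```

With these pairs, an ASCII byte is copied and anything else falls through to the true pair. Removing the true pair makes non-ASCII input fail with EILSEQ.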
The condition element specifies one or more condition expression elements.
Because a condition element can have a name and can also stand alone,
a pre-defined condition element can be referenced by its name in any later
condition-action pair. To be used in that way, the condition element
must be defined beforehand:
|
condition For_US-ASCII_2_ISO8859-1 {
// one or more condition expression elements here.
:
:
}
|
The name of the condition element in the above example is For_US-ASCII_2_ISO8859-1. Each condition element can have one
or more condition expression elements. If there is more than one condition
expression element, the condition expression elements are checked from
top to bottom to see whether any one of them
yields true. Any one of the following can be a condition expression element:
|
between
escapeseq
expression
|
The between condition expression element defines one or more comma-separated
ranges:
|
between 0x0...0x1f, 0x7f...0x9f ;
between 0xa1a1...0xfefe ;
|
In the first expression in the example above, the covered ranges are 0x0 to 0x1f and 0x7f to 0x9f, inclusive. In the second expression, the covered range
consists of the values whose first byte is 0xa1 to 0xfe and whose second byte is also 0xa1 to 0xfe. That is, the range is checked byte by byte. In this
case, the sequence 0xa280 does not fall within the range because its second byte, 0x80, is less than 0xa1.
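The byte-by-byte range check can be sketched in Python. This is a hypothetical model, not code from geniconvtbl: the function name in_between is invented, and taking the byte width from the upper bound of the range is an assumption.

```python
def in_between(value, low, high):
    """Return True if each byte of `value` lies within the corresponding
    bytes of `low` and `high`, inclusive.

    The width in bytes is taken from `high` (an assumption of this sketch).
    """
    width = max(1, (high.bit_length() + 7) // 8)
    if value.bit_length() > width * 8:
        return False  # value is wider than the range bounds
    value_bytes = value.to_bytes(width, "big")
    low_bytes = low.to_bytes(width, "big")
    high_bytes = high.to_bytes(width, "big")
    return all(lo <= b <= hi
               for b, lo, hi in zip(value_bytes, low_bytes, high_bytes))


# 0xa280 is numerically between 0xa1a1 and 0xfefe, but its second
# byte (0x80) is below 0xa1, so the per-byte check rejects it.
inside = in_between(0xa2a2, 0xa1a1, 0xfefe)
outside = in_between(0xa280, 0xa1a1, 0xfefe)
```

The point of the model is that a plain numeric comparison would accept 0xa280, whereas the per-byte comparison does not.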
The escapeseq condition expression element defines an equal-to condition
for one or more comma-separated escape sequence designators:
|
// ESC $ ) C sequence:
escapeseq 0x1b242943;
// ESC $ ) C sequence or ShiftOut (SO) control character code, 0x0e:
escapeseq 0x1b242943, 0x0e;
|
The expression condition expression element can be any one of the following and can be surrounded
by a pair of parentheses, '(' and ')':
|
// HEXADECIMAL:
0xa1a1
// DECIMAL:
12
// A boolean value, true:
true
// A boolean value, false:
false
// Addition expression:
1 + 2
// Subtraction expression:
10 - 3
// Multiplication expression:
0x20 * 10
// Division expression:
20 / 10
// Remainder expression:
17 % 3
// Left-shift expression:
1 << 4
// Right-shift expression:
0xa1 >> 2
// Bitwise OR expression:
0x2121 | 0x8080
// Exclusive OR expression:
0xa1a1 ^ 0x8080
// Bitwise AND expression:
0xa1 & 0x80
// Equal-to expression:
0x10 == 16
// Inequality expression:
0x10 != 10
// Less-than expression:
0x20 < 25
// Less-than-or-equal-to expression:
10 <= 0x10
// Greater-than expression:
0x10 > 12
// Greater-than-or-equal-to expression:
0x10 >= 0xa
// Logical OR expression:
0x10 || false
// Logical AND expression:
0x10 && false
// Logical negation expression:
! false
// Bitwise complement expression:
~0
// Unary minus expression:
-123
|
Only one type is available in an expression: integer. The
boolean values are two special cases of integer values: the integer value of true
is 1 and the integer value of false
is 0. When an integer is used as a boolean, any value other than 0 is true and the value 0 is false. Any boolean expression yields the integer
value 1 for true and the integer value 0
for false as the result.
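The integer-only boolean rules can be illustrated with a small Python model. The helper names below are hypothetical and exist only for this sketch; they mirror the logical examples shown earlier.

```python
TRUE, FALSE = 1, 0  # the two boolean literals as integers


def logical_or(a, b):
    # Any non-zero operand counts as true; the result is always 1 or 0.
    return int(bool(a) or bool(b))


def logical_and(a, b):
    return int(bool(a) and bool(b))


def logical_not(a):
    return int(not a)


or_result = logical_or(0x10, FALSE)    # models: 0x10 || false
and_result = logical_and(0x10, FALSE)  # models: 0x10 && false
not_result = logical_not(FALSE)        # models: ! false
```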
Any literal value shown as an operand in the expression examples above,
that is, a DECIMAL, HEXADECIMAL, or boolean value, can be replaced with
another expression. A few other special operands can also be
used in expressions: input, inputsize, outputsize, and variables. input is a keyword that refers to the current input buffer. inputsize is a keyword that refers to the current input buffer size
in bytes. outputsize is a keyword that refers to the current
output buffer size in bytes. The NAME lexical convention is used to name
a variable. The initial value of a variable is 0. The
following expressions are allowed with the special operands:
|
// Pointer to the third byte value of the current input buffer:
input[2]
// Equal-to expression with the 'input':
input == 0x8020
// Alternative way to write the above expression:
0x8020 == input
// The size of the current input buffer in bytes:
inputsize
// The size of the current output buffer in bytes:
outputsize
// A variable:
saved_second_byte
// Assignment expression with the variable:
saved_second_byte = input[1]
|
The input keyword without an index value can be used
only with the equal-to operator, '=='. When used in that way, the current
input buffer is compared with the other operand consecutively, byte by byte.
The other operand can be any expression. If the input keyword
is used with an index value n, it is a pointer
to the (n+1)th byte from the beginning of the
current input buffer. The index can be any expression. Only a variable can
be placed on the left hand side of an assignment expression.
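The two uses of the input keyword can be modeled roughly as follows. This Python sketch is an interpretation, not the geniconvtbl implementation: the function names are hypothetical, and it assumes that the operand is compared as big-endian bytes against the start of the input buffer.

```python
def literal_bytes(value):
    """Big-endian bytes of an integer operand, using the minimal width
    (an assumption of this sketch)."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def input_equals(buf, value):
    """Model of 'input == value': compare the beginning of the input
    buffer with the operand, byte by byte."""
    expected = literal_bytes(value)
    return buf[:len(expected)] == expected


def input_at(buf, n):
    """Model of 'input[n]': the (n+1)th byte of the input buffer."""
    return buf[n]


buf = b"\x80\x20\x41"
matches = input_equals(buf, 0x8020)   # models: input == 0x8020
third_byte = input_at(buf, 2)         # models: input[2]
```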
The action element specifies an action for a condition and can be
any one of the following elements:
|
direction
operation
map
|
The operation element specifies one or more operation expression elements:
|
operation For_US-ASCII_2_ISO8859-1 {
// one or more operation expression element definitions here.
:
:
}
|
The name of the operation element in the above example is For_US-ASCII_2_ISO8859-1. If the name of an operation element is either init or reset, the element defines the initial operation or the reset operation
of the iconv code conversion, respectively:
|
// The initial operation element:
operation init {
// one or more operation expression element definitions here.
:
:
}
// The reset operation element:
operation reset {
// one or more operation expression element definitions here.
:
:
}
|
The initial operation element defines the operations that need to
be performed at the beginning of the iconv code conversion. The reset operation
element defines the operations that need to be performed when a user of
the iconv(3C) function requests a state reset of the iconv code conversion.
For more detail on the state reset, refer to iconv(3C).
The operation expression can be any one of the following three different
expressions and each operation expression should be separated by an ending
semicolon:
|
if-else operation expression
output operation expression
control operation expression
|
The if-else operation expression makes a selection depending on the result of a boolean
expression. If the boolean expression yields true, the true task
that follows the 'if' is executed. If the boolean expression yields false
and a false task is supplied, the false task that follows the 'else'
is executed. There are three different kinds of if-else operation expressions:
|
// The if-else operation expression with only true task:
if (expression) {
// one or more operation expression element definitions here.
:
:
}
// The if-else operation expression with both true and false
// tasks:
if (expression) {
// one or more operation expression element definitions here.
:
:
} else {
// one or more operation expression element definitions here.
:
:
}
// The if-else operation expression with true task and
// another if-else operation expression as the false task:
if (expression) {
// one or more operation expression element definitions here.
:
:
} else if (expression) {
// one or more operation expression element definitions here.
:
:
} else {
// one or more operation expression element definitions here.
:
:
}
|
The last if-else operation expression has another if-else operation
expression as its false task. That nested if-else operation expression can
be any one of the above three if-else operation expressions.
The output operation expression saves the right hand side expression
result to the output buffer:
|
// Save 0x8080 at the output buffer:
output = 0x8080;
|
If the space left in the output buffer is smaller than the
space needed for the result of the right hand side expression, the iconv
code conversion stops and returns (size_t)-1 with errno set to E2BIG to indicate that the code conversion needs
more output buffer space to complete. Any expression can be used as the right
hand side expression. The output buffer pointer automatically moves
forward by the appropriate amount once the operation is executed.
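The space check performed by the output operation can be pictured with the following Python model. It is a sketch under stated assumptions, not the actual implementation: the function name output_value is hypothetical, the result is assumed to be written as big-endian bytes of minimal width, and the E2BIG failure is modeled as an OSError.

```python
import errno


def output_value(buf, capacity, value):
    """Append the bytes of `value` to `buf` (a bytearray) if they fit
    within `capacity`; otherwise fail with E2BIG and leave `buf` as is.

    Returns the number of bytes written, which models how far the
    output buffer pointer moves forward.
    """
    data = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    if capacity - len(buf) < len(data):
        raise OSError(errno.E2BIG, "output buffer too small")
    buf.extend(data)
    return len(data)


out = bytearray()
written = output_value(out, 3, 0x8080)  # models: output = 0x8080;
```

A second call with the same three-byte capacity would need two more bytes but only one is left, so it fails with E2BIG without changing the buffer.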
The control operation expression can be any one of the following expressions:
|
// Return (size_t)-1 as the return value with an EINVAL errno:
error;
// Return (size_t)-1 as the return value with an EBADF errno:
error 9;
// Discard input buffer byte operation. This discards one byte from
// the current input buffer and moves the input buffer pointer to
// the 2nd byte of the input buffer:
discard;
// Discard input buffer byte operation. This discards
// 10 bytes from the current input buffer and moves the input
// buffer pointer to the 11th byte of the input buffer:
discard 10;
// Return operation. This stops the execution of the current
// operation:
return;
// Operation execution operation. This executes the init
// operation defined and sets all variables to zero:
operation init;
// Operation execution operation. This executes the reset
// operation defined and sets all variables to zero:
operation reset;
// Operation execution operation. This executes an operation
// defined and named 'ISO8859_1_to_ISO8859_2':
operation ISO8859_1_to_ISO8859_2;
// Direction operation. This executes a direction defined and
// named 'ISO8859_1_to_KOI8_R':
direction ISO8859_1_to_KOI8_R;
// Map execution operation. This executes a mapping defined
// and named 'Map_ISO8859_1_to_US_ASCII':
map Map_ISO8859_1_to_US_ASCII;
// Map execution operation. This executes a mapping defined
// and named 'Map_ISO8859_1_to_US_ASCII' after discarding
// 10 input buffer bytes:
map Map_ISO8859_1_to_US_ASCII 10;
|
For the init and reset operations, if no pre-defined init
or reset operation exists in the iconv code conversion, only the system-defined
internal init or reset operation is executed. The execution of the
system-defined internal init and reset operations clears the system-maintained
internal state.
There are three special operators that can be used in the operation:
|
printchr expression;
printhd expression;
printint expression;
|
The above three operators print the given expression as a
character, a hexadecimal number, and a decimal number, respectively, to
the standard error stream. These three operators are for debugging purposes
only and should be removed from the final version of the iconv code conversion
definition file.
In addition to the above operations, any valid expression terminated
by a semicolon can be an operation, including an empty operation, which is denoted
by a semicolon alone.
The map element specifies a direct code conversion mapping by using
one or more map pairs. In practice, a map element usually contains many map pairs that together represent
an iconv code conversion definition:
|
map For_US-ASCII_2_ISO8859-1 {
// one or more map pairs here
:
:
}
|
Each map element also can have one or two comma-separated map attribute
elements like the following examples:
|
// Map with densely encoded mapping table map type:
map maptype = dense {
// one or more map pairs here
:
:
}
// Map with hash mapping table map type with hash factor 10.
// Only hash mapping table map type can have hash factor. If
// the hash factor is specified with other map types, it will be
// ignored.
map maptype = hash : 10 {
// one or more map pairs here.
:
:
}
// Map with binary search tree based mapping table map type:
map maptype = binary {
// one or more map pairs here.
:
:
}
// Map with index table based mapping table map type:
map maptype = index {
// one or more map pairs here.
:
:
}
// Map with automatic mapping table map type. If defined,
// system will assign the best possible map type.
map maptype = automatic {
// one or more map pairs here.
:
:
}
// Map with output_byte_length limit set to 2.
map output_byte_length = 2 {
// one or more map pairs here.
:
:
}
// Map with densely encoded mapping table map type and
// output_byte_length limit set to 2:
map maptype = dense, output_byte_length = 2 {
// one or more map pairs here.
:
:
}
|
If no maptype is defined, automatic is assumed. If no output_byte_length
is defined, the system determines the maximum possible output byte length
for the mapping by scanning all the possible output values in the mappings.
If the actual output byte length scanned is greater than the defined output_byte_length,
the geniconvtbl utility issues an error and stops generating
the code conversion binary table(s).
The following are allowed map pairs:
|
// Single mapping. This maps an input character denoted by
// the code value 0x20 to an output character value 0x21:
0x20 0x21
// Multiple mapping. This maps 128 input characters to 128
// output characters. In this mapping, 0x0 maps to 0x10, 0x1 maps
// to 0x11, 0x2 maps to 0x12, ..., and, 0x7f maps to 0x8f:
0x0...0x7f 0x10
// Default mapping. If specified, every undefined input character
// in this mapping will be converted to a specified character
// (in the following case, a character with code value of 0x3f):
default 0x3f;
// Default mapping. If specified, every undefined input character
// in this mapping will not be converted but directly copied to
// the output buffer:
default no_change_copy;
// Error mapping. If specified, during the code conversion,
// if the input buffer contains the byte value, in this case, 0x80,
// iconv(3C) will stop and return (size_t)-1 as the return
// value with errno set to EILSEQ:
0x80 error;
|
If no default mapping is specified, every undefined input character
in the mapping is treated as an error mapping, and thus iconv(3C)
stops the code conversion and returns (size_t)-1 as the
return value with errno set to EILSEQ.
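The effect of the different map pairs can be modeled with a small lookup table. The following Python sketch is illustrative only and does not reflect how geniconvtbl stores its tables: build_map, lookup, NO_CHANGE_COPY, and ERROR are hypothetical names, and an error is modeled as an OSError with errno EILSEQ.

```python
import errno

NO_CHANGE_COPY = object()  # models 'default no_change_copy'
ERROR = object()           # models an 'error' mapping


def build_map(pairs):
    """Expand (first, last, output) triples into a lookup table.

    A multiple mapping such as '0x0...0x7f 0x10' maps first to output,
    first+1 to output+1, and so on.
    """
    table = {}
    for first, last, out in pairs:
        for offset, code in enumerate(range(first, last + 1)):
            table[code] = out if out is ERROR else out + offset
    return table


def lookup(table, code, default=None):
    """Convert one input code; `default` models the default mapping."""
    out = table.get(code, default)
    if out is NO_CHANGE_COPY:
        return code
    if out is None or out is ERROR:
        raise OSError(errno.EILSEQ, "illegal input code")
    return out


# Models '0x0...0x7f 0x10' and '0x80 error'.
table = build_map([(0x0, 0x7f, 0x10), (0x80, 0x80, ERROR)])
last_mapped = lookup(table, 0x7f)
substituted = lookup(table, 0x90, default=0x3f)
copied = lookup(table, 0x90, default=NO_CHANGE_COPY)
```

Calling lookup(table, 0x90) with no default raises the EILSEQ error, which corresponds to the case where no default mapping is specified.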
The syntax of the iconv code conversion definition in extended BNF is illustrated below:
|
iconv_conversion_definition
: CONVERSION_NAME '{' definition_element_list '}'
;
definition_element_list
: definition_element ';'
| definition_element_list definition_element ';'
;
definition_element
: direction
| condition
| operation
| map
;
direction
: 'direction' NAME '{' direction_unit_list '}'
| 'direction' '{' direction_unit_list '}'
;
direction_unit_list
: direction_unit
| direction_unit_list direction_unit
;
direction_unit
: condition action ';'
| condition NAME ';'
| NAME action ';'
| NAME NAME ';'
| 'true' action ';'
| 'true' NAME ';'
;
action
: direction
| map
| operation
;
condition
: 'condition' NAME '{' condition_list '}'
| 'condition' '{' condition_list '}'
;
condition_list
: condition_expr ';'
| condition_list condition_expr ';'
;
condition_expr
: 'between' range_list
| expr
| 'escapeseq' escseq_list ';'
;
range_list
: range_pair
| range_list ',' range_pair
;
range_pair
: HEXADECIMAL '...' HEXADECIMAL
;
escseq_list
: escseq
| escseq_list ',' escseq
;
escseq : HEXADECIMAL
;
map : 'map' NAME '{' map_list '}'
| 'map' '{' map_list '}'
| 'map' NAME map_attribute '{' map_list '}'
| 'map' map_attribute '{' map_list '}'
;
map_attribute
: map_type ',' 'output_byte_length' '=' DECIMAL
| map_type
| 'output_byte_length' '=' DECIMAL ',' map_type
| 'output_byte_length' '=' DECIMAL
;
map_type: 'maptype' '=' map_type_name ':' DECIMAL
| 'maptype' '=' map_type_name
;
map_type_name
: 'automatic'
| 'index'
| 'hash'
| 'binary'
| 'dense'
;
map_list
: map_pair
| map_list map_pair
;
map_pair
: HEXADECIMAL HEXADECIMAL
| HEXADECIMAL '...' HEXADECIMAL HEXADECIMAL
| 'default' HEXADECIMAL
| 'default' 'no_change_copy'
| HEXADECIMAL 'error'
;
operation
: 'operation' NAME '{' op_list '}'
| 'operation' '{' op_list '}'
| 'operation' 'init' '{' op_list '}'
| 'operation' 'reset' '{' op_list '}'
;
op_list : op_unit
| op_list op_unit
;
op_unit : ';'
| expr ';'
| 'error' ';'
| 'error' expr ';'
| 'discard' ';'
| 'discard' expr ';'
| 'output' '=' expr ';'
| 'direction' NAME ';'
| 'operation' NAME ';'
| 'operation' 'init' ';'
| 'operation' 'reset' ';'
| 'map' NAME ';'
| 'map' NAME expr ';'
| op_if_else
| 'return' ';'
| 'printchr' expr ';'
| 'printhd' expr ';'
| 'printint' expr ';'
;
op_if_else
: 'if' '(' expr ')' '{' op_list '}'
| 'if' '(' expr ')' '{' op_list '}' 'else' op_if_else
| 'if' '(' expr ')' '{' op_list '}' 'else' '{' op_list '}'
;
expr : '(' expr ')'
| NAME
| HEXADECIMAL
| DECIMAL
| 'input' '[' expr ']'
| 'outputsize'
| 'inputsize'
| 'true'
| 'false'
| 'input' '==' expr
| expr '==' 'input'
| '!' expr
| '~' expr
| '-' expr
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr '%' expr
| expr '<<' expr
| expr '>>' expr
| expr '|' expr
| expr '^' expr
| expr '&' expr
| expr '==' expr
| expr '!=' expr
| expr '>' expr
| expr '>=' expr
| expr '<' expr
| expr '<=' expr
| NAME '=' expr
| expr '||' expr
| expr '&&' expr
;
|
|