Intel® X86 Encoder Decoder
Data Structures
struct | xed_enc2_req_payload_t
This structure is filled in by the various XED ENC2 functions.
union | xed_enc2_req_t
A wrapper for xed_enc2_req_payload_t.
Functions
XED_DLL_EXPORT void | xed_emit_seg_prefix (xed_enc2_req_t *r, xed_reg_enum_t reg)
Emit a legacy segment prefix byte into the specified request's output buffer.
static XED_INLINE xed_uint32_t | xed_enc2_encoded_length (xed_enc2_req_t *r)
Returns the number of bytes that were used for the encoding.
XED_DLL_EXPORT void | xed_enc2_error (const char *fmt,...)
The error handler routine.
static XED_INLINE void | xed_enc2_req_t_init (xed_enc2_req_t *r, xed_uint8_t *output_buffer)
Zero out a xed_enc2_req_t structure and set the output pointer.
XED_DLL_EXPORT void | xed_enc2_set_check_args (xed_bool_t on)
Turn off (or on) argument checking if using the checked encoder interface.
XED_DLL_EXPORT void | xed_enc2_set_error_handler (xed_user_abort_handler_t *fn)
Set a function taking a variable-number-of-arguments (stdarg) to handle the errors and die.
The basic idea for the ENC2 fast encoder is that there is one encode function per variant of every instruction. The instructions are encoded in 3 encoding spaces (legacy, VEX and EVEX), and every variation needs its own function name. To come up with unique names, ENC2 uses a few function naming conventions. For legacy encoded instructions, there are often 3 variations in 64b mode (2 in other modes) to handle 16-bit, 32-bit and 64-bit operands. Those 3 sizes are usually differentiated with "_o16", "_o32" and "_o64" in the ENC2 function names. Keeping names unique is complicated because the instruction set often has multiple encodings for the same operation. To disambiguate alias encodings, the function names include the substring "_vrN", where N is an integer. Similarly, VEX and EVEX encodings for related instructions often need to be distinguished when their instruction name and operands are the same. To accomplish that, all ENC2 EVEX encoding function names contain the substring "_e". The checked interface functions end with "_chk".
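The naming conventions above can be illustrated with a small sketch that inspects a function name for the markers described. The helper names are made up for illustration and are not part of XED; the "_e" marker is deliberately left out because a plain substring search for it would be ambiguous.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return 16, 32 or 64 if the name carries an "_o16"/"_o32"/"_o64"
 * operand-size marker, or 0 if no such marker is present. */
static int demo_name_operand_bits(const char *name)
{
    if (strstr(name, "_o16")) return 16;
    if (strstr(name, "_o32")) return 32;
    if (strstr(name, "_o64")) return 64;
    return 0;
}

/* Return 1 if the name ends with "_chk" (checked interface), else 0. */
static int demo_name_is_checked(const char *name)
{
    size_t n = strlen(name);
    return n >= 4 && strcmp(name + n - 4, "_chk") == 0;
}

/* Return N from the first "_vrN" alias marker, or -1 if there is none. */
static int demo_name_variant(const char *name)
{
    const char *p;
    for (p = strstr(name, "_vr"); p != NULL; p = strstr(p + 1, "_vr")) {
        if (isdigit((unsigned char)p[3]))
            return atoi(p + 3);
    }
    return -1;
}
```

The strings passed to these helpers would be names taken from the generated ENC2 headers; the sketch only demonstrates how the markers compose.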
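For instructions that take conventional x86 memory operands, there are 6 functions generated depending on the addressing mode required. The 6 functions are denoted b, bd8, bd32, bis, bisd8, and bisd32, where:
b = base register only
bd8 = base register plus an 8-bit displacement
bd32 = base register plus a 32-bit displacement
bis = base register, index register and scale
bisd8 = base register, index register, scale and an 8-bit displacement
bisd32 = base register, index register, scale and a 32-bit displacement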
The idea behind having different functions for the different addressing modes is to make the encode functions simpler and more straight-line code. Memory instructions also indicate their effective addressing width with one of "_a16", "_a32" or "_a64" substrings.
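As a rough illustration of how a caller might choose among the six addressing-mode variants, the sketch below maps an operand shape to the matching tag. The function name is hypothetical and not part of XED. It also assumes the displacement is a signed value and that zero means "no displacement", which ignores base registers that always need a displacement.

```c
#include <stdint.h>

/* Pick the addressing-mode tag for a memory operand that always has a base
 * register. has_index is nonzero when an index register (with scale) is
 * used; disp is the signed displacement, where 0 means "none". */
static const char *demo_mem_variant(int has_index, int32_t disp)
{
    int fits8 = (disp >= INT8_MIN && disp <= INT8_MAX);

    if (disp == 0)
        return has_index ? "bis" : "b";
    if (fits8)
        return has_index ? "bisd8" : "bd8";
    return has_index ? "bisd32" : "bd32";
}
```

Choosing the variant up front like this is what lets each generated encode function stay straight-line: the branch on operand shape happens once in the caller instead of inside the encoder.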
The libraries for the ENC2 encoder are built when the "--enc2" switch is included during the build process. There is one set of libraries and headers generated for each supported configuration. Currently Intel® XED ENC2 supports 64b mode with 64b addressing (m64,a64) and 32b mode with 32b addressing (m32,a32). The build process creates an enc2-m64-a64 directory and an enc2-m32-a32 directory, each with two libraries, one for the checked interface and one for the unchecked interface. There are also 2 headers, one per library, in the hdr/xed subdirectory of the respective enc2-* directory. On Linux, for a static build, you'd see:
Given the large size of the generated ENC2 headers, doxygen documentation is not created for those header files. Please view the headers directly in your editor.
Even with the unchecked interface, some register checking is done for the addressing registers. In the x86 encoding system, some choices of base register require that an 8-bit or 32-bit displacement is also used. In those cases, the ENC2 encoder is capable of supplying a zero-valued displacement.
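The displacement requirement comes from the x86 ModRM byte: with mod=00, an r/m field of 101 does not mean "base register 5" but "32-bit displacement only" (RIP-relative in 64b mode), so a base whose low three bits are 101 (EBP/RBP, R13) has to be encoded with mod=01 and a zero 8-bit displacement. The sketch below shows that general rule only; it is not taken from the ENC2 sources, the function names are hypothetical, REX bits are ignored, and bases that need a SIB byte are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* reg_num is the hardware register number 0..15 (RAX=0, RBP=5, R13=13).
 * Returns 1 when a base-only operand cannot use mod=00. */
static int demo_base_needs_disp(unsigned reg_num)
{
    return (reg_num & 7u) == 5u;
}

/* Emit the ModRM byte (plus a zero disp8 when required) for a base-only
 * memory operand. Returns the number of bytes written, or 0 when the base
 * needs a SIB byte (low bits 100: ESP/RSP, R12), which this sketch skips. */
static size_t demo_emit_base_only(uint8_t *out, unsigned reg_field, unsigned base)
{
    unsigned rm = base & 7u;
    unsigned reg = (reg_field & 7u) << 3;

    if (rm == 4u)
        return 0;
    if (demo_base_needs_disp(base)) {
        out[0] = (uint8_t)(0x40u | reg | rm); /* mod=01: disp8 follows */
        out[1] = 0x00;                        /* zero-valued displacement */
        return 2;
    }
    out[0] = (uint8_t)(reg | rm);             /* mod=00: no displacement */
    return 1;
}
```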
Intel® XED also offers the capability to test ENC2 with either the "--enc2-test-checked" flag or the "--enc2-operands-checked" flag. Building XED with either flag lengthens the build. The former flag lets developers test the ENC2 checked interface in a more sparing manner: each encoded instruction is decoded and its IFORM is validated. The latter flag offers more rigorous testing: each instruction is decoded, and its IFORM and all operands involved in the encoding are validated.
Users can install their own error handler by calling xed_enc2_set_error_handler() passing a function pointer that takes stdarg variable arguments. See examples/xed-enc2-2.c for an example.
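A handler of this kind typically receives a printf-style format string plus the already-started argument list. The sketch below is self-contained and uses stand-in names (demo_*) rather than the XED API; the handler signature taking a va_list is an assumption, so check the xed_user_abort_handler_t typedef in the XED headers and examples/xed-enc2-2.c before relying on it.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed handler shape: format string plus a va_list (hypothetical). */
typedef void demo_abort_handler_t(const char *fmt, va_list args);

static char demo_last_error[256];
static demo_abort_handler_t *demo_registered = NULL;

/* User handler: capture the formatted message instead of printing it. */
static void demo_handler(const char *fmt, va_list args)
{
    vsnprintf(demo_last_error, sizeof demo_last_error, fmt, args);
}

/* Stand-in for registering the handler with the library. */
static void demo_set_error_handler(demo_abort_handler_t *fn)
{
    demo_registered = fn;
}

/* Stand-in for the library-side error routine: it forwards its variadic
 * arguments to the registered handler. Unlike the real routine described
 * in this document, it does not call abort() afterwards. */
static void demo_error(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    if (demo_registered != NULL)
        demo_registered(fmt, args);
    va_end(args);
}
```

Because the documented error routine still aborts after the user handler returns, a real handler is the place to log or flush state, not to recover and continue.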
When using the checked interface, one can disable the checking at runtime by calling xed_enc2_set_check_args() with an integer value 0. With a nonzero argument, the argument checking can be re-enabled.
To minimize copying, ENC2 users are required to supply a pointer to an output buffer where the encoding bytes will be placed. That buffer must be at least 15 bytes long. Valid x86 encodings are at most 15 bytes long and reach that length only if redundant legacy prefixes are employed. XED ENC2 does not generate redundant legacy prefixes.
Here is an example of creating an LEA instruction using the checked interface and several fixed registers:
The call to xed_enc2_req_t_init() zeros out the request structure and sets up the pointer to the output buffer. It is very important to zero the request structure before using it, because much of the ENC2 code is optimized to skip writing zero-valued bits. The call to xed_enc2_encoded_length() returns the number of bytes placed in the output buffer. Getting the length of the encoding is useful for setting the correct buffer pointer for subsequent encoder requests.
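The init, encode, get-length pattern can be used to pack several instructions back to back in one buffer. The sketch below mimics that pattern with stand-in types and functions (demo_*), so it compiles without XED; the request layout and the two one-byte "encoders" are hypothetical and only exist to show how the returned length advances the output pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_MAX_INST_BYTES = 15 }; /* room required per request */

/* Hypothetical request: output pointer plus bytes emitted so far. */
typedef struct {
    uint8_t *out;
    uint32_t cursor;
} demo_req_t;

/* Zero the request and set the output pointer, as the text requires. */
static void demo_req_init(demo_req_t *r, uint8_t *output_buffer)
{
    memset(r, 0, sizeof *r);
    r->out = output_buffer;
}

static uint32_t demo_encoded_length(const demo_req_t *r)
{
    return r->cursor;
}

/* Stand-in encoders: one-byte NOP (0x90) and near RET (0xC3). */
static void demo_encode_nop(demo_req_t *r) { r->out[r->cursor++] = 0x90; }
static void demo_encode_ret(demo_req_t *r) { r->out[r->cursor++] = 0xC3; }

/* Emit `count` NOPs followed by a RET into `code`. Stops early if fewer
 * than 15 bytes remain. Returns the total number of bytes written. */
static size_t demo_emit_nops_then_ret(uint8_t *code, size_t capacity, unsigned count)
{
    size_t used = 0;
    demo_req_t r;
    unsigned i;

    for (i = 0; i <= count; i++) {
        if (capacity - used < DEMO_MAX_INST_BYTES)
            break;
        demo_req_init(&r, code + used);   /* fresh request at current end */
        if (i < count)
            demo_encode_nop(&r);
        else
            demo_encode_ret(&r);
        used += demo_encoded_length(&r);  /* advance by the encoded length */
    }
    return used;
}
```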
See examples/xed-enc2-1.c and examples/xed-enc2-2.c for examples.
XED_DLL_EXPORT void xed_emit_seg_prefix (xed_enc2_req_t *r, xed_reg_enum_t reg)
Emit a legacy segment prefix byte into the specified request's output buffer.
static XED_INLINE xed_uint32_t xed_enc2_encoded_length (xed_enc2_req_t *r)
Returns the number of bytes that were used for the encoding.
XED_DLL_EXPORT void xed_enc2_error (const char *fmt, ...)
The error handler routine.
This function is called by encoder functions upon detecting argument errors. It first attempts to call the user-registered handler (configured by xed_enc2_set_error_handler()). If no user handler is set, it calls printf() and then abort(). If the user handler returns, abort() is still called.
static XED_INLINE void xed_enc2_req_t_init (xed_enc2_req_t *r, xed_uint8_t *output_buffer)
Zero out a xed_enc2_req_t structure and set the output pointer.
Required before calling any ENC2 encoding function.
XED_DLL_EXPORT void xed_enc2_set_check_args (xed_bool_t on)
Turn off (or on) argument checking if using the checked encoder interface.
Parameter on: 1 enables argument checking, 0 disables it.
XED_DLL_EXPORT void xed_enc2_set_error_handler (xed_user_abort_handler_t *fn)
Set a function taking a variable-number-of-arguments (stdarg) to handle the errors and die.
The arguments are like those of printf: a format string followed by a variable number of arguments.