This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| en:multiasm:papc:chapter_6_7 [2025/10/23 14:52] – [BMI1 and BMI2 Instructions] ktokarz | en:multiasm:papc:chapter_6_7 [2026/02/19 20:52] (current) – [MOV] ktokarz | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Instruction Set of x86 - Essentials ====== | ||
| + | ===== Instruction groups ===== | ||
| + | The x64 processors can execute an extensive number of different instructions. In the documentation of processors, we can find several ways of dividing all instructions into groups. The most general division, according to AMD, defines five groups of instructions: | ||
| + | * General Purpose instructions | ||
| + | * System instructions | ||
| + | * SSE instructions | ||
| + | * 64-bit media instructions | ||
| + | * x87 Floating-Point instructions | ||
| + | |||
| + | Intel defines the following groups of instructions. | ||
| + | * General Purpose | ||
| + | * X87 FPU | ||
| + | * X87 FPU and SIMD State Management | ||
| + | * MMX Technology | ||
| + | * SSE Extensions | ||
| + | * SSE2 Extensions | ||
| + | * SSE3 Extensions | ||
| + | * SSSE3 Extensions | ||
| + | * IA-32e mode: 64-bit mode instructions | ||
| + | * System Instructions | ||
| + | * VMX Instructions | ||
| + | * SMX Instructions | ||
| + | |||
| + | There is also a long list of extensions defined, including SSE4.1, SSE4.2, Intel AVX, AMD 3DNow! and many others. For a detailed description of instruction groups, please refer to | ||
| + | * "AMD64 Architecture Programmer’s Manual" | ||
| + | * " | ||
| + | |||
| + | You can find the details of every instruction in the description of the instruction set: | ||
| + | * "AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions" | ||
| + | * " | ||
| + | |||
| + | There are also specialised websites with detailed explanations of instructions that you can use to get a lot of additional information. Among others, you can visit: | ||
| + | * X86 Opcode and Instruction Reference ((http:// | ||
| + | * x86 and amd64 instruction reference ((https:// | ||
| + | |||
| + | In this book, we will present most of the general-purpose instructions and provide general ideas on the chosen extensions, including FPU, MMX, SSE, and AVX. | ||
| + | |||
| + | ===== General Purpose Instructions ===== | ||
| + | General-purpose instructions can be divided into several subgroups: | ||
| + | * Data Transfer Instructions | ||
| + | * Binary Arithmetic Instructions | ||
| + | * Decimal Arithmetic Instructions | ||
| + | * Logical Instructions | ||
| + | * Shift and Rotate Instructions | ||
| + | * Bit and Byte Instructions | ||
| + | * Control Transfer Instructions | ||
| + | * String Instructions | ||
| + | * I/O Instructions | ||
| + | * Enter and Leave Instructions | ||
| + | * Flag Control (EFLAG) Instructions | ||
| + | * Segment Register Instructions | ||
| + | * Miscellaneous Instructions | ||
| + | * User Mode Extended State Save/ | ||
| + | * Random Number Generator Instructions | ||
| + | * BMI1 and BMI2 Instructions | ||
| + | |||
| + | ===== Condition Codes ===== | ||
| + | Before describing instructions, | ||
| + | Condition codes together with flags checked are presented in table {{ref> | ||
| + | |||
| + | <table table_condition_codes> | ||
| + | < | ||
| + | ^ Condition code // | ||
| + | | E | ZF = 1 | Equal | | ||
| + | | Z | ZF = 1 | Zero | | ||
| + | | NE | ZF = 0 | Not equal | | ||
| + | | NZ | ZF = 0 | Not zero | | ||
| + | | A | CF=0 and ZF=0 | Above | | ||
| + | | NBE | CF=0 and ZF=0 | Not below or equal | | ||
| + | | AE | CF=0 | Above or equal | | ||
| + | | NB | CF=0 | Not below | | ||
| + | | B | CF=1 | Below | | ||
| + | | NAE | CF=1 | Not above or equal | | ||
| + | | BE | CF=1 or ZF=1 | Below or equal | | ||
| + | | NA | CF=1 or ZF=1 | Not above | | ||
| + | | G | ZF=0 and SF=OF | Greater | | ||
| + | | NLE | ZF=0 and SF=OF | Not less or equal | | ||
| + | | GE | SF=OF | Greater or equal | | ||
| + | | NL | SF=OF | Not less | | ||
| + | | L | SF<>OF | Less | | ||
| + | | NGE | SF<>OF | Not greater or equal | | ||
| + | | LE | ZF=1 or SF<>OF | Less or equal | | ||
| + | | NG | ZF=1 or SF<>OF | Not greater | | ||
| + | | C | CF=1 | Carry | | ||
| + | | NC | CF=0 | Not carry | | ||
| + | | O | OF=1 | Overflow | | ||
| + | | NO | OF=0 | Not overflow | | ||
| + | | S | SF=1 | Sign (negative) | | ||
| + | | NS | SF=0 | Not sign (non-negative) | | ||
| + | | P | PF=1 | Parity | | ||
| + | | PE | PF=1 | Parity even | | ||
| + | | NP | PF=0 | Not parity | | ||
| + | | PO | PF=0 | Parity odd | | ||
| + | </ | ||
| + | ===== Data transfer instructions ===== | ||
| + | Almost all assembler tutorials start with the presentation of the **mov** instruction, | ||
| + | ==== MOV ==== | ||
| + | Let's look at some additional variants. | ||
| + | <code asm> | ||
| + | mov al, bl ;copy one byte from bl to al | ||
| + | mov ax, bx ;copy word (two bytes) from bx to ax | ||
| + | mov eax, ebx ;copy doubleword (four bytes) from ebx to eax | ||
| + | mov rax, rbx ;copy quadword (eight bytes) from rbx to rax | ||
| + | </ | ||
| + | In the **mov** instruction, | ||
| + | <code asm> | ||
| + | mov al, 100 ;0xB0, 0x64 | ||
| + | ;copy constant (immediate) of the value 100 (0x64) to al | ||
| + | |||
| + | mov al, [bx] ; | ||
| + | ;copy byte from the memory at address stored in bx to al | ||
| + | ; | ||
| + | |||
| + | ;Notice the difference between two following instructions | ||
| + | mov eax, 100 ; | ||
| + | ;copy constant 100 to eax | ||
| + | |||
| + | mov eax, [100] ; | ||
| + | ;copy value from memory at address 100 | ||
| + | |||
| + | ;It is possible to copy a constant to memory addressed directly or indirectly | ||
| + | ;operand size specifier dword ptr is required | ||
| + | ;to inform the processor about the size of the argument | ||
| + | mov dword ptr ds:[200], 100 | ||
| + | ; | ||
| + | ;copy value of 100, encoded as dword (four bytes), 0x64 = 100 | ||
| + | ;to memory at address 200, encoded as four bytes, | ||
| + | | ||
| + | mov dword ptr [ebx], 100 | ||
| + | ; | ||
| + | ;copy value of 100, encoded as dword (four bytes), 0x64 = 100 | ||
| + | ;to memory addressed by ebx | ||
| + | </ | ||
| + | ==== Conditional move ==== | ||
| + | Starting from the P6 machines, the conditional move instruction **cmov// | ||
| + | For example, if we need to copy data from ebx to ecx, if the result of the previous operation is negative, we can write the following instruction. | ||
| + | <code asm> | ||
| + | cmovs ecx, ebx | ||
| + | </ | ||
| + | |||
| + | ==== Sign extension ==== | ||
| + | In the situation of copying data of a smaller size (expressed in number of bits) to a bigger destination argument, the question arises as to what to do with the remaining bits. Let us consider copying an 8-bit value from bl to the 16-bit ax register. If the value copied is unsigned or positive (let it be 5), the remaining bits should be cleared. | ||
| + | <code asm> | ||
| + | ; | ||
| + | mov al, bl ; 00000101 | ||
| + | mov ah, 0 ; | ||
| + | ; | ||
| + | </ | ||
| + | |||
| + | If the value is negative (e.g. -5) the situation changes. | ||
| + | <code asm> | ||
| + | ; | ||
| + | mov al, bl ; 11111011 | ||
| + | mov ah, 0 ; | ||
| + | ; | ||
| + | </ | ||
| + | |||
| + | It is visible that to preserve the original value, the upper bits must be filled with ones, not zeros. | ||
| + | <code asm> | ||
| + | ; | ||
| + | mov al, bl ; 11111011 | ||
| + | mov ah, 0xFF ;11111111 | ||
| + | ; | ||
| + | </ | ||
| + | |||
| + | There are special instructions which perform automatic sign extension, copying the sign bit to all higher bit positions. They can be considered as type conversion instructions. These instructions do not have any arguments as they operate on the accumulator only. | ||
| + | * **cbw** - converts byte in al to word in ax | ||
| + | * **cwd** - converts word in ax to doubleword in dx:ax | ||
| + | * **cwde** - converts word in ax to doubleword extended in eax | ||
| + | * **cdq** - converts doubleword in eax to quadword in edx:eax | ||
| + | * **cdqe** - converts doubleword in eax to quadword in rax | ||
| + | * **cqo** - converts quadword in rax to double quadword in rdx:rax | ||
| + | |||
| + | Sign extension instructions work solely with the accumulator. Fortunately, | ||
| + | * **movsx** - copies and sign-extends a byte to a word or doubleword or word to doubleword. | ||
| + | * **movzx** - copies and zero-extends a byte to a word or doubleword or word to doubleword. | ||
| + | * **movsxd** - copies and sign-extends a doubleword to a quadword in x64 processors. | ||
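| + | |||
| + | The difference between sign extension and zero extension can be seen in a short sketch. The registers and the value -5 are chosen only for illustration. | ||
| + | <code asm> | ||
| + | mov bl, -5 ;bl = 11111011 (0xFB) | ||
| + | movsx eax, bl ;sign-extend: eax = 0xFFFFFFFB (-5) | ||
| + | movzx ecx, bl ;zero-extend: ecx = 0x000000FB (251) | ||
| + | movsxd rdx, eax ;sign-extend: rdx = 0xFFFFFFFFFFFFFFFB (-5) | ||
| + | |||
| + | ;the same conversion done with the accumulator-only instructions | ||
| + | mov al, bl ;al = 0xFB | ||
| + | cbw ;al -> ax, ax = 0xFFFB | ||
| + | cwde ;ax -> eax, eax = 0xFFFFFFFB | ||
| + | </code> | ||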
| + | |||
| + | ==== Exchange instructions ==== | ||
| + | The exchange instructions swap the values of operands. A single exchange instruction can replace three mov instructions while swapping the contents of two arguments, so they can be useful in optimising some algorithms. They are helpful in the implementation of semaphores, even in multiprocessor systems. | ||
| + | The **xchg** instruction swaps the values of two arguments. If one of the arguments is in memory, the instruction behaves as if it had the LOCK prefix, which allows for semaphore implementation. | ||
| + | The **cmpxchg** has three arguments: source, destination and accumulator. It compares the destination argument with the accumulator; | ||
| + | <figure instr_cmpxchg> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | The **xadd** instruction exchanges two arguments, adds them, and stores the sum in a destination argument. Together with a LOCK prefix, it can be used to implement a DO loop executed by more than one processor simultaneously. | ||
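| + | |||
| + | As an illustration of the exchange instructions used for synchronisation, a minimal sketch of a spin lock and an atomic counter is presented below. The names lock_var and counter are hypothetical doubleword variables introduced for this example (lock_var holds 0 when the lock is free and 1 when it is taken); this is a simplified sketch, not a complete locking scheme. | ||
| + | <code asm> | ||
| + | acquire: | ||
| + | mov eax, 1 | ||
| + | xchg eax, dword ptr lock_var ;atomically swap eax with lock_var | ||
| + | ;locking is implicit for a memory operand | ||
| + | test eax, eax ;previous value 0 means the lock was free | ||
| + | jnz acquire ;the lock was taken - try again | ||
| + | |||
| + | ;critical section goes here | ||
| + | |||
| + | mov dword ptr lock_var, 0 ;release the lock | ||
| + | |||
| + | mov eax, 1 | ||
| + | lock xadd dword ptr counter, eax | ||
| + | ;counter = counter + 1 | ||
| + | ;eax = previous value of counter | ||
| + | </code> | ||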
| + | |||
| + | The **bswap** instruction is a single-argument instruction; | ||
| + | <figure instr_bswap> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | ==== Stack instructions ==== | ||
| + | A stack is a special structure in the memory that automatically stores the return address (the address of the next instruction) during a procedure call (it is described in detail in the section about the **call** instruction). It is also possible to use the stack for local variables in functions, to pass arguments to procedures, and for temporary data storage. In x86 architecture, | ||
| + | <figure instr_push> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | The **pop** instruction takes data off the stack, copies it into the destination argument, and increments the stack pointer. After its execution, the stack pointer points to the previous data stored on the stack. It is shown in figure {{ref> | ||
| + | <figure instr_pop> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | There are also instructions that push or pop all eight general-purpose registers (including the stack pointer). The 16-bit registers are pushed with **pusha** and popped with **popa** instructions. For 32-bit registers, the **pushad** and **popad** instructions can be used, respectively. The order of registers on the stack is shown in figure {{ref> | ||
| + | <figure instr_pushadpopad> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | ===== Arithmetic instructions ===== | ||
| + | Arithmetic instructions perform calculations on binary encoded data. It is worth noting that the processor does not distinguish between unsigned and signed values; it is the responsibility of the programmer to provide correct input values and properly interpret the results obtained. | ||
| + | < | ||
| + | There are instructions which support decimal arithmetic, but due to the rare use of BCD numbers in modern software, they are not available in x64 mode. | ||
| + | </ | ||
| + | ==== Addition and subtraction ==== | ||
| + | There are two addition instructions. The **add** instruction adds the values of the destination and source arguments and stores the result in the destination argument. It modifies the flags in the EFLAGS register according to the result. The **adc** instruction additionally adds " | ||
| + | Similarly, there are two subtraction instructions. The **sub** subtracts the source argument from the destination argument, stores the result in the destination, | ||
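| + | |||
| + | A typical use of the carry flag in addition and subtraction is arithmetic on numbers wider than a register. The sketch below assumes, for illustration only, that one 128-bit value is held in rdx:rax and the other in rcx:rbx. | ||
| + | <code asm> | ||
| + | ;rdx:rax = rdx:rax + rcx:rbx | ||
| + | add rax, rbx ;add lower halves, CF holds the carry out | ||
| + | adc rdx, rcx ;add upper halves together with CF | ||
| + | |||
| + | ;rdx:rax = rdx:rax - rcx:rbx | ||
| + | sub rax, rbx ;subtract lower halves, CF holds the borrow | ||
| + | sbb rdx, rcx ;subtract upper halves together with CF | ||
| + | </code> | ||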
| + | |||
| + | ==== Incrementation and decrementation ==== | ||
| + | The **inc** instruction adds " | ||
| + | |||
| + | ==== Multiply ==== | ||
| + | Two multiply instructions are implemented. The **mul** is a one-argument instruction. It multiplies the content of the argument and the accumulator, | ||
| + | |||
| + | <table table_mul> | ||
| + | < | ||
| + | ^ Argument ^ Accumulator ^ Result ^ | ||
| + | | 8 bits | AL | AX | | ||
| + | | 16 bits | AX | DX:AX | | ||
| + | | 32 bits | EAX | EDX:EAX | | ||
| + | | 64 bits | RAX | RDX:RAX | | ||
| + | </ | ||
| + | |||
| + | The **imul** instruction implements the signed multiply. It can have one, two or three arguments. The single-argument version uses the same registers as the **mul** instruction, but treats the values as signed numbers. The two-argument version multiplies a 16-, 32-, or 64-bit destination register by a source argument of the same size and stores the result in the destination register. The three-argument version multiplies the content of the source argument by the immediate and stores the result in the destination of the same size as the source. The destination must be a register. | ||
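| + | |||
| + | The forms of the multiply instructions can be compared in a short sketch. The values and registers are assumptions chosen for illustration. | ||
| + | <code asm> | ||
| + | mov eax, 100000 | ||
| + | mov ebx, 100000 | ||
| + | mul ebx ;unsigned: edx:eax = eax * ebx = 0x2540BE400 | ||
| + | ;edx = 0x00000002, eax = 0x540BE400 | ||
| + | |||
| + | mov ebx, 7 | ||
| + | mov ecx, -3 | ||
| + | imul ecx, ebx ;two arguments: ecx = ecx * ebx = -21 | ||
| + | imul edx, ecx, 10 ;three arguments: edx = ecx * 10 = -210 | ||
| + | </code> | ||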
| + | |||
| + | ==== Divide ==== | ||
| + | Two divide instructions are implemented. The **div** is a one-argument instruction. It divides the content of the accumulator by the argument, treated as unsigned numbers. The size of the accumulator is twice as big as the size of the argument. The result is stored as two integer values of the same size as the argument. The quotient is placed in the lower half of the accumulator, | ||
| + | |||
| + | <table table_div> | ||
| + | < | ||
| + | ^ Argument ^ Accumulator ^ Quotient ^ Remainder ^ | ||
| + | | 8 bits | AX | AL | AH | | ||
| + | | 16 bits | DX:AX | AX | DX | | ||
| + | | 32 bits | EDX:EAX | EAX | EDX | | ||
| + | | 64 bits | RDX:RAX | RAX | RDX | | ||
| + | </ | ||
| + | |||
| + | The **idiv** instruction implements the signed divide. It behaves the same way as the **div** instruction except for the type of numbers. | ||
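| + | |||
| + | Because the dividend is twice as wide as the divisor, the upper half of the accumulator must be prepared before dividing. The sketch below shows one common way to do it for 32-bit operands; the values are chosen for illustration only. | ||
| + | <code asm> | ||
| + | ;unsigned division 100 / 7 | ||
| + | mov eax, 100 | ||
| + | xor edx, edx ;clear the upper half of the dividend edx:eax | ||
| + | mov ecx, 7 | ||
| + | div ecx ;eax = 14 (quotient), edx = 2 (remainder) | ||
| + | |||
| + | ;signed division -100 / 7 | ||
| + | mov eax, -100 | ||
| + | cdq ;sign-extend eax into edx:eax | ||
| + | idiv ecx ;eax = -14 (quotient), edx = -2 (remainder) | ||
| + | </code> | ||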
| + | ===== Logical instructions ===== | ||
| + | The set of logical instructions contains **and**, **or**, **xor** and **not** instructions. All of them perform bitwise Boolean operations corresponding to their names. The **not** is a single-argument instruction; | ||
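| + | |||
| + | Logical instructions are often used with bit masks. A short sketch with example values (chosen for illustration only) is shown below. | ||
| + | <code asm> | ||
| + | mov eax, 0x12345678 | ||
| + | and eax, 0x0000FFFF ;keep the lower 16 bits: eax = 0x00005678 | ||
| + | or eax, 0x80000000 ;set bit 31: eax = 0x80005678 | ||
| + | xor eax, eax ;a value xored with itself gives zero: eax = 0 | ||
| + | not eax ;invert all bits: eax = 0xFFFFFFFF | ||
| + | </code> | ||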
| + | |||
| + | ===== Shift and rotate instructions ===== | ||
| + | Shift and rotate instructions treat the argument as a shift register. Each bit of the argument is moved to the neighbouring position on the left or right, depending on the shift direction. The number of bit positions for the shift can be specified as a constant or in the CL register. Shift instructions can be used for multiplying (shift left) and dividing (shift right) by a power of two. | ||
| + | Shift instructions have two versions: logical and arithmetical. Logical shift left **shl** and arithmetical shift left **sal** behave the same, filling the empty bits (at the LSB position) with zeros. Logical shift right **shr** fills the empty bits (at the MSB position) with zeros, while the arithmetical shift right **sar** makes a copy of the most significant bit, preserving the sign of a value. It is shown in figure {{ref> | ||
| + | |||
| + | <figure instr_shift> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
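| + | |||
| + | The difference between logical and arithmetical shifts is visible when the shifted value is negative. The values in the sketch below are chosen for illustration only. | ||
| + | <code asm> | ||
| + | mov eax, 5 | ||
| + | shl eax, 3 ;eax = 40 (5 * 8) | ||
| + | |||
| + | mov ebx, -40 ;ebx = 0xFFFFFFD8 | ||
| + | sar ebx, 2 ;arithmetical: ebx = -10, the sign is preserved | ||
| + | |||
| + | mov edx, -40 ;edx = 0xFFFFFFD8 | ||
| + | shr edx, 2 ;logical: edx = 0x3FFFFFF6, zeros enter at the MSB | ||
| + | |||
| + | mov cl, 4 ;shift count given in the cl register | ||
| + | shl eax, cl ;eax = 640 (40 * 16) | ||
| + | </code> | ||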
| + | |||
| + | There are two double shift instructions which move bits from the source argument to the destination argument. The number of bits is specified as the third argument. Shift double right has **shrd** mnemonic, while shift double left has **shld** mnemonic. The operation of shift double instructions is presented in figure {{ref> | ||
| + | |||
| + | <figure instr_shiftdouble> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | For all shift instructions, | ||
| + | |||
| + | Rotate instructions shift bits left **rol** or right **ror** in the argument, and additionally move bits around from the lowest to the highest or from the highest to the lowest position. Behaviour of rotate instructions is shown in figure {{ref> | ||
| + | <figure instr_rotate> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | Rotate through carry left **rcl** and right **rcr**, treat the carry flag as the additional bit while rotating. They can be used to collect bits to form multi-bit data. Behaviour of rotate with carry instructions is shown in figure {{ref> | ||
| + | <figure instr_rotatec> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | ===== Bit and Byte Instructions ===== | ||
| + | Bit test instruction **bt** makes a copy of the selected bit in the carry flag. The bit for testing is specified by a combination of two arguments. The first argument, named the bit base operand, holds the bit. It can be a register or a memory location. The second operand is the bit offset, which specifies the position of the bit operand. It can be a register or an immediate value. It starts counting from 0, so the least significant bit has the position 0. An example of the behaviour of the **bt** instruction is shown in figure {{ref> | ||
| + | <figure instr_bt> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | Bit test and modify instructions first make a copy of the selected bit, and next modify the original bit value with the one specified by the instruction. The **bts** sets the bit to one, **btr** clears the bit (resets to zero value), **btc** changes the state of the bit to the opposite (complements). | ||
| + | |||
| + | The bit scan instructions search for the first occurrence of the bit of the value 1. The bit scan forward **bsf** scans starting from the least significant bit towards higher bits, bit scan reverse **bsr** starts from the most significant bit towards lower bits. Both instructions return the index of the found bit in the destination register. If there is no bit of the value 1, the zero flag is set, and the destination register value is undefined. | ||
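| + | |||
| + | The bit test and bit scan instructions can be traced on a single example value. The value 0x48 and the registers used are assumptions made for illustration. | ||
| + | <code asm> | ||
| + | mov eax, 0x00000048 ;bits 3 and 6 are set | ||
| + | bt eax, 3 ;CF = 1, eax is unchanged | ||
| + | bts eax, 0 ;CF = 0 (old value of bit 0), eax = 0x49 | ||
| + | btr eax, 6 ;CF = 1 (old value of bit 6), eax = 0x09 | ||
| + | bsf ecx, eax ;lowest bit set: ecx = 0 | ||
| + | bsr edx, eax ;highest bit set: edx = 3 | ||
| + | </code> | ||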
| + | |||
| + | The **test** instruction performs the logical AND function without storing the result. It just modifies flags according to the result of the AND operation. | ||
| + | |||
| + | The **set// | ||
| + | |||
| + | The **popcnt** instruction counts the number of bits equal to " | ||
| + | |||
| + | The **crc32** instruction implements the calculation of the cyclic redundancy check in hardware. The polynomial of the value 11EDC6F41h is fixed. | ||
| + | |||
| + | ===== Control transfer instructions ===== | ||
| + | Before describing the instructions used for control transfer, we will discuss how the destination address can be calculated. The destination address is the address given to the processor to make a jump to. | ||
| + | ==== Near and far transfer ==== | ||
| + | When segmentation is enabled, the destination address can be given as the offset only or in full logical form. If there is an offset only, the instruction modifies solely the instruction pointer, the jump is performed within the current segment and is called **near**. If the address is provided in full logical form, containing segment and offset parts, the CS and IP registers are modified. Such an instruction can perform a jump between segments and is called **far**. | ||
| + | ==== Absolute and relative address ==== | ||
| + | An **absolute address** is given as a value specifying the destination address as the number of the byte counted from the beginning of the memory, or, if segmentation is enabled, as the offset from the beginning of the segment. A **relative address** is calculated as the difference between the current value of the instruction pointer and the absolute destination address. It is provided in the instructions as the signed number representing the distance between the current and destination addresses. If it is possible to encode the difference as an 8-bit signed value, the jump is called **short**. Usually, an assembler automatically chooses the shortest possible encoding. | ||
| + | ==== Conditional and unconditional control transfer ==== | ||
| + | Conditional transfer instructions check the state of chosen flags in the Flags register and perform the jump to the specified address if the condition gives a true result. If the condition results in false, the processor goes to the next instruction in the instruction stream. Conditions are specified the same way as in **cmov// | ||
| + | ==== Unconditional control transfer instructions ==== | ||
| + | Unconditional control transfer instructions perform the jump to the new address to change the program flow. | ||
| + | The **jmp** instruction jumps to a destination address by putting the destination address in the instruction pointer register. If segmentation is enabled and the destination address is placed in another segment than the current one, it also modifies the CS register. | ||
| + | The **call** instruction is designed to handle subroutines. It also jumps to a destination address, but before putting the new value into the instruction pointer, it pushes the returning address onto the stack. The returning address is the address of the next instruction after the call. This allows the processor to use the returning address later to get back from the subroutine to the main program. | ||
| + | The **ret** instruction forms a pair with the **call**. It uses the information stored on the stack to return from a subroutine. | ||
| + | The process of calling a procedure and returning to the main program is shown in figure {{ref> | ||
| + | <figure procedure_call> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | < | ||
| + | In assembler, subroutines are called procedures. In other languages, you can find the names: function (it can return the resulting value), method (in object-oriented languages) or subprogram. | ||
| + | </ | ||
| + | ==== Interrupts ==== | ||
| + | An interrupt mechanism in x86 works with hardware-signalled interrupts or with special interrupt instructions. Return from an interrupt is performed by executing the **iret** instruction. In 32 and 64-bit architectures, | ||
| + | The process of interrupt handler calling and returning to the main program is shown in figure {{ref> | ||
| + | <figure interrupt_x86> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | Software interrupts are handled the same way as those signalled by the hardware. The **int** instruction signals the interrupt of a given number. There are also some special interrupt instructions. The **int1** and **int3** are one-byte special machine codes used for debugging, **into** signals a software overflow exception if the OF flag is set, and **bound** raises the bound range exceeded exception (int 5) when the tested value is over or under the defined bounds. The last two instructions are not valid in 64-bit mode. | ||
| + | |||
| + | < | ||
| + | In 32 and 64-bit operating systems, the interrupts are handled by the OS and called through the interrupt descriptors, | ||
| + | </ | ||
| + | |||
| + | ==== Conditional control transfer instructions ==== | ||
| + | The **j//cc//** instructions are used to test the state of flags and perform the jump to the destination address if the condition is met. In modern pipelined processors, it is recommended to avoid using conditional jumps if possible, ensuring that the program flows continuously, | ||
| + | <code asm> | ||
| + | cmp var1, 0 ; | ||
| + | jz is_zero | ||
| + | mov rax, " | ||
| + | jmp not_zero | ||
| + | is_zero: | ||
| + | mov rax, " | ||
| + | not_zero: | ||
| + | </ | ||
| + | < | ||
| + | You can try to optimise this code by avoiding jumps. Try to use the conditional **mov** instruction. | ||
| + | </ | ||
| + | |||
| + | ==== Loop instructions ==== | ||
| + | The **loop** instruction is used to implement a loop, which is executed a known number of times. The number of iterations should be set before a loop in the counter register (CX/ | ||
| + | There are also conditional versions of the **loop** instruction, | ||
| + | The **loop** instruction can cause the system to iterate many times if the counter register is zero before entering the loop. As the first step is the decrementing of the counter, it will result in a value composed of all " | ||
| + | |||
| + | <code asm> | ||
| + | lea rbx, table ; | ||
| + | mov rcx, size ;size of a table - we can't ensure it's not zero | ||
| + | xor rdx, rdx ;zero rdx - it will be the sum of elements | ||
| + | jrcxz end_loop | ||
| + | begin_loop: | ||
| + | add rdx, [rbx] ;add the item to the resulting value | ||
| + | add rbx, 8 ;point to the next quadword item in a table | ||
| + | loop begin_loop | ||
| + | end_loop: | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | According to the information found on the Internet, the **loop** instructions are not optimised for modern pipelined processors, and are often replaced with compare and conditional jump instructions. | ||
| + | </ | ||
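| + | |||
| + | A possible replacement of the **loop** instruction with a decrement and a conditional jump is sketched below. It reuses the placeholder names table and size from the example above and assumes that the table contains quadword items; the labels are hypothetical. | ||
| + | <code asm> | ||
| + | lea rbx, table ;address of the first item | ||
| + | mov rcx, size ;number of items, can be zero | ||
| + | xor rdx, rdx ;zero rdx - it will be the sum of elements | ||
| + | test rcx, rcx ;check the counter before entering the loop | ||
| + | jz end_sum | ||
| + | next_item: | ||
| + | add rdx, [rbx] ;add the item to the resulting value | ||
| + | add rbx, 8 ;point to the next quadword item | ||
| + | dec rcx ;decrement the counter, ZF is set when it reaches zero | ||
| + | jnz next_item | ||
| + | end_sum: | ||
| + | </code> | ||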
| + | |||
| + | ===== String Instructions ===== | ||
| + | String instructions are designed to perform operations on elements of data tables, including text strings. These instructions can access two elements in memory - source and destination. If segmentation is enabled, the source operand is identified with SI/ESI and is placed in the data segment (DS) by default, the destination operand is identified with DI/EDI and is always stored in the extra segment (ES). In 64-bit mode, the source operand is identified with RSI, and the destination operand is identified with RDI. They can operate on bytes, words, doublewords or quadwords. The size of the element is specified as the suffix of the instruction or derived from the size of the arguments specified in the instruction. | ||
| + | |||
| + | ==== String copy ==== | ||
| + | The **movs** instruction copies the element of the source string to the destination string. It requires two arguments of the size of bytes, words, doublewords or quadwords. | ||
| + | The **movsb** instruction copies a byte from the source string to the destination string. | ||
| + | The **movsw** instruction copies a word from the source string to the destination string. | ||
| + | The **movsd** instruction copies a doubleword from the source string to the destination string. | ||
| + | The **movsq** instruction copies a quadword from the source string to the destination string. | ||
| + | < | ||
| + | The locations of the source and destination operands are always accessed with the use of the source and destination index registers, which must be loaded correctly before the string instruction is executed. Arguments, if present, are used to determine the size of the element only. | ||
| + | </ | ||
| + | |||
| + | ==== Store string ==== | ||
| + | These instructions store the content of the accumulator to the destination operand. | ||
| + | The **stos** instruction copies the content of the accumulator to the destination string. It requires one argument of the size of byte, word, doubleword or quadword. | ||
| + | The **stosb** instruction copies a byte from the AL to the destination string. | ||
| + | The **stosw** instruction copies a word from the AX to the destination string. | ||
| + | The **stosd** instruction copies a doubleword from the EAX to the destination string. | ||
| + | The **stosq** instruction copies a quadword from the RAX to the destination string. | ||
| + | |||
| + | ==== Load string ==== | ||
| + | These instructions load the content of the source string to the accumulator. | ||
| + | The **lods** instruction copies the content of the source string to the accumulator. It requires one argument of the size of byte, word, doubleword or quadword. | ||
| + | The **lodsb** instruction copies a byte from the source string to the AL. | ||
| + | The **lodsw** instruction copies a word from the source string to the AX. | ||
| + | The **lodsd** instruction copies a doubleword from the source string to the EAX. | ||
| + | The **lodsq** instruction copies a quadword from the source string to the RAX. | ||
| + | ==== String compare ==== | ||
| + | Strings can be compared, which means that the element of the destination string is compared with the element of the source string. These instructions set the status flags in the flags register according to the result of the comparison. The elements of both strings remain unchanged. | ||
| + | The **cmps** instruction compares the element of a source string with the element of the destination string. It requires two arguments, which specify the size of the data elements. | ||
| + | The **cmpsb** instruction compares a byte from the source string with a byte from the destination string. | ||
| + | The **cmpsw** instruction compares a word from the source string with a word from the destination string. | ||
| + | The **cmpsd** instruction compares a doubleword from the source string with a doubleword from the destination string. | ||
| + | The **cmpsq** instruction compares a quadword from the source string with a quadword from the destination string. | ||
| + | |||
| + | ==== String scan ==== | ||
| + | Strings can be scanned, which means that the element of the destination string is compared with the accumulator. These instructions set the status flags in the flags register according to the result of the comparison. The accumulator and string element remain unchanged. | ||
| + | The **scas** instruction compares the accumulator with the element of the destination string. It requires one argument, which specifies the size of the accumulator and the data element. | ||
| + | The **scasb** instruction compares the AL with a byte from the destination string. | ||
| + | The **scasw** instruction compares the AX with a word from the destination string. | ||
| + | The **scasd** instruction compares the EAX with a doubleword from the destination string. | ||
| + | The **scasq** instruction compares the RAX with a quadword from the destination string. | ||
| + | |||
| + | ==== Repeated string instructions ==== | ||
| + | All string instructions can be preceded by the repetition prefix to automate the processing of multiple-element tables. Use of the prefix enables the instructions to automatically repeat the instruction execution according to the content of the counter register and modify the source and destination addresses in index registers, according to the size of the element. Index registers can be incremented or decremented depending on the direction flag (DF) state. If DF is " | ||
| + | The **rep** prefix allows block copying, storing and loading of an entire string rather than a single element. | ||
| + | The use of repeated string instructions enables copying the entire string from one place in memory to another, or filling up the memory regions with a pattern. | ||
| + | |||
| + | The **repe** or **repz** prefixes additionally test if the zero flag is " | ||
| + | The **repne** or **repnz** prefixes test if the zero flag is " | ||
| + | The conditional prefixes are intended to be used with **scas** or **cmps** instructions. | ||
| + | The use of repeated string instructions with conditional prefixes enables string comparison for equality or differences, | ||
| + | |||
| + | To properly use the repeated string instructions, | ||
| + | - Set the SI/ESI/RSI with the address of the source string. | ||
| + | - Set the DI/EDI/RDI with the address of the destination string. | ||
| + | - Clear or set the DF to determine the direction of string processing - from lower to higher or from higher to lower addresses, respectively. | ||
| + | - Set the counter register CX/ECX/RCX with the number of elements to process | ||
| + | - Execute the string instruction with repetition prefix and suffix according to the size of the element. | ||
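| + | |||
| + | The steps above can be sketched as follows. The names src_buf, dst_buf and text are hypothetical byte arrays introduced for this example, and text is assumed to be a zero-terminated string. | ||
| + | <code asm> | ||
| + | ;copy 100 bytes from src_buf to dst_buf | ||
| + | lea rsi, src_buf ;address of the source string | ||
| + | lea rdi, dst_buf ;address of the destination string | ||
| + | cld ;DF = 0, process from lower to higher addresses | ||
| + | mov rcx, 100 ;number of elements to copy | ||
| + | rep movsb ;repeat byte copy until rcx = 0 | ||
| + | |||
| + | ;find the length of a zero-terminated string | ||
| + | lea rdi, text ;scas uses the destination index register | ||
| + | xor al, al ;value to search for - zero byte | ||
| + | mov rcx, -1 ;the largest possible counter value | ||
| + | cld | ||
| + | repne scasb ;repeat while the byte is not equal to al | ||
| + | not rcx ;number of bytes scanned, including the terminator | ||
| + | dec rcx ;rcx = length of the string without the terminator | ||
| + | </code> | ||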
| + | ===== I/O Instructions ===== | ||
| + | These instructions allow the processor to transfer data between the accumulator register and a peripheral device. | ||
| + | A peripheral device can be addressed directly or indirectly. Direct addressing uses an 8-bit constant as the peripheral address (called an I/O port in x86), and it accesses only the first 256 port addresses. Indirect addressing uses the DX register as the address register, enabling access to the entire I/O address space of 65536 addresses. | ||
| + | The **in** instruction reads data from a port to the accumulator. The **out** instruction writes the data from the accumulator to the port. The size of the accumulator determines the size of the data to be transferred. It can be AL, AX or EAX. | ||
| + | The I/O instructions also have string versions. Instructions to read the port to a string are **ins**, **insb**, **insw**, and **insd**. Instructions to write a string to a port are **outs**, **outsb**, **outsw**, and **outsd**. In all string I/O instructions, | ||
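| + | | |
| + | Both addressing forms can be sketched as below. The port numbers 60h and 3F8h are only illustrative assumptions (they follow legacy PC conventions), and in protected mode these instructions require sufficient I/O privilege, so the fragment is not expected to run in an ordinary user-mode program. | |
| + | <code asm> | |
| + | ; Direct addressing: the port number is an 8-bit constant | |
| + | in al, 60h ; read a byte from port 60h into AL | |
| + | | |
| + | ; Indirect addressing: the port number is taken from DX | |
| + | mov dx, 3F8h ; port address above 255 must go through DX | |
| + | mov al, 'A' ; data to send | |
| + | out dx, al ; write the byte from AL to the port | |
| + | in al, dx ; read a byte from the same port into AL | |
| + | </code> | |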
| + | ===== Enter and Leave Instructions ===== | ||
| + | The **enter** instruction creates the stack frame for the function. The stack frame is a place on the stack reserved for the function to store arguments and local variables. Traditionally, | |
| + | < | ||
| + | According to available information on compiler behaviour, compilers never use the **enter** instruction, while the **leave** instruction is used only occasionally. | |
| + | </ | ||
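| + | | |
| + | A minimal sketch of a stack frame in 64-bit mode is shown below. The procedure name my_func and the frame size of 32 bytes are hypothetical; the comments show the instruction sequences that give the same result when the second operand of **enter** (nesting level) is zero. | |
| + | <code asm> | |
| + | my_func PROC | |
| + | enter 32, 0 ; same effect as: push rbp / mov rbp, rsp / sub rsp, 32 | |
| + | mov QWORD PTR [rbp-8], rcx ; use the reserved space as a local variable | |
| + | mov rax, QWORD PTR [rbp-8] | |
| + | leave ; same effect as: mov rsp, rbp / pop rbp | |
| + | ret | |
| + | my_func ENDP | |
| + | </code> | |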
| + | ===== Flag Control Instructions ===== | ||
| + | Flag control instructions are typically used to set or clear the chosen flag in the RFLAGS register. We can only control three flags directly. The carry (CF) flag can be used in conjunction with the rotate-with-carry instructions to convert the series of bits into a binary-encoded value. The direction (DF) flag determines the direction of modification of index registers RSI and RDI when executing string instructions. If the DF flag is clear, the index registers are incremented; | ||
| + | The summary of instructions is shown in the table {{ref> | ||
| + | <table table_flags_instructions> | ||
| + | < | ||
| + | ^ Instruction ^ Behaviour ^ Flag affected ^ | |
| + | | **stc** | set carry flag | CF=1 | | ||
| + | | **clc** | clear carry flag | CF=0 | | ||
| + | | **cmc** | complement carry flag | CF=not CF | | ||
| + | | **std** | set direction flag | DF=1 | | ||
| + | | **cld** | clear direction flag | DF=0 | | ||
| + | | **sti** | set interrupt flag | IF=1 | | ||
| + | | **cli** | clear interrupt flag | IF=0 | | ||
| + | </ | ||
| + | |||
| + | The flags register can be pushed onto the stack and popped afterwards. This can be used to preserve the flags inside a procedure, and also to test or modify those bits of the flags register for which no dedicated instruction exists. | |
| + | The **pushf** pushes the FLAGS register, the **pushfd** pushes the EFLAGS register, and the **pushfq** pushes the RFLAGS register onto the stack. | ||
| + | The **popf** pops the FLAGS register, the **popfd** pops the EFLAGS register, and the **popfq** pops the RFLAGS register from the stack. | ||
| + | There is also a possibility to copy SF, ZF, AF, PF, and CF to the AH register with the **lahf** instruction, | ||
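| + | | |
| + | The push and pop technique can be sketched with the ID flag (bit 21 of RFLAGS), which has no dedicated instruction. The fragment is only an illustration: if the program can toggle this bit, the processor supports **cpuid**, which is always true for 64-bit processors. | |
| + | <code asm> | |
| + | pushfq ; push RFLAGS onto the stack | |
| + | pop rax ; RAX = copy of RFLAGS | |
| + | mov rcx, rax ; keep the original value | |
| + | xor rax, 200000h ; toggle bit 21 (ID) in the copy | |
| + | push rax | |
| + | popfq ; try to write the modified value to RFLAGS | |
| + | pushfq | |
| + | pop rax ; read RFLAGS back | |
| + | xor rax, rcx ; RAX = bits that actually changed | |
| + | push rcx | |
| + | popfq ; restore the original flags | |
| + | test rax, 200000h ; ZF=0 means the ID bit could be changed | |
| + | </code> | |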
| + | |||
| + | ===== Segment Register Instructions ===== | ||
| + | Segment register instructions are used to load a far pointer to a pair of registers. One of the pair is the segment, which is determined by the instruction; | ||
| + | The following example shows loading a far pointer in 16-bit mode. | |
| + | <code asm> | ||
| + | ; Load far pointer to DS:BX | ||
| + | ; Variable Far_point holds the 32-bit address | ||
| + | |||
| + | lds BX, | ||
| + | |||
| + | ; Instruction above is equal to: | ||
| + | |||
| + | mov AX,WORD PTR Far_point+2 ; Take higher word of far pointer | ||
| + | mov DS,AX ; Store it in DS | ||
| + | mov BX,WORD PTR Far_point | ||
| + | </ | ||
| + | In 64-bit mode, **lds** and **les** instructions are not supported. | ||
| + | |||
| + | ===== Miscellaneous instructions ===== | ||
| + | ==== No operation ==== | ||
| + | The **nop** instruction performs no operation. The only result is the incrementation of the instruction pointer. In fact, it is an alias of the instruction **xchg eax, eax**. | |
| + | <code asm> | ||
| + | nop ; | ||
| + | xchg eax, eax ; | ||
| + | </ | ||
| + | |||
| + | ==== Load effective address ==== | ||
| + | The **lea** instruction calculates the effective address as the result of the proper address expression and stores the result in a destination operand. We can store the effective address in a single register to avoid complex address calculation inside a loop, like in the following example. | ||
| + | <code asm> | ||
| + | ; Load effective address to BX | ||
| + | ; Table is the beginning of the table in the memory | ||
| + | |||
| + | lea | ||
| + | |||
| + | ; Now we can use BX only to make the program run faster: | ||
| + | hoop: | ||
| + | mov | ||
| + | inc | ||
| + | cmp | ||
| + | jne | ||
| + | </ | ||
| + | < | ||
| + | Because the **lea** instruction adds the components of the address expression without accessing memory, it is sometimes used instead of the **add** instruction. | |
| + | </ | ||
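| + | | |
| + | A short sketch of **lea** used for arithmetic follows. The register choice is arbitrary; the point is that the result is computed by the address generation logic, no memory is accessed and the flags are not modified. | |
| + | <code asm> | |
| + | mov rbx, 7 | |
| + | mov rcx, 10 | |
| + | lea rax, [rbx+rcx+8] ; RAX = 7 + 10 + 8 = 25 | |
| + | lea rdx, [rbx+rbx*4] ; RDX = 7 * 5 = 35 | |
| + | </code> | |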
| + | |||
| + | ==== Undefined instructions ==== | ||
| + | The undefined instructions can be used to test the behaviour of the system software when an unknown opcode appears in the instruction stream. The **ud0** and **ud1** instructions can have a source operand (register or memory address) and a destination operand (register). Operands are not used. The **ud2** instruction does not have an operand. Executing any undefined instruction raises an invalid opcode exception (#UD). | |
| + | |||
| + | ==== Table lookup ==== | ||
| + | The **xlatb** instruction copies a byte from a table into the AL register. The byte is addressed as the sum of the BX/EBX/RBX and AL registers. There is also an **xlat** version, which allows a memory address to be specified as the argument. It can be somewhat misleading because the argument is never used by the processor. This instruction can be used to implement the conversion from a 4-bit binary value into a hexadecimal digit, as in the following code. | |
| + | |||
| + | <code asm> | ||
| + | .DATA | ||
| + | conv_table DB "0123456789ABCDEF" | |
| + | |||
| + | .CODE | ||
| + | ; Load base address of table to RBX | |
| + | lea RBX, conv_table | ||
| + | and AL, 0Fh ; Limit AL to 4 bits | ||
| + | xlatb ; Take element from the table | ||
| + | mov char, AL ; Resulting char is in AL | ||
| + | </ | ||
| + | ==== Processor identification ==== | ||
| + | The **cpuid** instruction provides processor identification information. It works like a function call: the input value is passed in the accumulator (EAX), and the requested information is returned in processor registers. Different EAX values select different information about the processor. For example, if EAX is zero, it returns the vendor information string: " | |
| + | |||
| + | <figure cpuid_vendor> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
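| + | | |
| + | Reading the vendor string can be sketched as follows. The buffer name vendor is hypothetical; the 12 characters are returned in the order EBX, EDX, ECX. Note that **cpuid** overwrites EAX, EBX, ECX and EDX, so a procedure that must preserve RBX has to save it first. | |
| + | <code asm> | |
| + | .DATA | |
| + | vendor DB 13 DUP (0) ; 12 characters and a terminating zero | |
| + | | |
| + | .CODE | |
| + | xor eax, eax ; EAX = 0 selects the vendor information | |
| + | cpuid | |
| + | mov DWORD PTR vendor, ebx ; characters 0 to 3 | |
| + | mov DWORD PTR vendor+4, edx ; characters 4 to 7 | |
| + | mov DWORD PTR vendor+8, ecx ; characters 8 to 11 | |
| + | </code> | |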
| + | |||
| + | ==== MOVBE instruction ==== | ||
| + | The **movbe** instruction moves data after swapping data bytes. It operates on words, doublewords or quadwords and is usually used to change the endianness of the data. | ||
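| + | | |
| + | A minimal sketch of the byte swap is given below, assuming a hypothetical doubleword variable named value. One operand of **movbe** must be in memory, so a value that is already in a register is usually swapped with **bswap** instead. | |
| + | <code asm> | |
| + | .DATA | |
| + | value DD 12345678h | |
| + | | |
| + | .CODE | |
| + | mov eax, value ; EAX = 12345678h | |
| + | movbe ecx, value ; ECX = 78563412h, bytes in reversed order | |
| + | </code> | |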
| + | |||
| + | ==== Cache manipulating instructions ==== | ||
| + | Cache memory is managed by the processor, and usually, its decisions keep the performance of software execution at a good level. However, the processor offers instructions that allow the programmer to send hints to the cache management mechanism and prefetch data in advance of using it (**prefetchw**, | ||
| + | ===== User Mode Extended State Save/ | ||
| + | Some instructions allow for saving and restoring the state of several units of the processor. They are intended to help the operating system switch context quickly between processes, and can be used instead of saving each register separately at the beginning of a subroutine and restoring it at the end. The content of the registers is stored in the memory area given as the instruction operand, while the EDX:EAX register pair holds a bit mask that selects which state components are processed. Instructions for saving the state are **xsave**, **xsavec**, and **xsaveopt**. The state is restored with **xrstor**, while **xgetbv** reads an extended control register, such as XCR0, which reports the state components enabled by the operating system. | |
| + | |||
| + | ===== Random Number Generator Instructions ===== | ||
| + | In the x64 architecture, | ||
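| + | | |
| + | A typical use of the hardware generator can be sketched with the **rdrand** instruction, which sets the carry flag when a valid random value has been delivered. The retry limit of 10 and the label names are arbitrary assumptions, and the fragment assumes that **cpuid** has already confirmed support for the instruction. | |
| + | <code asm> | |
| + | mov ecx, 10 ; number of attempts (arbitrary) | |
| + | try_again: | |
| + | rdrand rax ; CF=1 if RAX holds a valid random number | |
| + | jc got_value | |
| + | dec ecx | |
| + | jnz try_again | |
| + | xor eax, eax ; all attempts failed, RAX = 0 in this sketch | |
| + | got_value: | |
| + | </code> | |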
| + | |||
| + | ===== BMI1 and BMI2 Instructions ===== | ||
| + | The abbreviation BMI comes from Bit Manipulation Instructions. These instructions are designed for some specific manipulation of bits in the arguments, enabling programmers to use a single instruction instead of a few. | ||
| + | The **andn** instruction extends the group of logical instructions. It performs a bitwise AND of the inverted first source operand with the second source operand. | |
| + | There are additional shift and rotate instructions that do not affect flags, which allows for more predictable execution without dependency on flag changes from previous operations. These instructions are **rorx** - rotate right, **sarx** - shift arithmetic right, **shlx** - shift logical left, and **shrx** - shift logical right. | |
| + | Also, unsigned multiplication without affecting flags, **mulx**, was introduced. | |
| + | Other instructions manipulate bits, as the group name suggests. | |
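| + | | |
| + | The fragment below is a small sketch of these instructions with values chosen only for illustration. It assumes a processor with BMI1 and BMI2 support. In **andn**, the first source operand is the one that is inverted; **mulx** takes one factor implicitly from EDX. | |
| + | <code asm> | |
| + | mov eax, 0F0F0h | |
| + | mov ecx, 0FFFFh | |
| + | andn ebx, eax, ecx ; EBX = (NOT EAX) AND ECX = 00000F0Fh | |
| + | | |
| + | mov ecx, 4 | |
| + | shlx esi, eax, ecx ; ESI = EAX shifted left by 4 = 000F0F00h | |
| + | rorx edi, eax, 8 ; EDI = EAX rotated right by 8 = 0F00000F0h | |
| + | | |
| + | mov edx, 3 ; implicit first factor | |
| + | mov eax, 5 | |
| + | mulx r8d, r9d, eax ; R8D:R9D = EDX * EAX, so R8D = 0 and R9D = 15 | |
| + | </code> | |
| + | None of the instructions except **andn** modifies the flags, so they can be placed between a comparison and the conditional jump that depends on it. | |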
| + | |||
| + | The **lzcnt** instruction counts the number of zeros in an argument starting from the most significant bit. The **tzcnt** counts zeros starting from the least significant bit. For an argument that is not zero, **lzcnt** returns the number of zeros before the first 1 from the left, and **tzcnt** gives the number of zeros before the first 1 from the right. | ||
| + | The **bextr** instruction extracts a contiguous field of bits from the source operand and writes it to the destination. The third operand is a control value: bits 7:0 specify the starting bit position, while bits 15:8 specify the maximum number of bits to extract, as shown in figure {{ref> | |
| + | |||
| + | <figure bextr_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
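| + | | |
| + | The counting and extraction instructions can be traced with the following sketch. The input values are arbitrary and the fragment assumes a processor that supports **lzcnt** and BMI1. | |
| + | <code asm> | |
| + | mov eax, 00010000h ; only bit 16 is set | |
| + | lzcnt ecx, eax ; ECX = 15, zeros above bit 16 | |
| + | tzcnt edx, eax ; EDX = 16, zeros below bit 16 | |
| + | | |
| + | mov eax, 0ABCD1234h | |
| + | mov ecx, 0804h ; length 8 in bits 15:8, start position 4 in bits 7:0 | |
| + | bextr edx, eax, ecx ; EDX = 00000023h, bits 11:4 of EAX | |
| + | </code> | |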
| + | |||
| + | The **blsi** instruction extracts the single, lowest bit set to one, as shown in figure {{ref> | ||
| + | |||
| + | <figure blsi_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | The **blsmsk** instruction creates a mask in which all bits from bit 0 up to and including the lowest bit set to 1 in the source are set, and all higher bits are cleared. It is shown in figure {{ref> | |
| + | |||
| + | <figure blsmsk_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | The **blsr** instruction resets (clears the bit to zero value) the lowest set bit. It is shown in figure {{ref> | ||
| + | |||
| + | <figure blsr_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
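| + | | |
| + | The three instructions that work on the lowest set bit can be compared on a single input value. This is a sketch with an arbitrary value, assuming BMI1 support; the lowest set bit of the source is bit 3. | |
| + | <code asm> | |
| + | mov eax, 01011000b ; source value, 58h | |
| + | blsi ebx, eax ; EBX = 00001000b, only the lowest set bit | |
| + | blsmsk ecx, eax ; ECX = 00001111b, mask up to and including that bit | |
| + | blsr edx, eax ; EDX = 01010000b, source with that bit cleared | |
| + | </code> | |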
| + | |||
| + | The **bzhi** instruction resets high bits starting from the specified bit position, as shown in figure {{ref> | ||
| + | |||
| + | <figure bzhi_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
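| + | | |
| + | A short sketch of **bzhi** with arbitrary values follows, assuming BMI2 support. The third operand holds the index of the first bit to clear. | |
| + | <code asm> | |
| + | mov eax, 12345678h | |
| + | mov ecx, 12 ; clear bit 12 and all higher bits | |
| + | bzhi edx, eax, ecx ; EDX = 00000678h | |
| + | </code> | |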
| + | |||
| + | The **pdep** instruction performs a parallel deposit of bits using a mask. Its behaviour is shown in figure {{ref> | ||
| + | |||
| + | <figure pdep_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | The **pext** instruction performs a parallel extraction of bits using a mask. Its behaviour is shown in figure {{ref> | ||
| + | |||
| + | <figure pext_instr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
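| + | | |
| + | Both parallel instructions can be traced with the sketch below. The values are arbitrary and BMI2 support is assumed. The third operand is the mask; **pext** packs the selected bits into the low bits of the result, and **pdep** performs the opposite operation. | |
| + | <code asm> | |
| + | mov eax, 12345678h | |
| + | mov ecx, 0F0F0h ; mask selects bits 15:12 and 7:4 | |
| + | pext edx, eax, ecx ; EDX = 00000057h, selected bits packed together | |
| + | pdep ebx, edx, ecx ; EBX = 00005070h, bits placed back under the mask | |
| + | </code> | |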