--- en:multiasm:papc:chapter_6_7 [2025/10/23 14:52] – [BMI1 and BMI2 Instructions] ktokarz
+++ en:multiasm:papc:chapter_6_7 [2026/02/19 20:52] (current) – [MOV] ktokarz
@@ Line 1: / Line 1: @@
+====== Instruction Set of x86 - Essentials ======
+===== Instruction groups =====
+The x64 processors can execute an extensive number of different instructions. In the documentation of processors, we can find several ways of dividing all instructions into groups. The most general division, according to AMD, defines five groups of instructions:
+  * General Purpose instructions
+  * System instructions
+  * SSE instructions
+  * 64-bit media instructions
+  * x87 Floating-Point instructions
+Intel defines the following groups of instructions.
+  * General Purpose
+  * X87 FPU
+  * X87 FPU and SIMD State Management
+  * MMX Technology
+  * SSE Extensions
+  * SSE2 Extensions
+  * SSE3 Extensions
+  * SSSE3 Extensions
+  * IA-32e mode: 64-bit mode instructions
+  * System Instructions
+  * VMX Instructions
+  * SMX Instructions
+There is also a long list of extensions defined, including SSE4.1, SSE4.2, Intel AVX, AMD 3DNow! and many others. For a detailed description of instruction groups, please refer to
+  * "AMD64 Architecture Programmer’s Manual"((https://docs.amd.com/v/u/en-US/40332-PUB_4.08)),
+  * "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1: Basic Architecture"((https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)).
+You can find the details of every instruction in the instruction set references:
+  * "AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions"((https://docs.amd.com/v/u/en-US/24594_3.37)),
+  * "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C, & 2D): Instruction Set Reference, A-Z" ((https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)).
+There are also specialised websites with detailed explanations of instructions that you can use to get a lot of additional information. Among others, you can visit:
+  * X86 Opcode and Instruction Reference ((http://ref.x86asm.net/index.html)) by MazeGen,
+  * x86 and amd64 instruction reference ((https://www.felixcloutier.com/x86/)) by Félix Cloutier.
+In this book, we will present most of the general-purpose instructions and provide general ideas on the chosen extensions, including FPU, MMX, SSE, and AVX.
+===== General Purpose Instructions =====
+General-purpose instructions can be divided into some subgroups.
+  * Data Transfer Instructions
+  * Binary Arithmetic Instructions
+  * Decimal Arithmetic Instructions
+  * Logical Instructions
+  * Shift and Rotate Instructions
+  * Bit and Byte Instructions
+  * Control Transfer Instructions
+  * String Instructions
+  * I/O Instructions
+  * Enter and Leave Instructions
+  * Flag Control (EFLAG) Instructions
+  * Segment Register Instructions
+  * Miscellaneous Instructions
+  * User Mode Extended State Save/Restore Instructions
+  * Random Number Generator Instructions
+  * BMI1 and BMI2 Instructions
+===== Condition Codes =====
+Before describing instructions, let's present the condition codes. The condition code takes the form of a suffix to the instruction and influences its behaviour in such a way that if the condition is met, the instruction is executed; if the condition is not met, the processor moves on to the next instruction in the program. The condition that is checked during the execution of the conditional instruction is based on the current state of the flags in the EFLAGS register. The flags in the EFLAGS register are modified by instructions, mainly arithmetic, logical, shift, or special flag manipulation instructions. It is important to note that flags are not modified when copying data, so to check whether the value just read is zero, you should perform, for example, a comparison.
+Condition codes together with flags checked are presented in table {{ref>table_condition_codes}}.
+<table table_condition_codes>
+<caption>Condition codes</caption>
+^ Condition code //cc//        ^ Flags checked ^ Comment ^
+| E        | ZF = 1           | Equal |
+| Z        | ZF = 1           | Zero |
+| NE       | ZF = 0           | Not equal |
+| NZ       | ZF = 0           | Not zero |
+| A        | CF=0 and ZF=0    | Above |
+| NBE      | CF=0 and ZF=0    | Not below or equal |
+| AE       | CF=0             | Above or equal |
+| NB       | CF=0             | Not below |
+| B        | CF=1             | Below |
+| NAE      | CF=1             | Not above or equal |
+| BE       | CF=1 or ZF=1     | Below or equal |
+| NA       | CF=1 or ZF=1     | Not above |
+| G        | ZF=0 and SF=OF   | Greater |
+| NLE      | ZF=0 and SF=OF   | Not less or equal |
+| GE       | SF=OF            | Greater or equal |
+| NL       | SF=OF            | Not less |
+| L        | SF<>OF           | Less |
+| NGE      | SF<>OF           | Not greater or equal |
+| LE       | ZF=1 or SF<>OF   | Less or equal |
+| NG       | ZF=1 or SF<>OF   | Not greater |
+| C        | CF=1             | Carry |
+| NC       | CF=0             | Not carry |
+| O        | OF=1             | Overflow |
+| NO       | OF=0             | Not overflow |
+| S        | SF=1             | Sign (negative) |
+| NS       | SF=0             | Not sign (non-negative) |
+| P        | PF=1             | Parity |
+| PE       | PF=1             | Parity even |
+| NP       | PF=0             | Not parity |
+| PO       | PF=0             | Parity odd |
+</table>
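+The difference between the unsigned codes (A, B) and the signed codes (G, L) shows up when the same bit pattern is interpreted both ways. The following is only a minimal sketch: it uses the **cmp** and **set//cc//** instructions, described later in this chapter, and the choice of registers is arbitrary.
+<code asm>
+mov al, 0xFF     ;255 as unsigned, -1 as signed
+cmp al, 1        ;computes al - 1 and sets flags: CF=0, ZF=0, SF=1, OF=0
+seta bl          ;bl = 1, because CF=0 and ZF=0 (255 is above 1)
+setg cl          ;cl = 0, because SF<>OF (-1 is not greater than 1)
+</code>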
+===== Data transfer instructions =====
+Almost all assembler tutorials start with the presentation of the **mov** instruction, which is used to copy data from the source operand to the destination operand. Our book is not an exception, and we've already shown this instruction in examples presented in previous sections.
+==== MOV ====
+Let's look at some additional variants.
+<code asm>
+mov al, bl         ;copy one byte from bl to al
+mov ax, bx         ;copy word (two bytes) from bx to ax
+mov eax, ebx       ;copy doubleword (four bytes) from ebx to eax
+mov rax, rbx       ;copy quadword (eight bytes) from rbx to rax
+</code>
+In the **mov** instruction, the size of the source argument must be the same as the size of the destination argument. Arguments can be stored in registers or in memory addressed directly or indirectly. One of them can be a constant (immediate). Only one memory argument is allowed. This restriction comes from the instruction encoding: an instruction can encode only one direct or indirect memory argument. That's why most instructions, not only **mov**, can operate with one memory argument only. There are some exceptions, for example, string instructions, but such instructions use specific indirect addressing.
+<code asm>
+mov al, 100        ;0xB0, 0x64
+                   ;copy constant (immediate) of the value 100 (0x64) to al
+mov al, [bx]       ;0x67, 0x8A, 0x07
+                   ;copy byte from the memory at address stored in bx to al
+                   ;(indirect addressing)
+;Notice the difference between two following instructions
+mov eax, 100       ;0xB8, 0x64, 0x00, 0x00, 0x00
+                   ;copy constant 100 to eax
+mov eax, [100]     ;0xA1, 0x64, 0x00, 0x00, 0x00
+                   ;copy value from memory at address 100
+;It is possible to copy a constant to memory addressed directly or indirectly
+;operand size specifier dword ptr is required
+;to inform the assembler about the size of the argument
+mov dword ptr ds:[200], 100
+                   ;0xC7, 0x05, 0xC8, 0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00
+                   ;copy value of 100, encoded as dword (four bytes), 0x64 = 100
+                   ;to memory at address 200, encoded as four bytes,  0xC8 = 200
+mov dword ptr [ebx], 100
+                   ;0xC7, 0x03, 0x64, 0x00, 0x00, 0x00
+                   ;copy value of 100, encoded as dword (four bytes), 0x64 = 100
+                   ;to memory addressed by ebx
+</code>
+==== Conditional move ====
+Starting from the P6 machines, the conditional move instruction **cmov//cc//** was introduced. It works similarly to **mov**, but copies data only if the specified condition is true. The condition code is one of the codes presented in the section "Condition Codes". If the condition is false, the instruction simply passes through without modifying the arguments. Conditional move instructions can be used to avoid conditional jumps.
+For example, if we need to copy data from ebx to ecx only when the result of the previous operation is negative, we can write the following instruction.
+<code asm>
+cmovs ecx, ebx
+</code>
+==== Sign extension ====
+In the situation of copying data of a smaller size (expressed in number of bits) to a bigger destination argument, the question arises as to what to do with the remaining bits. Let us consider copying an 8-bit value from bl to the 16-bit ax register. If the value copied is unsigned or positive (let it be 5), the remaining bits should be cleared.
+<code asm>
+              ;   ah      al
+mov al, bl    ;        00000101   = 5 in al
+mov ah, 0     ;00000000
+              ;0000000000000101   = 5 in ax
+</code>
+If the value is negative (e.g. -5) the situation changes.
+<code asm>
+              ;   ah      al
+mov al, bl    ;        11111011   = -5 in al
+mov ah, 0     ;00000000
+              ;0000000011111011   = 251 in ax
+</code>
+It is visible that to preserve the original value, the upper bits must be filled with ones, not zeros.
+<code asm>
+              ;   ah      al
+mov al, bl    ;        11111011   = -5 in al
+mov ah, 0xFF  ;11111111
+              ;1111111111111011   = -5 in ax
+</code>
+There are special instructions which perform automatic sign extension, copying the sign bit to all higher bit positions. They can be considered as type conversion instructions. These instructions do not have any arguments as they operate on the accumulator only.
+  * **cbw** - converts byte in al to word in ax
+  * **cwd** - converts word in ax to doubleword in dx:ax
+  * **cwde** - converts word in ax to doubleword extended in eax
+  * **cdq** - converts doubleword in eax to quadword in edx:eax
+  * **cdqe** - converts doubleword in eax to quadword in rax
+  * **cqo** - converts quadword in rax to double quadword in rdx:rax
+Sign extension instructions work solely with the accumulator. Fortunately, there are also more universal instructions which copy and extend data at the same time.
+  * **movsx** - copies and sign-extends a byte to a word or doubleword or word to doubleword.
+  * **movzx** - copies and zero-extends a byte to a word or doubleword or word to doubleword.
+  * **movsxd** - copies and sign-extends a doubleword to a quadword in x64 processors.
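+The effect of these instructions can be sketched with the value used in the examples above (11111011 binary, which is 251 as unsigned and -5 as signed). The registers are chosen arbitrarily, and the last instruction assumes 64-bit mode.
+<code asm>
+mov bl, 0xFB     ;bl = 11111011 (251 unsigned, -5 signed)
+movzx eax, bl    ;eax = 0x000000FB = 251, upper bits filled with zeros
+movsx ecx, bl    ;ecx = 0xFFFFFFFB = -5, upper bits filled with the sign bit
+movsxd rdx, ecx  ;rdx = 0xFFFFFFFFFFFFFFFB = -5 as a quadword
+</code>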
+==== Exchange instructions ====
+The exchange instructions swap the values of operands. A single exchange instruction can replace three mov instructions while swapping the contents of two arguments, so they can be useful in optimising some algorithms. They are helpful in the implementation of semaphores, even in multiprocessor systems.
+The **xchg** instruction swaps the values of two arguments. If one of the arguments is in memory, the instruction behaves as with the LOCK prefix, allowing for semaphore implementation.
+The **cmpxchg** instruction has three arguments: source, destination and accumulator. It compares the destination argument with the accumulator; if they are equal, the destination argument value is replaced with the value from the source operand; otherwise, the destination value is loaded into the accumulator. It is used to test and modify semaphores. Its operation is presented in figure {{ref>instr_cmpxchg}}. In newer machines, the eight- and sixteen-byte versions were added: **cmpxchg8b** and **cmpxchg16b**. They always use ECX:EBX or RCX:RBX as the source argument and EDX:EAX or RDX:RAX as the accumulator. The destination argument is in the memory.
+<figure instr_cmpxchg>
+{{ :en:multiasm:cs:cmpxchg.png?500 |Illustration of cmpxchg instruction}}
+<caption>Explanation of cmpxchg instruction</caption>
+</figure>
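+As an illustration only, a very simple spin lock could be built on **cmpxchg** as sketched below. The doubleword variable lock_var and the label acquire are hypothetical names; the sketch assumes that the value 0 means "free" and 1 means "taken". The instruction sets the zero flag when the exchange succeeds.
+<code asm>
+acquire:
+xor eax, eax                            ;expected value: 0 (lock is free)
+mov ecx, 1                              ;new value: 1 (lock is taken)
+lock cmpxchg dword ptr [lock_var], ecx  ;if lock_var = eax then lock_var = ecx, ZF = 1
+                                        ;otherwise eax = lock_var, ZF = 0
+jnz acquire                             ;the lock was already taken - try again
+;... critical section ...
+mov dword ptr [lock_var], 0             ;release the lock
+</code>
+A practical implementation would usually add a pause or back-off inside the waiting loop; this is omitted here for brevity.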
+The **xadd** instruction exchanges two arguments, adds them, and stores the sum in a destination argument. Together with a LOCK prefix, it can be used to implement a DO loop executed by more than one processor simultaneously.
+The **bswap** instruction is a single-argument instruction; it changes the order of bytes in a 32- or 64-bit register. It can be used to convert little-endian data to big-endian representation and vice versa, as shown in figure {{ref>instr_bswap}}.
+<figure instr_bswap>
+{{ :en:multiasm:cs:bswap.png?500 |Illustration of bswap instruction}}
+<caption>Explanation of bswap instruction in 32-bit mode</caption>
+</figure>
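+A short sketch of the byte order reversal, with arbitrarily chosen values; the second pair of instructions assumes 64-bit mode.
+<code asm>
+mov eax, 0x12345678          ;bytes from the highest: 12 34 56 78
+bswap eax                    ;eax = 0x78563412
+mov rbx, 0x0102030405060708
+bswap rbx                    ;rbx = 0x0807060504030201
+</code>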
+==== Stack instructions ====
+A stack is a special structure in the memory that automatically stores the return address (address of the next instruction) when a procedure is called (it is described in detail in the section about the **call** instruction). It is also possible to use the stack for local variables in functions, to pass arguments to procedures, and for temporary data storage. In x86 architecture, the stack is supported by hardware with the special stack pointer register. Instructions operating on the stack automatically modify the stack pointer so that it always points to the top of the stack. The **push** instruction decrements the stack pointer and places the data onto the stack. As a result, the stack pointer points to the last data on the stack. It is shown in figure {{ref>instr_push}}.
+<figure instr_push>
+{{ :en:multiasm:cs:push.png?500 |Illustration of push instruction}}
+<caption>Explanation of push instruction</caption>
+</figure>
+The **pop** instruction takes data off the stack, copies it into the destination argument, and increments the stack pointer. After its execution, the stack pointer points to the previous data stored on the stack. It is shown in figure {{ref>instr_pop}}.
+<figure instr_pop>
+{{ :en:multiasm:cs:pop.png?500 |Illustration of pop instruction}}
+<caption>Explanation of pop instruction</caption>
+</figure>
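+A typical use of **push** and **pop** is to preserve registers around a fragment of code that modifies them. The sketch below assumes 64-bit mode, where each operation moves the stack pointer by 8 bytes. Notice that the registers are popped in the reverse order.
+<code asm>
+push rbx         ;save rbx, rsp is decreased by 8
+push rcx         ;save rcx, rsp is decreased by 8 again
+;... code that modifies rbx and rcx ...
+pop rcx          ;restore rcx (pushed last, popped first)
+pop rbx          ;restore rbx, rsp returns to its original value
+</code>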
+There are also instructions that push or pop all eight general-purpose registers (including the stack pointer). The 16-bit registers are pushed with **pusha** and popped with **popa** instructions. For 32-bit registers, the **pushad** and **popad** instructions can be used, respectively. The order of registers on the stack is shown in figure {{ref>instr_pushadpopad}}. These instructions are not supported in 64-bit mode.
+<figure instr_pushadpopad>
+{{ :en:multiasm:cs:pushadpopad.png?500 |Illustration of pushad and popad instructions}}
+<caption>Explanation of pushad and popad instructions</caption>
+</figure>
+===== Arithmetic instructions =====
+Arithmetic instructions perform calculations on binary encoded data. It is worth noting that the processor does not distinguish between unsigned and signed values; it is the responsibility of the programming engineer to provide correct input values and properly interpret the results obtained.
+<note>
+There are instructions which support decimal arithmetic, but due to the rare use of BCD numbers in modern software, they are not available in x64 mode.
+</note>
+==== Addition and subtraction ====
+There are two adding instructions. The **add** adds two values from the destination and source arguments and stores the result in the destination argument. It modifies the flags in the EFLAGS register according to the result. The **adc** instruction additionally adds "1" if the carry flag (CF) is set. It allows the processor to calculate the sum of the values bigger than can be encoded in a register (for example, 128-bit integers in a 64-bit processor).
+Similarly, there are two subtraction instructions. The **sub** subtracts the source argument from the destination argument, stores the result in the destination, and modifies the flags according to the result. The **sbb** instruction calculates the difference of arguments minus "1" if the CF flag is set (here, CF plays the role of the borrow flag).
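+The multi-register arithmetic mentioned above can be sketched as follows. The assignment of registers is an assumption made for this example only: the first 128-bit number is kept in rdx:rax and the second in rcx:rbx.
+<code asm>
+;128-bit addition: rdx:rax = rdx:rax + rcx:rbx
+add rax, rbx     ;add lower halves, CF = carry out of the lower half
+adc rdx, rcx     ;add upper halves plus the carry
+;128-bit subtraction: rdx:rax = rdx:rax - rcx:rbx
+sub rax, rbx     ;subtract lower halves, CF = borrow
+sbb rdx, rcx     ;subtract upper halves minus the borrow
+</code>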
+==== Incrementation and decrementation ====
+The **inc** instruction adds "1" to the argument, and the **dec** instruction subtracts "1" from it. The argument is treated as an unsigned integer. Unlike **add** and **sub**, these instructions do not modify the carry flag (CF).
+==== Multiply ====
+Two multiply instructions are implemented. The **mul** is a one-argument instruction. It multiplies the content of the argument and the accumulator, treated as unsigned numbers. The size of the accumulator corresponds to the size of the argument. As the product can need up to twice as many bits as the input values, the result is stored in an accumulator of doubled size, as shown in table {{ref>table_mul}}.
+<table table_mul>
+<caption>Multiply instruction argument and result size</caption>
+^ Argument ^ Accumulator ^ Result ^
+| 8 bits        | AL           | AX |
+| 16 bits       | AX           | DX:AX |
+| 32 bits       | EAX          | EDX:EAX |
+| 64 bits       | RAX          | RDX:RAX |
+</table>
+The **imul** instruction implements the signed multiply. It can have one, two or three arguments. The single-argument version behaves the same way as the **mul** instruction. The two-argument version multiplies the 16-, 32-, or 64-bit register as the destination operand by the argument of the same size. The three-argument version multiplies the content of the source argument by the immediate and stores the result in the destination of the same size as the arguments. The destination must be the register.
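+The following sketch, with arbitrarily chosen values and registers, shows the one-argument **mul** producing a result that does not fit in 32 bits, followed by the two- and three-argument forms of **imul**.
+<code asm>
+mov eax, 100000
+mov ecx, 100000
+mul ecx              ;edx:eax = 10000000000 = 0x2540BE400
+                     ;edx = 0x00000002, eax = 0x540BE400
+mov ebx, -3
+mov esi, 7
+imul ebx, esi        ;two arguments: ebx = ebx * esi = -21
+imul edi, ebx, 10    ;three arguments: edi = ebx * 10 = -210
+</code>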
+==== Divide ====
+Two divide instructions are implemented. The **div** is a one-argument instruction. It divides the content of the accumulator by the argument, treated as unsigned numbers. The size of the accumulator is twice as big as the size of the argument. The result is stored as two integer values of the same size as the argument. The quotient is placed in the lower half of the accumulator, and the remainder in the higher half of the accumulator. Depending on the size of the argument, the accumulator is understood as a pair of registers DX:AX, EDX:EAX or RDX:RAX, as shown in the table {{ref>table_div}}.
+<table table_div>
+<caption>Divide instruction arguments and results size</caption>
+^ Argument ^ Accumulator ^ Quotient ^ Remainder ^
+| 8 bits        | AX           | AL | AH |
+| 16 bits       | DX:AX        | AX | DX |
+| 32 bits       | EDX:EAX      | EAX | EDX |
+| 64 bits       | RDX:RAX      | RAX | RDX |
+</table>
+The **idiv** instruction implements the signed divide. It behaves the same way as the **div** instruction except for the type of numbers.
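+Because the dividend occupies the doubled accumulator, its upper half must be prepared before the division: cleared for **div**, or filled with the sign for **idiv**. A minimal sketch with arbitrarily chosen values:
+<code asm>
+mov eax, 17
+xor edx, edx     ;clear the upper half: edx:eax = 17 (unsigned)
+mov ecx, 5
+div ecx          ;eax = 3 (quotient), edx = 2 (remainder)
+mov eax, -17
+cdq              ;sign-extend eax into edx:eax = -17 (signed)
+mov ecx, 5
+idiv ecx         ;eax = -3 (quotient), edx = -2 (remainder)
+</code>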
+===== Logical instructions =====
+The set of logical instructions contains **and**, **or**, **xor** and **not** instructions. All of them perform bitwise Boolean operations corresponding to their names. The **not** is a single-argument instruction; others have two arguments.
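+Logical instructions are commonly used with bit masks. The values in the sketch below are chosen arbitrarily to show the effect of each operation on a single byte.
+<code asm>
+mov al, 0x5A     ;al = 01011010
+and al, 0x0F     ;al = 00001010 - keep only the lower four bits
+or  al, 0x80     ;al = 10001010 - set the most significant bit
+xor al, 0xFF     ;al = 01110101 - invert all bits
+not al           ;al = 10001010 - invert all bits again
+</code>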
+===== Shift and rotate instructions =====
+Shift and rotate instructions treat the argument as the shift register. Each bit of the argument is moved to the neighbouring position on the left or right, depending on the shift direction. The number of bit positions for the shift can be specified as a constant or in the CL register. Shift instructions can be used for multiplying (shift left) and dividing (shift right) by a power of two.
+Shift instructions have two versions: logical and arithmetical. Logical shift left **shl** and arithmetical shift left **sal** behave the same, filling the empty bits (at the LSB position) with zeros. Logical shift right **shr** fills the empty bits (at the MSB position) with zeros, while the arithmetical shift right **sar** makes a copy of the most significant bit, preserving the sign of a value. It is shown in figure {{ref>instr_shift}}.
+<figure instr_shift>
+{{ :en:multiasm:cs:shift.png?600 |Illustration of shift arithmetical and logical left and right instructions}}
+<caption>Explanation of shift instructions</caption>
+</figure>
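+The difference between the logical and arithmetical shift right can be sketched with one bit pattern interpreted as an unsigned and as a signed value. The values and registers are chosen arbitrarily.
+<code asm>
+mov al, 0xF0     ;al = 11110000 (240 unsigned)
+shr al, 2        ;al = 00111100 = 60, which is 240 / 4
+mov bl, 0xF0     ;bl = 11110000 (-16 signed)
+sar bl, 2        ;bl = 11111100 = -4, which is -16 / 4
+mov dl, 5        ;dl = 00000101
+shl dl, 3        ;dl = 00101000 = 40, which is 5 * 8
+</code>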
+There are two double shift instructions which move bits from the source argument to the destination argument. The number of bits is specified as the third argument. Shift double right has **shrd** mnemonic, while shift double left has **shld** mnemonic. The operation of shift double instructions is presented in figure {{ref>instr_shiftdouble}}.
+<figure instr_shiftdouble>
+{{ :en:multiasm:cs:shiftd.png?600 |Illustration of double shift instructions}}
+<caption>Explanation of double shift instructions</caption>
+</figure>
+For all shift instructions, the last bit shifted out is placed in the carry flag.
+Rotate instructions shift bits left **rol** or right **ror** in the argument, and additionally move bits around from the lowest to the highest or from the highest to the lowest position. Behaviour of rotate instructions is shown in figure {{ref>instr_rotate}}.
+<figure instr_rotate>
+{{ :en:multiasm:cs:rotate.png?600 |Illustration of rotate instructions}}
+<caption>Explanation of rotate instructions</caption>
+</figure>
+Rotate through carry left **rcl** and right **rcr** instructions treat the carry flag as an additional bit while rotating. They can be used to collect bits to form multi-bit data. Behaviour of rotate with carry instructions is shown in figure {{ref>instr_rotatec}}.
+<figure instr_rotatec>
+{{ :en:multiasm:cs:rotatec.png?600 |Illustration of rotate with carry instructions}}
+<caption>Explanation of rotate with carry instructions</caption>
+</figure>
+===== Bit and Byte Instructions =====
+Bit test instruction **bt** makes a copy of the selected bit in the carry flag. The bit for testing is specified by a combination of two arguments. The first argument, named the bit base operand, holds the bit. It can be a register or a memory location. The second operand is the bit offset, which specifies the position of the bit operand. It can be a register or an immediate value. It starts counting from 0, so the least significant bit has the position 0. An example of the behaviour of the **bt** instruction is shown in figure {{ref>instr_bt}}.
+<figure instr_bt>
+{{ :en:multiasm:cs:bt14.png?600 |Illustration of bit test instruction}}
+<caption>Explanation of bit test instruction</caption>
+</figure>
+Bit test and modify instructions first make a copy of the selected bit, and next modify the original bit value with the one specified by the instruction. The **bts** sets the bit to one, **btr** clears the bit (resets to zero value), **btc** changes the state of the bit to the opposite (complements).
+The bit scan instructions search for the first occurrence of the bit of the value 1. The bit scan forward **bsf** scans starting from the least significant bit towards higher bits, bit scan reverse **bsr** starts from the most significant bit towards lower bits. Both instructions return the index of the found bit in the destination register. If there is no bit of the value 1, the zero flag is set, and the destination register value is undefined.
+The **test** instruction performs the logical AND function without storing the result. It just modifies flags according to the result of the AND operation.
+The **set//cc//** instruction sets the argument to 1 if the chosen condition is met, or clears the argument if the condition is not met. The condition can be freely chosen from the set of conditions available for other instructions, for example, **cmov//cc//**. This instruction is useful to convert the result of the operation into the Boolean representation.
+The **popcnt** instruction counts the number of bits equal to "1" in the source argument. The applications of this instruction include genome mining, handwriting recognition, digital health workloads, and fast Hamming distance counts((https://patents.google.com/patent/US8214414)).
+The **crc32** instruction implements the calculation of the cyclic redundancy check in hardware. The polynomial of the value 11EDC6F41h is fixed.
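+Several of the instructions from this section can be tried on a single value, as in the sketch below. The value 0x28 and the registers are chosen arbitrarily, and it is assumed that the processor supports **popcnt**.
+<code asm>
+mov eax, 0x28    ;eax = ...00101000, bits 3 and 5 are set
+bt eax, 5        ;CF = 1, a copy of bit 5
+bsf ecx, eax     ;ecx = 3, index of the lowest bit of the value 1
+bsr edx, eax     ;edx = 5, index of the highest bit of the value 1
+test eax, 0x08   ;AND with the mask of bit 3, result not stored, ZF = 0
+setnz bl         ;bl = 1, because the tested bit is set
+popcnt esi, eax  ;esi = 2, two bits are equal to 1
+</code>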
+===== Control transfer instructions =====
+Before describing the instructions used for control transfer, we will discuss how the destination address can be calculated. The destination address is the address given to the processor to make a jump to.
+==== Near and far transfer ====
+While the segmentation is enabled, the destination address can be given as the offset only or in full logical form. If there is an offset only, the instruction modifies solely the instruction pointer, the jump is performed within the current segment and is called **near**. If the address is provided in full logical form, containing segment and offset parts, the CS and IP registers are modified. Such an instruction can perform a jump between segments and is called **far**.
+==== Absolute and relative address ====
+An **absolute address** is given as a value specifying the destination address as the number of the byte counted from the beginning of the memory, or, if segmentation is enabled, as the offset from the beginning of the segment. A **relative address** is calculated as the difference between the current value of the instruction pointer and the absolute destination address. It is provided in the instructions as the signed number representing the distance between the current and destination addresses. If it is possible to encode the difference as an 8-bit signed value, the jump is called **short**. Usually, an assembler automatically chooses the shortest possible encoding.
+==== Conditional and unconditional control transfer ====
+Conditional transfer instructions check the state of chosen flags in the Flags register and perform the jump to the specified address if the condition gives a true result. If the condition results in false, the processor goes to the next instruction in the instruction stream. Conditions are specified the same way as in **cmov//cc//** instruction as the suffix to the main mnemonic. Unconditional transfer instructions are always executed the same way. They jump to the specified address without any condition checking.
+==== Unconditional control transfer instructions ====
+Unconditional control transfer instructions perform the jump to the new address to change the program flow.
+The **jmp** instruction jumps to a destination address by putting the destination address in the instruction pointer register. If segmentation is enabled and the destination address is placed in another segment than the current one, it also modifies the CS register.
+The **call** instruction is designed to handle subroutines. It also jumps to a destination address, but before putting the new value into the instruction pointer, it pushes the returning address onto the stack. The returning address is the address of the next instruction after the call. This allows the processor to use the returning address later to get back from the subroutine to the main program.
+The **ret** instruction forms a pair with the **call**. It uses the information stored on the stack to return from a subroutine.
+The process of calling a procedure and returning to the main program is shown in figure {{ref>procedure_call}}.
+<figure procedure_call>
+{{ :en:multiasm:cs:procedure_call.png?500 |Illustration of call and return instructions}}
+<caption>Explanation of call and ret instructions</caption>
+</figure>
+<note>
+In assembler, subroutines are called procedures. In other languages, you can find the names: function (it can return the resulting value), method (in object-oriented languages) or subprogram.
+</note>
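+A minimal sketch of a procedure and its call is shown below. It uses the MASM-style proc and endp directives; the name double_it and the use of ecx for the input and eax for the result are assumptions made for this example only.
+<code asm>
+;main program
+mov ecx, 5
+call double_it       ;push the returning address and jump to double_it
+;execution continues here after ret, with eax = 10
+;...
+double_it proc
+mov eax, ecx         ;copy the input value
+add eax, ecx         ;eax = 2 * ecx
+ret                  ;pop the returning address into the instruction pointer
+double_it endp
+</code>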
+==== Interrupts ====
+An interrupt mechanism in x86 works with hardware-signalled interrupts or with special interrupt instructions. Return from an interrupt is performed by executing the **iret** instruction. For the 32-bit operand size, the mnemonic for this instruction is **iretd**, and for the 64-bit operand size, it is **iretq**. The **iret** instruction differs from the **ret** instruction in that it pops off the stack not only the return address but also the content of the Flags register. This keeps the content of this register unmodified after return, and additionally prevents unintentional blocking of subsequent interrupts.
+The process of interrupt handler calling and returning to the main program is shown in figure {{ref>interrupt_x86}}.
+<figure interrupt_x86>
+{{ :en:multiasm:cs:interrupt_x86.png?550 |Illustration of interrupt signalling and return from the handler}}
+<caption>Illustration of interrupt signalling and return from the handler</caption>
+</figure>
+Software interrupts are handled the same way as signalled by the hardware. The **int** instruction signals the interrupt of a given number. There are also some special interrupt instructions. The **int1** and **int3** are one-byte special machine codes used for debugging, **into** signals a software overflow exception if the OF flag is set, and **bound** raises the bound range exceeded exception (int 5) when the tested value is over or under the defined bounds. The last two instructions are not valid in 64-bit mode.
+<note>
+In 32 and 64-bit operating systems, the interrupts are handled by the OS and called through the interrupt descriptors, called gates.
+</note>
+==== Conditional control transfer instructions ====
+The **j//cc//** instructions are used to test the state of flags and perform the jump to the destination address if the condition is met. In modern pipelined processors, it is recommended to avoid using conditional jumps if possible, ensuring that the program flows continuously, without the need to invalidate the pipeline. It is important to remember that flags are modified as a result of executing the arithmetic or logic instruction, but not the **mov** instruction. For example, if we need to test if some variable is zero, we can write such code:
+<code asm>
+cmp var1, 0     ;compare variable
+jz is_zero      ;conditional jump to address is_zero
+mov rax, "1"    ;if not zero put ASCII code of "1" in rax
+jmp not_zero    ;jump unconditionally over next instruction
+is_zero:        ;label to jump to if var1 is zero
+mov rax, "0"    ;if zero put ASCII code of "0" in rax
+not_zero:       ;label to jump to if var1 is not zero
+</code>
+<note>
+You can try to optimise this code by avoiding jumps. Try to use the conditional move instruction **cmov//cc//**.
+</note>
+==== Loop instructions ====
+The **loop** instruction is used to implement a loop, which is executed a known number of times. The number of iterations should be set before a loop in the counter register (CX/ECX/RCX). The **loop** instruction automatically decrements the counter register, checks if it reaches zero and if not jumps to the address, which is the argument of the instruction and is assumed as the beginning address of a loop. If the counter reaches zero, the **loop** instruction goes further to the next instruction in a stream.
+There are also conditional versions of the **loop** instruction, which allow finishing the iteration process before the counter reaches zero. The **loope** or **loopz** instructions continue the iteration if the counter is above zero and the zero flag (ZF) is set. The **loopne** or **loopnz** continue iteration if the counter is above zero and the zero flag (ZF) is cleared.
+The **loop** instruction can cause the system to iterate many times if the counter register is zero before entering the loop. As the first step is the decrementing of the counter, it will result in a value composed of all "1". For CX, the loop will be executed 65536 times, for ECX more than 4 billion times and for RCX 18 quintillion 446 quadrillion 744 trillion 73 billion 709 million 551 thousand and 616 times! Understandably, we should avoid such a situation. The **jcxz**, **jecxz** and **jrcxz** instructions can help to jump over the entire loop if the counter register is zero at the beginning, as in the following code.
+<code asm>
+lea rbx, table   ;table of 64-bit values to sum
+mov rcx, size    ;number of items in the table - we can't ensure it's not zero
+xor rdx, rdx     ;zero rdx - it will be the sum of elements
+jrcxz end_loop   ;jump over the loop if rcx is zero
+begin_loop:
+add rdx, [rbx]   ;add the 64-bit item to the resulting value
+add rbx, 8       ;point to the next item in the table (8 bytes further)
+loop begin_loop  ;loop
+end_loop:
+</code>
+<note>
+On many modern pipelined processors, the **loop** instructions execute more slowly than an equivalent pair of decrement and conditional jump instructions, so they are often replaced with such a pair.
+</note>
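+The summing loop shown earlier can be written without **loop**. The following is only a sketch, assuming the same hypothetical names (table, size) and quadword elements. Here **test** with **jz** takes the role of **jrcxz**, and **dec** with **jnz** takes the role of **loop**. Unlike **loop**, the **dec** instruction modifies the status flags.
+<code asm>
+lea rbx, table      ;table with quadword values to sum (assumed name)
+mov rcx, size       ;number of elements, may be zero (assumed name)
+xor rdx, rdx        ;rdx will hold the sum
+test rcx, rcx       ;is the counter zero?
+jz end_loop         ;yes - skip the loop entirely
+begin_loop:
+add rdx, [rbx]      ;add the current element to the sum
+add rbx, 8          ;advance to the next quadword
+dec rcx             ;decrement the counter, ZF=1 when it reaches zero
+jnz begin_loop      ;repeat until the counter is zero
+end_loop:
+</code>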
+===== String Instructions =====
+String instructions perform operations on elements of data tables, including text strings. These instructions can access two elements in memory: source and destination. In 16- and 32-bit modes, the source operand is addressed by SI/ESI, by default in the data segment (DS), and the destination operand is addressed by DI/EDI, always in the extra segment (ES). In 64-bit mode, the source operand is addressed by RSI, and the destination operand is addressed by RDI. String instructions can operate on bytes, words, doublewords or quadwords. The size of the element is specified as the suffix of the instruction or derived from the size of the arguments specified in the instruction.
+==== String copy ====
+The **movs** instruction copies the element of the source string to the destination string. It requires two arguments of the size of bytes, words, doublewords or quadwords.
+The **movsb** instruction copies a byte from the source string to the destination string.
+The **movsw** instruction copies a word from the source string to the destination string.
+The **movsd** instruction copies a doubleword from the source string to the destination string.
+The **movsq** instruction copies a quadword from the source string to the destination string.
+<note>
+The locations of the source and destination operands are always accessed with the use of the source and destination index registers, which must be loaded correctly before the string instruction is executed. Arguments, if present, are used to determine the size of the element only.
+</note>
+==== Store string ====
+These instructions store the content of the accumulator to the destination operand.
+The **stos** instruction copies the content of the accumulator to the destination string. It requires one argument of the size of byte, word, doubleword or quadword.
+The **stosb** instruction copies a byte from the AL to the destination string.
+The **stosw** instruction copies a word from the AX to the destination string.
+The **stosd** instruction copies a doubleword from the EAX to the destination string.
+The **stosq** instruction copies a quadword from the RAX to the destination string.
+==== Load string ====
+These instructions load the content of the source string to the accumulator.
+The **lods** instruction copies the content of the source string to the accumulator. It requires one argument of the size of byte, word, doubleword or quadword.
+The **lodsb** instruction copies a byte from the source string to the AL.
+The **lodsw** instruction copies a word from the source string to the AX.
+The **lodsd** instruction copies a doubleword from the source string to the EAX.
+The **lodsq** instruction copies a quadword from the source string to the RAX.
+==== String compare ====
+Strings can be compared, which means that the element of the destination string is compared with the element of the source string. These instructions set the status flags in the flags register according to the result of the comparison. The elements of both strings remain unchanged.
+The **cmps** instruction compares the element of a source string with the element of the destination string. It requires two arguments, which specify the size of the data elements.
+The **cmpsb** instruction compares a byte from the source string with a byte from the destination string.
+The **cmpsw** instruction compares a word from the source string with a word from the destination string.
+The **cmpsd** instruction compares a doubleword from the source string with a doubleword from the destination string.
+The **cmpsq** instruction compares a quadword from the source string with a quadword from the destination string.
+==== String scan ====
+Strings can be scanned, which means that the element of the destination string is compared with the accumulator. These instructions set the status flags in the flags register according to the result of the comparison. The accumulator and string element remain unchanged.
+The **scas** instruction compares the accumulator with the element of the destination string. It requires one argument, which specifies the size of the accumulator and the data element.
+The **scasb** instruction compares the AL with a byte from the destination string.
+The **scasw** instruction compares the AX with a word from the destination string.
+The **scasd** instruction compares the EAX with a doubleword from the destination string.
+The **scasq** instruction compares the RAX with a quadword from the destination string.
+==== Repeated string instructions ====
+All string instructions can be preceded by the repetition prefix to automate the processing of multiple-element tables. With the prefix, the instruction is repeated according to the content of the counter register, and the source and destination addresses in the index registers are modified according to the size of the element. Index registers can be incremented or decremented depending on the direction flag (DF) state. If DF is "0", the addresses are incremented; if DF is "1", the addresses are decremented. When the string element's size is a byte, the addresses are modified by 1. For words, the addresses are modified by 2, for doublewords by 4, and for quadwords by 8.
+The **rep** prefix allows block copying, storing and loading of an entire string rather than a single element.
+The use of repeated string instructions enables copying the entire string from one place in memory to another, or filling up the memory regions with a pattern.
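+The **repe** or **repz** prefixes repeat the instruction while the counter is not zero and the zero flag is "1". The string scan or comparison stops early as soon as the zero flag becomes "0", that is, at the first pair of elements that differ.
+The **repne** or **repnz** prefixes repeat the instruction while the counter is not zero and the zero flag is "0". The iteration stops as soon as the zero flag becomes "1", that is, at the first pair of elements that are equal.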
+The conditional prefixes are intended to be used with **scas** or **cmps** instructions.
+The use of repeated string instructions with conditional prefixes enables string comparison for equality or differences, or to find the element in a string.
+To properly use the repeated string instructions, follow these steps:
+  - Set the SI/ESI/RSI with the address of the source string.
+  - Set the DI/EDI/RDI with the address of the destination string.
+  - Clear or set the DF to determine the direction of string processing - from lower to higher or from higher to lower addresses, respectively.
+  - Set the counter register CX/ECX/RCX with the number of elements to process.
+  - Execute the string instruction with repetition prefix and suffix according to the size of the element.
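+The steps above can be sketched as follows for 64-bit mode. The names src_buf, dst_buf and count are hypothetical and stand for two byte buffers and a quadword variable with their length. The first part copies a block with **rep movsb**. The second part searches the copy for a zero byte with **repne scasb**.
+<code asm>
+;copy count bytes from src_buf to dst_buf (all names are assumed)
+lea rsi, src_buf    ;step 1: source address
+lea rdi, dst_buf    ;step 2: destination address
+cld                 ;step 3: DF=0, process towards higher addresses
+mov rcx, count      ;step 4: number of elements
+rep movsb           ;step 5: copy rcx bytes
+
+;find the first zero byte in dst_buf
+lea rdi, dst_buf    ;scas uses only the destination string
+mov rcx, count      ;maximum number of bytes to scan
+xor al, al          ;value to look for
+jrcxz not_found     ;nothing to scan, flags would not be updated
+repne scasb         ;scan while rcx is not zero and the byte differs from al
+jne not_found       ;ZF=0 after the scan means no match
+dec rdi             ;rdi was advanced past the match, step back to it
+not_found:
+</code>
+The **jrcxz** check matters because a repeated string instruction with a zero counter executes no iterations and leaves the flags unchanged, so the following **jne** would test a stale zero flag.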
+===== I/O Instructions =====
+These instructions allow the processor to transfer data between the accumulator register and a peripheral device.
+A peripheral device can be addressed directly or indirectly. Direct addressing uses an 8-bit constant as the peripheral address (called an I/O port in x86), so it can access only the first 256 port addresses. Indirect addressing uses the DX register as the address register, enabling access to the entire I/O address space of 65536 addresses.
+The **in** instruction reads data from a port to the accumulator. The **out** instruction writes the data from the accumulator to the port. The size of the accumulator determines the size of the data to be transferred. It can be AL, AX or EAX.
+The I/O instructions also have string versions. Instructions to read the port to a string are **ins**, **insb**, **insw**, and **insd**. Instructions to write a string to a port are **outs**, **outsb**, **outsw**, and **outsd**. In all string I/O instructions, the port is addressed with the DX register. Rules for addressing the memory are the same as in string instructions.
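+Both addressing forms can be sketched as below. The port numbers 60h and 3F8h are chosen only for illustration. Note that under a typical protected-mode operating system, user-mode code is not allowed to execute I/O instructions unless the system grants it access, so a fragment like this normally belongs to system software or a bare-metal program.
+<code asm>
+;direct addressing: 8-bit port number encoded in the instruction
+in  al, 60h         ;read a byte from port 60h into AL
+;indirect addressing: 16-bit port number in DX
+mov dx, 3F8h        ;port numbers above 255 must go through DX
+mov al, "A"         ;byte to send
+out dx, al          ;write AL to the port addressed by DX
+</code>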
+===== Enter and Leave Instructions =====
+The **enter** instruction creates the stack frame for a function. The stack frame is a place on the stack reserved for the function to store arguments and local variables. Traditionally, we access the stack frame with the use of the RBP register, but we need to preserve its content before use. The **enter** instruction has two operands: the size of the frame and the nesting level. With nesting level 0 (the non-nested form), it pushes RBP onto the stack, copies the stack pointer value to RBP, and subtracts the first operand from the stack pointer. After these steps, RSP points to the top of the stack frame, and RBP points to its base. The nested form additionally copies the frame pointers of the higher-level functions into the new frame, which gives access to their stack frames. The **leave** instruction reverses what **enter** did: it copies RBP to RSP and pops RBP. The **enter** should be placed at the very beginning of the function, while the **leave** just before **ret**.
+<note>
+Compilers practically never emit the **enter** instruction, because on modern processors it is slower than the equivalent **push**, **mov** and **sub** sequence, while the **leave** instruction is still emitted occasionally.
+</note>
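+The non-nested form can be compared with the sequence of basic instructions that does the same work. This is a minimal sketch; the procedure names and the frame size of 32 bytes are assumptions made for illustration.
+<code asm>
+my_func PROC
+  enter 32, 0       ;reserve 32 bytes for locals, nesting level 0
+  ;... function body, locals are at [rbp-32]..[rbp-1] ...
+  leave             ;same as: mov rsp, rbp / pop rbp
+  ret
+my_func ENDP
+
+;the same prologue and epilogue written with basic instructions
+my_func2 PROC
+  push rbp          ;save the caller's frame pointer
+  mov  rbp, rsp     ;establish the new frame base
+  sub  rsp, 32      ;reserve 32 bytes for locals
+  ;... function body ...
+  mov  rsp, rbp     ;release the locals
+  pop  rbp          ;restore the caller's frame pointer
+  ret
+my_func2 ENDP
+</code>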
+===== Flag Control Instructions =====
+Flag control instructions are typically used to set or clear the chosen flag in the RFLAGS register. We can only control three flags directly. The carry (CF) flag can be used in conjunction with the rotate-with-carry instructions to convert the series of bits into a binary-encoded value. The direction (DF) flag determines the direction of modification of index registers RSI and RDI when executing string instructions. If the DF flag is clear, the index registers are incremented; if the DF flag is set, the registers are decremented after each iteration of a string instruction. The interrupt (IF) flag enables or disables hardware interrupts. If the IF flag is set, the hardware interrupts are enabled; if the IF flag is clear, hardware interrupts are masked.
+The summary of instructions is shown in the table {{ref>table_flags_instructions}}.
+<table table_flags_instructions>
+<caption> Flags manipulating instructions</caption>
+^ Instruction ^ Behaviour ^ Flag affected ^
+| **stc** | set carry flag | CF=1 |
+| **clc** | clear carry flag | CF=0 |
+| **cmc** | complement carry flag | CF=not CF |
+| **std** | set direction flag | DF=1 |
+| **cld** | clear direction flag | DF=0 |
+| **sti** | set interrupt flag | IF=1 |
+| **cli** | clear interrupt flag | IF=0 |
+</table>
+The flags register can be pushed onto the stack and popped afterwards. This is used to preserve the flags inside a procedure, and also to test or modify bits of the flags register for which no dedicated instruction exists.
+The **pushf** pushes the FLAGS register, the **pushfd** pushes the EFLAGS register, and the **pushfq** pushes the RFLAGS register onto the stack.
+The **popf** pops the FLAGS register, the **popfd** pops the EFLAGS register, and the **popfq** pops the RFLAGS register from the stack.
+There is also a possibility to copy SF, ZF, AF, PF, and CF to the AH register with the **lahf** instruction, and store these flags back from AH with the use of the **sahf** instruction.
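+Two typical uses of these instructions can be sketched as follows for 64-bit mode. The first fragment reads a flag that has no dedicated test instruction (DF is bit 10 of RFLAGS). The second one preserves the caller's flags around code that changes DF. The label name is an assumption.
+<code asm>
+;test the direction flag
+pushfq              ;push RFLAGS onto the stack
+pop  rax            ;rax = copy of RFLAGS
+bt   rax, 10        ;copy bit 10 (DF) into CF
+jc   df_is_set      ;jump if the direction flag was set
+df_is_set:
+
+;preserve the flags around a backward string operation
+pushfq              ;save RFLAGS, including DF
+std                 ;DF=1, string instructions go towards lower addresses
+;... string instructions ...
+popfq               ;restore the saved RFLAGS
+</code>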
+===== Segment Register Instructions =====
+Segment register instructions are used to load a far pointer to a pair of registers. One of the pair is the segment, which is determined by the instruction; another is the offset and appears as the destination argument. The source argument is the far pointer stored in the memory. These instructions include **lds** – load far pointer using DS, **les** – load far pointer using ES, **lfs** – load far pointer using FS, **lgs** – load far pointer using GS, and **lss** – load far pointer using SS.
+The following example shows loading far pointer in 16-bit mode.
+<code asm>
+; Load far pointer to DS:BX
+; Variable Far_point holds the 32-bit address
+lds  BX,Far_point
+; Instruction above is equivalent to (except that AX is overwritten):
+mov  AX,WORD PTR Far_point+2 ; Take higher word of far pointer
+mov  DS,AX                   ; Store it in DS
+mov  BX,WORD PTR Far_point   ; Store lower word of far pointer in BX
+</code>
+In 64-bit mode, **lds** and **les** instructions are not supported.
+===== Miscellaneous instructions =====
+==== No operation ====
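+The **nop** instruction performs no operation. The only result is the incrementation of the instruction pointer. Historically, it is an alias of the instruction **xchg eax, eax** (**xchg ax, ax** in 16-bit code), which shares the one-byte opcode 0x90. In 64-bit mode, the opcode 0x90 still performs no operation at all.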
+<code asm>
+nop             ;encoded as 0x90
+xchg eax, eax   ;encoded as 0x90 in 32-bit mode
+</code>
+==== Load effective address ====
+The **lea** instruction calculates the effective address as the result of the proper address expression and stores the result in a destination operand. We can store the effective address in a single register to avoid complex address calculation inside a loop, like in the following example.
+<code asm>
+; Load effective address to BX
+; Table is the beginning of the table in the memory
+  lea   BX,Table[SI]
+; Now we can use BX only to make the program run faster:
+hoop:
+  mov   AX,[BX] ; Take value from table
+  add   BX,2    ; Next element in the table (words are 2 bytes)
+  cmp   AX,0    ; Check if element is 0
+  jne   hoop    ; Jump to "hoop" if AX isn't 0
+</code>
+<note>
+Because the **lea** instruction adds the components of the address expression (base, scaled index and displacement) without modifying the flags, it is sometimes used instead of the **add** instruction.
+</note>
+==== Undefined instructions ====
+The undefined instructions can be used to test the behaviour of the system software when an unknown opcode appears in the instruction stream. The **ud0** and **ud1** instructions can have a source operand (register or memory address) and a destination operand (register). Operands are not used. The **ud2** instruction does not have an operand. Executing any undefined instruction raises an invalid opcode exception (#UD).
+==== Table lookup ====
+The **xlatb** instruction copies a byte from a table into the AL register. The byte is addressed as the sum of the BX/EBX/RBX register and the AL register, where AL is treated as an unsigned index. There is also an **xlat** version, which enables specifying the address in the memory as the argument. It can be somewhat misleading because the argument is never used by the processor to address the table. This instruction can be used to implement the conversion from a 4-bit binary value into a hexadecimal digit, as in the following code.
+<code asm>
+.DATA
+conv_table DB "0123456789ABCDEF"
+.CODE
+; Load base address of table to RBX
+  lea   RBX, conv_table
+  and   AL, 0Fh  ; Limit AL to 4 bits
+  xlatb          ; Take element from the table
+  mov   char, AL ; Resulting char is in AL
+</code>
+==== Processor identification ====
+The **cpuid** instruction provides processor identification information. It operates similarly to a function call, with the input value passed in the accumulator (EAX). Depending on the EAX value, it gives different information about the processor. The requested information is returned in processor registers. For example, if EAX is zero, it returns the vendor identification string, "GenuineIntel" for Intel processors or "AuthenticAMD" for AMD models, in the EBX, EDX and ECX registers, in this order. It is shown in figure {{ref>cpuid_vendor}}.
+<figure cpuid_vendor>
+{{ :en:multiasm:cs:cpuid_vendor.png?400 |Illustration of vendor string reading by cpuid instruction}}
+<caption>Illustration of vendor string reading by cpuid instruction</caption>
+</figure>
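+Reading the vendor string can be sketched as below. The buffer name vendor is hypothetical. The twelve characters are returned in EBX, EDX and ECX, in this order, four characters per register.
+<code asm>
+.DATA
+vendor DB 13 DUP (0)     ;12 characters + terminating zero (assumed buffer)
+.CODE
+  xor   eax, eax         ;EAX=0 selects the vendor string
+  cpuid                  ;overwrites EAX, EBX, ECX and EDX
+  lea   rdi, vendor
+  mov   [rdi], ebx       ;characters 0-3
+  mov   [rdi+4], edx     ;characters 4-7
+  mov   [rdi+8], ecx     ;characters 8-11
+</code>
+Because **cpuid** overwrites four registers, a procedure that uses it should save any of them that the calling convention requires to be preserved (RBX in the common 64-bit conventions).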
+==== MOVBE instruction ====
+The **movbe** instruction moves data after swapping data bytes. It operates on words, doublewords or quadwords and is usually used to change the endianness of the data.
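+As a minimal sketch, assume that RSI points to four bytes 12h, 34h, 56h, 78h stored in this order, that is, a big-endian doubleword. The register choice is an assumption made for illustration. One operand of **movbe** is a register and the other is a memory location.
+<code asm>
+mov   eax, DWORD PTR [rsi]  ;ordinary load: eax = 78563412h
+movbe eax, DWORD PTR [rsi]  ;byte-swapping load: eax = 12345678h
+</code>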
+==== Cache manipulating instructions ====
+Cache memory is managed by the processor, and usually, its decisions keep the performance of software execution at a good level. However, the processor offers instructions that allow the programmer to send hints to the cache management mechanism and prefetch data in advance of using it (**prefetchw**, **prefetchwt1**) and to synchronise the cache and memory and flush the cache line to make it available for other data (**clflush**, **clflushopt**). There are also additional instructions implemented for cache management introduced together with multimedia and vector extensions.
+===== User Mode Extended State Save/Restore Instructions =====
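+Some instructions allow for saving and restoring the state of several units of the processor. They are intended to help with fast context switching between processes and to be used instead of saving each register separately at the beginning of a subroutine and restoring it at the end. The content of registers is stored in the memory area given as the operand of the instruction, while the EDX:EAX register pair holds a bit mask that selects which state components are saved or restored. Instructions for saving the state are **xsave**, **xsavec**, and **xsaveopt**. The instruction for restoring the state is **xrstor**. The related **xgetbv** instruction does not restore any state; it reads an extended control register, such as XCR0, which indicates the state components that are enabled.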
+===== Random Number Generator Instructions =====
+In the x64 architecture, there are two instructions for generating a random number. These are **rdseed** and **rdrand**. A random number is generated by a specially designed hardware unit. The difference between instructions is that **rdseed** returns random bits derived from the hardware entropy source on the chip. It is slower but offers better randomness of the number, which makes it suitable for seeding other generators. The **rdrand** returns bits from a pseudorandom number generator that is seeded from the same entropy source. It is faster, offering output that is sufficiently secure for most cryptographic applications. Both instructions set the carry flag (CF) to 1 when a valid value was returned and clear it otherwise, so the flag should be checked after each use.
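+Both instructions report success in the carry flag: CF=1 means that the destination register holds a valid random value. The following sketch retries **rdrand** a limited number of times. The retry limit of 10 and the label names are assumptions made for illustration.
+<code asm>
+  mov   ecx, 10          ;retry limit (assumed value)
+try_again:
+  rdrand rax             ;CF=1 if rax holds a valid random number
+  jc    got_random       ;success
+  dec   ecx              ;one attempt used
+  jnz   try_again        ;try again until the limit is reached
+  ;no random number available - handle the failure here
+  jmp   done
+got_random:
+  ;rax holds 64 random bits
+done:
+</code>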
+===== BMI1 and BMI2 Instructions =====
+The abbreviation BMI comes from Bit Manipulation Instructions. These instructions are designed for some specific manipulation of bits in the arguments, enabling programmers to use a single instruction instead of a few.
+The **andn** instruction extends the group of logical instructions. It performs a bitwise AND of the inverted first source operand with the second source operand.
+There are additional shift and rotate instructions that do not affect flags, which allows for more predictable execution without dependency on flag changes from previous operations. These instructions are **rorx** - rotate right, **sarx** - shift arithmetic right, **shlx** - shift logical left, and **shrx** - shift logical right.
+Also, unsigned multiplication without affecting flags, **mulx**, was introduced.
+Other instructions manipulate bits, as the group name suggests.
+The **lzcnt** instruction counts the number of zeros in an argument starting from the most significant bit. The **tzcnt** counts zeros starting from the least significant bit. For an argument that is not zero, **lzcnt** returns the number of zeros before the first 1 from the left, and **tzcnt** gives the number of zeros before the first 1 from the right.
+The **bextr** instruction extracts a contiguous field of bits from the source operand and writes it, right-aligned, to the destination operand. The third operand is a control value: its bits 7:0 specify the starting bit position, while bits 15:8 specify the maximum number of bits to extract, as shown in figure {{ref>bextr_instr}}.
+<figure bextr_instr>
+{{ :en:multiasm:cs:bextr.png?400 |Illustration of bit extraction instruction}}
+<caption>Illustration of bit extraction instruction</caption>
+</figure>
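+A worked example of the control value: to extract 8 bits starting at bit 4, the length 8 goes to bits 15:8 and the start position 4 goes to bits 7:0, which gives 0804h. The register choice below is an assumption made for illustration.
+<code asm>
+mov   ebx, 12345678h   ;source value
+mov   ecx, 0804h       ;length 8 in bits 15:8, start 4 in bits 7:0
+bextr eax, ebx, ecx    ;eax = (ebx shifted right by 4) AND 0FFh = 67h
+</code>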
+The **blsi** instruction extracts the single, lowest bit set to one, as shown in figure {{ref>blsi_instr}}.
+<figure blsi_instr>
+{{ :en:multiasm:cs:blsi.png?400 |Illustration of the lowest set bit extraction instruction}}
+<caption>Illustration of lowest set bit extraction instruction</caption>
+</figure>
+The **blsmsk** instruction creates a mask with all bits set from bit 0 up to and including the lowest bit set to 1 in the source operand. It is shown in figure {{ref>blsmsk_instr}}.
+<figure blsmsk_instr>
+{{ :en:multiasm:cs:blsmsk.png?400 |Illustration of the instruction which sets all lower bits below a first bit set to 1.}}
+<caption>Illustration of the instruction which sets all lower bits below a first bit set to 1</caption>
+</figure>
+The **blsr** instruction resets (clears the bit to zero value) the lowest set bit. It is shown in figure {{ref>blsr_instr}}.
+<figure blsr_instr>
+{{ :en:multiasm:cs:blsr.png?400 |Illustration of the instruction which resets a first bit set to 1.}}
+<caption>Illustration of the instruction which resets a first bit set to 1</caption>
+</figure>
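+The results of **blsi**, **blsmsk** and **blsr** can be reproduced with basic instructions, which may help to understand what they compute. The sketch below assumes the source value is in RBX and the result goes to RAX. The BMI instructions do the same work in a single instruction, although they set the flags differently. For the source value 01101000b, the results are 00001000b, 00001111b and 01100000b, respectively.
+<code asm>
+;blsi rax, rbx gives the same result as rax = rbx AND (-rbx)
+mov  rax, rbx
+neg  rax
+and  rax, rbx
+;blsmsk rax, rbx gives the same result as rax = rbx XOR (rbx - 1)
+lea  rax, [rbx-1]
+xor  rax, rbx
+;blsr rax, rbx gives the same result as rax = rbx AND (rbx - 1)
+lea  rax, [rbx-1]
+and  rax, rbx
+</code>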
+The **bzhi** instruction resets high bits starting from the specified bit position, as shown in figure {{ref>bzhi_instr}}.
+<figure bzhi_instr>
+{{ :en:multiasm:cs:bzhi.png?400 |Illustration of the instruction which resets high bits starting from the specified bit position.}}
+<caption>Illustration of the instruction which resets high bits starting from the specified bit position</caption>
+</figure>
+The **pdep** instruction performs a parallel deposit of bits using a mask. Its behaviour is shown in figure {{ref>pdep_instr}}.
+<figure pdep_instr>
+{{ :en:multiasm:cs:pdep.png?600 |Illustration of the parallel deposit instruction}}
+<caption>Illustration of the parallel deposit instruction</caption>
+</figure>
+The **pext** instruction performs a parallel extraction of bits using a mask. Its behaviour is shown in figure {{ref>pext_instr}}.
+<figure pext_instr>
+{{ :en:multiasm:cs:pext.png?600 |Illustration of the parallel extraction instruction}}
+<caption>Illustration of the parallel extraction instruction</caption>
+</figure>