Programming in Assembler for x64

In this section, we show examples of programs written purely in assembler or combined with other programming languages, including C++ and C#. We assume that the reader is familiar with the coursebook and with the instructions and directives used to write assembler programs. We describe the use of the integrated development environment (Visual Studio) and methods to assemble programs from the command line only. We also show how to create static and dynamic libraries written in assembler for use in assembler programs or in programs built with other compilers.

Introduction to the x64 Assembler programming in MASM - Microsoft Visual Studio Community Edition

In the following chapter, we explain how to write, assemble, link, and execute assembly-language programs for x64 processors. We assume that the reader is familiar with the most important processor instructions and MASM directives.

To write a program in assembler, it is convenient to use the Visual Studio IDE, either a commercial edition or the free Community Edition. The following section presents the installation of Visual Studio Community Edition on Windows.

Visual Studio Community, Professional, and Enterprise are different products from Visual Studio Code. We do not use Visual Studio Code here!

Installing VS Community

Installation requires the following simple steps:

Download the installer executable from: https://visualstudio.microsoft.com/free-developer-offers/ - make sure to choose the Community Edition (purple), not Code (blue), as shown in figure 1.
Run the installer and let it download and install the product.
Configure components - a minimal set of development platforms for our laboratory exercises requires:
1. a full install of the C++ development platform in native code for Windows (remember to select the additional components as in figure 2),
2. a default install of the C# development platform in managed code for Windows.

Figure 1: caption

This scenario concerns the implementation of a command-line Windows x64 application written in pure assembler. It covers assembling, debugging, and the disassembly window, register view, and memory view (data section).

[piotr] TO BE DONE

Standalone assembly

It is possible to use command-line MASM tools to assemble, link, and create libraries written in assembly language. You can use any editor to create the assembler source code and translate it into machine code. The tools required are integral elements of the Visual Studio Community installation, installed with the option “Desktop development with C++”. For the default VS installation, you can find them in the following folder (it can change due to different version numbers).

C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\14.50.35717\bin\Hostx64\x64

To use statically included Windows libraries, you need lib files. The essential library is kernel32.lib, but for other Windows functions, you will also need some additional libraries. All are available in the following folder (it can change due to different version numbers).

C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64

The ML64.exe program is used to assemble the source file. It has many options, which you can list by running:

ML64.exe /?

After assembling, ML64 can call the linker automatically. An example MASM command that assembles and links a file named source.asm can look like this:

ml64 /Fl /Zi /Zd source.asm /link /entry:main

The options used are:

/Fl - generate a listing file. MASM will output the source.lst file with a report on the assembling process.
/Zi - add symbolic debug info. MASM will add the names of symbols defined in the program to the object file. This allows debuggers to show user-defined symbol names during debugging.
/Zd - add line number debug info. MASM will add source code line numbers to the object file.
/link - MASM will call the linker; the options that follow are passed to the linker.
/entry:main - an option for the linker that specifies the entry point of the program.

If you prefer a name other than “main” for the entry point of your console program, you also need to specify the subsystem for the resulting code. For a console application, add the /SUBSYSTEM:CONSOLE linker option.

The easiest way is to put all required files in the same folder on the disk. In more complex projects this is usually not possible, so file names have to be preceded by their full paths.

Unsurprisingly, the first code example is “Hello, World!”. This program uses three system functions:

GetStdHandle - returns a handle to a standard device; here, the standard output, which is the console of our application.
WriteConsoleA - displays the text in the console (the ANSI version of WriteConsole).
ExitProcess - ends the process and returns control to the operating system.

The functions are provided by the system library kernel32; its import library kernel32.lib is linked statically with our program. We use the “includelib” directive to inform the linker where to search for functions. To inform the assembler about the names of the functions, we declare them with a set of “extern” directives. The details of each statement of the program are explained in comments.

option casemap:none             ; make identifiers case-sensitive
 
includelib kernel32.lib         ; statically linked library with system functions
 
EXTERN GetStdHandle:PROC        ; declaration of system functions for use
EXTERN WriteConsoleA:PROC
EXTERN ExitProcess:PROC
 
STD_OUTPUT_HANDLE equ -11       ; STD_OUTPUT_HANDLE constant
 
; In the data section of our program, there is a string to be displayed
.data
    message db "Hello, World!", 13, 10
    msgLen  equ $ - message     ; constant calculation containing string length
 
; In the code section of our program, there are instructions for execution
.code
main PROC                       ; main function - entry point
    sub rsp, 28h                ; 32 bytes of shadow space + 8 for the 5th argument (keeps rsp aligned)
 
; HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE)
    mov ecx, STD_OUTPUT_HANDLE
    call GetStdHandle           ; this function returns the standard output handle in rax
 
; WriteConsoleA(hConsole, message, msgLen, &written, NULL)
    mov rcx, rax                ; console window handle
    lea rdx, message            ; pointer to the buffer
    mov r8d, msgLen             ; length
    lea r9, written             ; pointer to a var with a real number of chars written
    mov qword ptr [rsp+20h], 0  ; 5th argument (lpReserved = NULL)
    call WriteConsoleA          ; this function displays text in the console
 
; ExitProcess(0)
    xor ecx, ecx                ; value to be returned
    call ExitProcess            ; return to operating system
main ENDP                       ; end of the main function
 
; In the uninitialised data section of our program, there is a "written" variable
.data?
    written dq ?                ; variable which holds the number of written chars
 
END                             ; end of source file

Creating static libraries

To create a static library, the assembler module should not define the main procedure. All procedures are made available to other programs by default. If a procedure needs to be hidden, it can be marked as PRIVATE. The first step is to assemble the source file with MASM.

ml64 /c source.asm

/c - assemble without linking.

The second step is to create the lib file with the lib tool.

lib source.obj

This will create the source.lib file, which can be imported into the program, where we can use all available procedures.

As an example, we will build a library containing the function “int_to_ascii”, which converts an integer into its text representation. Let's begin with the function itself. It accepts two arguments: the number to be converted, passed in RCX, and a pointer to a buffer of at least 32 bytes for the resulting text, passed in RDX. It converts a signed 64-bit number, writes the digits at the end of the buffer, and returns the pointer to the first character in RDX and the length of the resulting string in RAX. We can pass these results to the WinAPI function WriteConsoleA to display the ASCII representation of the number in the console.

Please note that the code of the library module does not have the “main” function, which in an executable program file serves as an entry point.

option casemap:none
 
.code
; ----------------------------------------
; int_to_ascii
; input:   RCX = signed 64-bit number
;          RDX = pointer to a buffer of at least 32 bytes
; output:  RDX = pointer to the first character of the text
;          RAX = length of the resulting string
; ----------------------------------------
int_to_ascii PROC
    push rbx              ; rbx is nonvolatile
    push rdi              ; rdi is nonvolatile
    sub rsp, 24           ; local storage (keeps rsp 16-byte aligned)
    mov [rsp+8],  rcx     ; save the input number
    mov [rsp+16], rdx     ; save the pointer to the buffer
    mov rax, rcx          ; move input number to rax
 
; point rdi into the buffer end
    mov rdi, rdx          ; pointer to a string
    add rdi, 31
    mov byte ptr [rdi], 0 ; mark string end with terminator
 
    mov rbx, 10
 
; test if the number is negative
    xor r8d, r8d          ; r8 = 0 → positive flag
    test rax, rax         ; test the sign
    jge convert           ; jump if rax is not negative
 
    neg rax               ; change the sign of rax
    mov r8d, 1            ; r8 = 1 → negative flag
 
; conversion loop
convert:
    dec rdi		  ; starting from the end of the text (least significant digit)
    xor rdx, rdx	  ; prepare to divide rdx:rax by rbx
 
    div rbx		  ; rax / 10 → remainder in rdx
    add dl, "0"		  ; convert remainder into ASCII
    mov [rdi], dl	  ; write character of a digit to buffer
    test rax, rax	  ; test if there is still a value for conversion
    jne convert
 
; add minus if needed
    cmp r8d, 0            ; r8 = 1 → negative flag
    je write
    dec rdi               ; add minus character
    mov byte ptr [rdi], '-'
 
write:
; calculate length of the text (end – rdi)
    mov rax, [rsp+16]     ; get pointer to an original buffer
    add rax, 31
    sub rax, rdi          ; resulting number length in rax
    mov rdx, rdi          ; adjusted pointer to string in a buffer
 
    add rsp, 24           ; restore stack pointer 
    pop rdi
    pop rbx
    ret
int_to_ascii ENDP
 
END
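
The right-to-left conversion used by int_to_ascii can be checked against a short reference model. The sketch below is written in Python, not in assembler, and is not part of the library; the names int_to_ascii_model, BUF_SIZE and MASK64 are hypothetical and were introduced only for this illustration. It assumes the same 32-byte buffer and mirrors the steps of the routine.

```python
MASK64 = (1 << 64) - 1
BUF_SIZE = 32  # same size as the buffer used by the assembler routine


def int_to_ascii_model(value):
    """Model of int_to_ascii: returns (buffer, start_offset, length)."""
    if not -(1 << 63) <= value < (1 << 63):
        raise ValueError("value does not fit in a signed 64-bit register")

    buf = bytearray(BUF_SIZE)   # zero-filled, so buf[31] acts as the terminator
    pos = BUF_SIZE - 1          # corresponds to rdi = buffer + 31

    negative = value < 0
    # neg rax: two's complement negation; div then treats rax as unsigned
    magnitude = (-value) & MASK64 if negative else value

    while True:
        pos -= 1                                   # dec rdi
        magnitude, digit = divmod(magnitude, 10)   # div rbx
        buf[pos] = ord("0") + digit                # add dl, "0" / mov [rdi], dl
        if magnitude == 0:                         # test rax, rax / jne convert
            break

    if negative:
        pos -= 1
        buf[pos] = ord("-")

    length = (BUF_SIZE - 1) - pos                  # (buffer + 31) - rdi
    return buf, pos, length
```

Calling int_to_ascii_model(-42) returns the buffer together with the offset of the minus sign and the length 3, which correspond to the values the assembler routine leaves in RDX and RAX.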

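This library can be imported into an assembler program or a program written in another programming language. The example below assumes that the library file is named convert.lib (for instance, because the source file was named convert.asm). An assembly program can look as follows: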

option casemap:none
 
; include the system library and our convert library
includelib kernel32.lib
includelib convert.lib
 
; declare function we use in our program
EXTERN GetStdHandle:PROC
EXTERN WriteConsoleA:PROC
EXTERN ExitProcess:PROC
EXTERN int_to_ascii:PROC
 
; constant required by GetStdHandle system function
STD_OUTPUT_HANDLE equ -11
 
; data section
.data
    buffer db 32 dup(0) ; buffer for a string
    hOut   dq ?         ; placeholder for console handle
    dummy  dq ?         ; place for dummy parameter
 
;code section
.code
 
; -------------------------------------------
; main function of the program - entry point
; -------------------------------------------
main PROC
; shadow space (32 bytes) + slot for the 5th argument of WriteConsoleA
    sub rsp, 40
 
; get the handle of stdout
    mov ecx, STD_OUTPUT_HANDLE ; console output
    call GetStdHandle
    mov hOut, rax       ; store the handle
 
; call conversion function
    mov rcx, 33550336   ; number for displaying
    lea rdx, buffer     ; pointer to a buffer
    call int_to_ascii
 
; prepare arguments for WriteConsoleA(hOut, text pointer, length, ...)
    mov rcx, hOut       ; console handle 
                        ; pointer to the beginning of a string is in rdx
    mov r8, rax         ; nNumberOfCharsToWrite is in rax  
    lea r9, dummy       ; dummy for lpNumberOfCharsWritten
    mov qword ptr [rsp+20h], 0  ; lpReserved (must be NULL)
 
    call WriteConsoleA  ; displaying function
 
    xor ecx, ecx        ; return value of a program
    call ExitProcess    ; go back to Windows OS
main ENDP
 
END

Introduction to Linux assembly programming

NASM

Scenarios

Displaying integers in hex

In our first scenario, we will modify the conversion library, adding another function which should convert integer input into a hexadecimal representation. We can copy the int_to_ascii function and introduce some simple modifications. First, we need to divide the input value by 16, not by 10.

   mov rbx, 16

After each division operation, we will obtain the remainder from the range 0-15. We can't convert this into an ASCII digit the same way as in decimal, because the digits 0-9 and letters A-F do not form a continuous range. We can deal with this situation in different ways. One approach is to check if dl is bigger than 9 and shift it to point to letter characters if true.

    cmp dl, 9          ; test if dl > 9
    jna zero_to_nine   ; if not jump over adjustment
    add dl, "A"-"9"-1  ; adjust dl with the distance between A and 9
zero_to_nine:
    add dl, "0"        ; convert to ASCII

Another approach is to define the table of characters (lookup table) in the data section containing all digits and letters, and pick the correct character using the xlatb instruction or the mov with proper indirect addressing mode.

.data
hex_digits db "0123456789ABCDEF"
 
.code
...
    lea rcx, hex_digits        ; load address of lookup table
    and rdx, 0000000Fh         ; limit the range to 15
    mov dl, byte ptr [rcx+rdx] ; convert remainder into ASCII
...

In the second approach, we use indirect addressing with the sum of the rcx and rdx registers. The base address of the table must be loaded into rcx with the lea instruction; it cannot be used as a constant in the address expression. This is because an instruction we could use in 32-bit mode:

    mov dl, hex_digits[rdx]   ; This instruction is NOT VALID in 64-bit mode

causes an error when used in 64-bit long mode. The address of the lookup table is a 64-bit number, but the displacement encoded in this form of the mov instruction cannot exceed 32 bits.
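
The two ways of mapping a remainder to a character described above, the arithmetic adjustment and the lookup table, should always give the same result. The following Python sketch (a model for checking the idea, not library code; the function names are hypothetical) expresses both so they can be compared for every remainder from 0 to 15.

```python
HEX_DIGITS = "0123456789ABCDEF"   # same contents as the hex_digits table


def digit_by_adjustment(remainder):
    """Arithmetic variant: skip the gap between '9' and 'A'."""
    code = remainder
    if code > 9:                           # cmp dl, 9 / jna zero_to_nine
        code += ord("A") - ord("9") - 1    # add dl, "A"-"9"-1 (adds 7)
    return chr(code + ord("0"))            # add dl, "0"


def digit_by_lookup(remainder):
    """Table variant: index the 16-character lookup table."""
    return HEX_DIGITS[remainder & 0x0F]    # and rdx, 0Fh / mov dl, [rcx+rdx]
```

For example, the remainder 10 gives "A" in both variants: 10 + 7 + 48 = 65, which is the ASCII code of "A".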

To use the xlatb instruction mentioned above, we have to preserve rax before the conversion, because xlatb takes the index from al and returns the character in al. We do this by storing rax temporarily in rcx. We also need to handle rbx differently: in each iteration, we set it to 16 before dividing and to the lookup table address before xlatb.

.code
...
    mov rbx, 16          ; prepare divisor
    div rbx		 ; rax / 16 → remainder in rdx
    mov rcx, rax         ; store temporarily rax
    lea rbx, hex_digits  ; load address of lookup table
    and rdx, 0000000Fh   ; limit the range to 15
    mov al, dl           ; prepare index in al
    xlatb                ; convert remainder into ASCII
    mov [rdi], al        ; put character to resulting table
    mov rax, rcx         ; restore rax

To improve the performance of our code, in the case of hexadecimal numbers, it is possible to replace the time-consuming division instruction with an instruction to shift the number by four bit positions right. We leave the implementation of this optimisation to the reader.
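
As a starting point for this exercise, the idea can first be tried out in a high-level model. The Python sketch below is only an illustration, not the assembler solution: it replaces the division by 16 with a mask of the lowest four bits and a right shift by four positions. The name int_to_hex_model is hypothetical, and the sketch assumes that the input is treated as an unsigned 64-bit bit pattern.

```python
MASK64 = (1 << 64) - 1
HEX_DIGITS = "0123456789ABCDEF"


def int_to_hex_model(value):
    """Convert a 64-bit pattern to hex text using mask and shift only."""
    value &= MASK64                 # keep 64 bits, as in a 64-bit register
    digits = []
    while True:
        digits.append(HEX_DIGITS[value & 0x0F])   # remainder of division by 16
        value >>= 4                                # quotient of division by 16
        if value == 0:
            break
    # digits were produced from the least significant one, so reverse them
    return "".join(reversed(digits))
```

In assembler, the same two steps correspond to an and with 0Fh on a copy of the number and a shr by 4, so neither div nor the rdx:rax register pair is needed.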

Displaying floating point values

As the second scenario, we will add to our library a function for displaying floating-point values. This function will allow us to display the results of calculations we implement in further scenarios. According to x64 Windows ABI rules, floating-point values should be passed through XMM registers. We will display a single value, so we'll use the XMM0 register.

Displaying a floating-point number is a much more complex task than displaying an integer. We will split it into the conversion of the fractional part and the conversion of the integer part. First, we copy the argument from XMM0 into XMM1 so that the original value in XMM0 remains unchanged.

Let's start by checking whether the value is positive or negative. Floating-point numbers are stored as a magnitude with a separate sign bit in the most significant position, so a positive and a negative number with the same absolute value differ only in that bit. To test whether a number is negative, we can use the movmskps instruction, which copies the sign bits of all elements of a vector into the lowest bits of the destination register. As our argument is a scalar, the bit we are interested in is at the lowest position. Rotating the register one position to the right with rcr moves this bit into the carry flag, which we test with a conditional jump. If the argument is negative, we make it positive by clearing the sign bit: the andps instruction with the clear_sign_bit mask clears that one bit in the XMM1 register.

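The andps instruction requires its memory operand to be aligned on a 16-byte boundary; an unaligned operand raises a general-protection exception. The data section itself starts on a 16-byte boundary by default, but the variables inside it are stored one after another, so we place the ALIGN 16 directive directly before the mask.

.data
ALIGN 16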
clear_sign_bit dword 07FFFFFFFh, 0FFFFFFFFh, 0FFFFFFFFh, 0FFFFFFFFh
ALIGN 16
...
 
.code
...
; test if the number is positive or negative
    movq xmm1, xmm0
    movmskps rax, xmm1
    rcr rax, 1
    jnc float_positive
 
; change the sign of the scalar
    andps xmm1, xmmword ptr clear_sign_bit
 
; do not change the sign  
float_positive:

We start the conversion from the least significant digit of the fractional part, limiting precision to thousandths. The fractional part is obtained by subtracting the integer part from the original argument. The integer part is obtained with the cvttss2si instruction, which converts with truncation, that is, it simply discards the fractional part of the number. We store the integer in rcx for further use.

.data 
const1000 real4 1000.0
 
.code
...
; convert fractional part  
    cvttss2si rax, xmm1 ; convert float to int with truncation
    mov rcx, rax        ; store for conversion of an integer part
    cvtsi2ss xmm2, rax  ; convert back into float
    subss xmm1, xmm2    ; subtract integer part
 
    mulss xmm1, const1000 ; we want three fractional digits
    cvttss2si rax, xmm1
 
    mov rbx, 10
    mov r9d, 3          ; always write three digits, so 0.05 gives "050" (r9 assumed free)
convert_fraction:
    dec rdi             ; starting from the end of the text (least significant)
    xor rdx, rdx        ; prepare to divide rdx:rax by rbx
 
    div rbx             ; rax / 10 → remainder in rdx
    add dl, "0"         ; convert remainder into ASCII
    mov [rdi], dl       ; write character to buffer
    dec r9d             ; count the digit just written
    jne convert_fraction

We separate the fractional and integer parts with a dot.

; add dot
    dec rdi
    mov byte ptr [rdi], '.'

The integer part is converted with the same algorithm as the fractional part, but first we restore its value from rcx.

; convert integer part
    mov rax, rcx        ; restore integer part
 
convert_integer:
    dec rdi		; starting from the end of the text (least significant)
    xor rdx, rdx	; prepare to divide rdx:rax by rbx
 
    div rbx	        ; rax / 10 → remainder in rdx
    add dl, "0"		; convert remainder into ASCII
    mov [rdi], dl	; write character to buffer
    test rax, rax	; test if there is still a value for conversion
    jne convert_integer

After converting the integer part, we need to add “minus” for a negative value. We'll test it again with the same method as at the beginning. For this purpose, we kept the original argument in XMM0.

; test if the number is positive or negative
    movmskps rax, xmm0
    rcr rax, 1
    jnc end_float
 
; add minus if needed
    dec rdi             ; add minus character
    mov byte ptr [rdi], '-'
 
end_float:

The final part, calculating the string length, is the same as in the conversion of integers.
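
The whole sequence of steps for a floating-point value can be summarised in a reference model. The sketch below is written in Python and is not the library code; float_to_ascii_model and to_float32 are hypothetical names. It assumes single-precision arithmetic (emulated by rounding through the struct module) and always writes exactly three fractional digits, padded with zeros, so that a value such as 2.0625 is shown as 2.062 rather than 2.62.

```python
import struct


def to_float32(x):
    """Round a Python float to single precision, like a value in an XMM register."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float_to_ascii_model(x):
    """Model of the float conversion: sign, integer part, dot, three digits."""
    x = to_float32(x)
    negative = bool(struct.unpack("<I", struct.pack("<f", x))[0] >> 31)
    magnitude = abs(x)                                   # andps with the mask

    int_part = int(magnitude)                            # cvttss2si (truncation)
    frac = to_float32(magnitude - float(int_part))       # cvtsi2ss + subss
    scaled = int(to_float32(frac * 1000.0))              # mulss + cvttss2si

    chars = []                                           # filled right to left
    for _ in range(3):                                   # three fractional digits
        scaled, digit = divmod(scaled, 10)
        chars.append(chr(ord("0") + digit))
    chars.append(".")
    while True:                                          # integer part
        int_part, digit = divmod(int_part, 10)
        chars.append(chr(ord("0") + digit))
        if int_part == 0:
            break
    if negative:
        chars.append("-")
    return "".join(reversed(chars))
```

Because the fractional part is truncated, not rounded, the model (like the assembler code) may show a last digit that is one lower than a rounded result would be.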

Implementation of calculation functions

In another scenario, we will create another library with functions performing the simple calculations on integers and floating-point numbers.
