This content was implemented under the following project:
Consortium Partners
Erasmus+ Disclaimer
This project has been co-funded by the European Union.
Views and opinions expressed are, however, those of the author or authors only and do not necessarily reflect those of the European Union or the Foundation for the Development of the Education System. Neither the European Union nor the entity providing the grant can be held responsible for them.
Copyright Notice
This content was created by the MultiASM Consortium 2023–2026.
The content is copyrighted and distributed under CC BY-NC Creative Commons Licence and is free for non-commercial use.
In case of commercial use, please get in touch with MultiASM Consortium representative.
This manual is intended to help students bootstrap into assembler programming across a variety of applications. It presents practical exercises in a hands-on lab format, often also covering toolchain configuration. Some sections present hardware details, such as remote IoT and remote ARM laboratories. Others assume the student owns or has access to a PC and can install software.
ARM processors are omnipresent, ranging from simple IoT devices to laptops, notebooks, and workstations.
For this reason, we had to select one technology to use for a practical introduction and experimentation.
To present both hardware interfacing and programming, the obvious choice is the Raspberry Pi. The following chapters present laboratory details and scenarios.
Assembler programming for embedded systems uses both on-site programming of devices connected directly to the development platform (usually via USB) and an integrated solution for IoT laboratories: VREL NextGen Software for remote experimentation.
Local development requires installing the development toolchain. A common scenario is to use Visual Studio Code, a compiler and, usually, a plugin dedicated to a selected platform, e.g. AVR Assembler Toolbox.
Remote development uses a ready-made development platform accessible only via a web browser. The device is observable only via a live video stream, which introduces limitations to consider, such as latency and the lack of physical access to the device (e.g., pushing a reset button is impossible).
Users connect to the system using a web browser and develop software in the browser, compile it and inject it into the microcontroller, all remotely.
The following chapters present additional information on using the VREL NextGen remote labs system for assembler programming.
VREL NextGen software is a web-based, integrated solution for both IoT software developers (Users/Students) and system administrators (Administrators, Super Administrators). It can be used in one of the three aforementioned roles.
There are many public and private instances for internal purposes of the Consortium HE and SME Members. Using the system requires registration with a valid email address. The front-page view is presented in figure 2.
Students book a device (or multiple devices) exclusively. Each device has specific hardware and programming features, which are provided in the documentation.
There is usually a time limit for device bookings, e.g., 2 hours per booking.
Students author the code in the web-based editor; depending on the platform, this may also require authoring some configuration files (e.g., platformio.ini, makefile, etc.). Refer to the technical specification for the particular laboratory nodes - it is highly contextual.
Once the code is ready, it can be compiled, and if the compilation is successful, it can be uploaded to the device.
Results can be observed via the web camera in near real time. Some nodes will also provide other interaction capabilities, usually via a bottom-right part of the screen, where documentation is integrated.
Several instances of this software are implemented across consortium partners (details are on IOT-OPEN.EU and the IOT-OPEN.EU Reloaded Main Page), but perhaps the one you may want to start from is the instance implemented at SUT, shared with TalTech, ITsilesia, and ITT Group: SUT's VREL NextGen.
Students need to create an account, as in virtually any other web application (figure 3):
Once the account is created, check your mailbox for an activation link. Activate your account and log in to the system.
Devices are booked exclusively.
Other devices are provided solely for consortium members.
The device booking process is straightforward. You can book now, and in the future. The process is described below:
The following chapter assumes that you are familiar with basic assembler operations for AVR microcontrollers. Below, we explain the most important construction elements and assembler instructions for manipulating the Arduino Uno's (figure 8) GPIOs, based on the ATmega328P microcontroller.
Using plain assembler (not C++ + assembler) requires a specific construction of the application where the program is located (loaded) into memory exactly at 0x0000.
    .org 0x0000
        rjmp start
    start:
        ...
It is common practice to use rjmp (relative jump), which makes it easier to place data before the start of the code. It is also good “embedded” practice to keep it even if it does not really jump anywhere, as in this example. Forgetting to include it may cause trouble later, when you decide to declare data, use interrupts, and so on.
Location of the code (Flash) and data (SRAM) is assigned to the addressing space. It also impacts source code construction in assembler. The following image presents the ATMega328P memory map. When using fully manual memory control, e.g., when the source code does not use .section, it is necessary to explicitly tell the compiler where to place the source code, variables, and other memory-related components. Details are presented in figure 9 and discussed in the following subsection.
Source code needs to use explicit declarations to tell the AVR-GCC toolchain how to handle the contents: whether it is code or data, whether the variable is read-only, whether it should be stored across updates, and so on. There are two possible approaches: one is to use declarations ('.section'), the other is to handle addresses manually.
There are five .section declarations, as presented in table 1.
Each of them does the job of .org <address> in a more elegant way: e.g. .section .data is equivalent to .org 0x800100 - one does not need to remember the addresses.
| Section | Content | Location | Volatile? |
|---|---|---|---|
| .text | Instructions / Code | Flash | No |
| .data | Initialized globals | SRAM (from Flash) | Yes |
| .bss | Zeroed/Uninitialized globals | SRAM | Yes |
| .rodata | Constants / Strings | Flash | No |
| .eeprom | Long-term storage | EEPROM | No |
We mentioned before that .section .data is equivalent to .org 0x800100. Why .org 0x800100 instead of just .org 0x0100?
This is because Flash, EEPROM, and SRAM all start at 0x0000 (see figure 9), and you need to tell the linker (via source code) which memory block you're referring to. Writing .org 0x0100 may be misleading - the assembler will assume the location is in Flash instead of SRAM.
For this reason, the way the AVR-GCC toolchain (assembler and linker) handles Harvard Architecture in Arduino Uno (ATMega328P) is the use of virtual memory offsets: 0x000000 means it is Flash, 0x800100 means it is SRAM (built-in) and 0x810000 means it is EEPROM. Details are presented in table 2.
| Memory Type | GCC Internal Offset | Hardware Address |
|---|---|---|
| Flash | 0x000000 | 0x0000 |
| SRAM | 0x800000 | 0x0000 |
| SRAM (Internal) | 0x800100 | 0x0100 |
| EEPROM | 0x810000 | 0x0000 |
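The offsets in table 2 can be read as a simple sum. The worked example below only illustrates that reading; the symbols A_virtual, O_region and A_hw are introduced here for convenience and are not toolchain terminology.

$$
A_{\text{virtual}} = O_{\text{region}} + A_{\text{hw}}
$$

$$
\text{first byte of internal SRAM: } \texttt{0x800000} + \texttt{0x0100} = \texttt{0x800100}, \qquad \text{EEPROM byte 0x0010: } \texttt{0x810000} + \texttt{0x0010} = \texttt{0x810010}
$$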
.org 0x0100 for simplicity. It is only when the code contains no variables, and everything is stored in Flash.
To summarise briefly, the most common scenario is that the code is intended to land in Flash memory, while variables are in SRAM. Appropriate '.org' instructions ensure the correct placement of the following content.
It is possible to write code without using sections, but that makes the code unnecessarily complicated. Whenever you use variable declarations, it is advisable to use sections to make the code cleaner and easier to understand. If your code is as simple as setting a GPIO output and does not use variables (everything is in flash), then you may omit the .section declarations.
The sample code below declares a 16-bit value named 'analogue_value' stored in SRAM (RAM). Note use of .section:
.section .data ensures that the following declarations are placed in RAM (SRAM), and .section .text ensures that the instructions following the declaration are located in Flash (non-volatile) memory.

    .section .data
    .org 0x100                  ; Set SRAM start address manually
    analogue_value: .skip 2     ; 16-bit variable

    .section .text
    .org 0x0000
        rjmp main
    main:
        ; sample values to store
        ldi r24, 0xFF
        ldi r25, 0x03
        ; store it to SRAM
        sts analogue_value, r24
        sts analogue_value + 1, r25
    loop:
        rjmp loop               ; Dummy loop
With a .data section in C++ code, the C++ startup code copies all pre-initialised variables from flash to SRAM on boot. Here, however, we use pure assembler, and this process is not triggered; as a result, a string variable located in SRAM will contain garbage, not the actual string that you declared, even if everything looks OK at the source code level. Because of this, and mostly because of the very limited RAM, keep strings and pre-initialised variables in flash and treat ALL variables declared in .data as uninitialised.
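If a variable does need a defined start value, it can be set explicitly at the beginning of the program. The sketch below is one possible way to do it under the assumptions of this chapter (pure assembler, no C++ startup code); the names counter and counter_init and the value 42 are hypothetical and used only for illustration.

```asm
.section .data
.org 0x100                      ; same manual SRAM placement as in the sample above
counter:      .skip 1           ; hypothetical 8-bit variable, undefined after reset

.section .text
.org 0x0000
    rjmp main
counter_init: .byte 42          ; hypothetical initial value, kept in flash
.balign 2                       ; instructions must start at an even byte address

main:
    ldi r30, lo8(counter_init)  ; r31:r30 form the Z pointer
    ldi r31, hi8(counter_init)
    lpm r16, Z                  ; read the initial value from flash
    sts counter, r16            ; store it in SRAM by hand
loop:
    rjmp loop                   ; dummy loop
```

The copy is done once, before the variable is used; this is the step that the C++ startup code would otherwise perform automatically.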
The Arduino Uno exposes a number of GPIOs that can serve as binary inputs and outputs, analogue inputs, and many of them provide advanced, hardware-accelerated functions, such as UART, SPI, I2C, PWM, and ADC. In fact, not all of the pins on the development board are such “general-purpose”: some of them provide specific features, while others do not: there is no internal multiplexer, so functions such as UART, I2C, SPI, PWM and ADC are bound to particular GPIOs and cannot be changed.
On the hardware level, GPIO pins are grouped into 3 “ports” (figure 10), and it is how you can access them:
A bit in the port corresponds to a single GPIO pin, e.g. bit 5 (6th, zero-ordered) of PortB corresponds to GPIO D13 and is also connected to the built-in LED.
IO Registers
Each Port has assigned three 8-bit registers (there are 9 in total then):
Core I/O registers and their IDs
To operate on I/O registers, the developer must either include a library with definitions or (when programming in pure assembler) declare them on their own.
Table 3 below lists the I/O registers used to control GPIO (Ports B, C and D) and their addresses:
| Name | Address (I/O) | Description |
|---|---|---|
| PINB | 0x03 | Input pins register (Port B) |
| DDRB | 0x04 | Data direction register (Port B) |
| PORTB | 0x05 | Output register/pull-up enable (Port B) |
| PINC | 0x06 | Input pins register (Port C) |
| DDRC | 0x07 | Data direction register (Port C) |
| PORTC | 0x08 | Output register/pull-up enable (Port C) |
| PIND | 0x09 | Input pins register (Port D) |
| DDRD | 0x0A | Data direction register (Port D) |
| PORTD | 0x0B | Output register/pull-up enable (Port D) |
The easiest way is to declare constants (converted to values at compile time) and insert them before the code starts (note that they do not exist in memory, so they do not disturb code placement or proper execution):

    ; I/O registers
    .equ PINB, 0x03
    .equ DDRB, 0x04
    .equ PORTB, 0x05
    .equ PINC, 0x06
    .equ DDRC, 0x07
    .equ PORTC, 0x08
    .equ PIND, 0x09
    .equ DDRD, 0x0A
    .equ PORTD, 0x0B
    ; your code starts here
    .org 0x0000
        rjmp start
    start:
        ...
.equ is converted into a value and substituted in the code during compile: thus it does not exist in the final, compiled binary code.
.equ PINB, 0x03 or .equ PINB = 0x03
Below are sections representing common usage scenarios for GPIO control.
There is a set of assembler instructions that operate on Ports (I/O registers), as shown in table 4. Those instructions help to control each GPIO pin. They are handy for manually setting GPIO outputs to HIGH (1) or resetting them to LOW (0) each individually, or in a group (a single Port), all at once. They also help to check input values when GPIOs are configured as inputs. Some applications, however, use hardware acceleration beyond manual switching on and off: for example, a PWM signal can be generated using separate hardware-based mechanisms, as described further, which are far more precise than manually enabling and disabling a bit in a loop and do not load the CPU.
DigitalRead, DigitalWrite, and other instructions in C++, roughly 50 times faster.
| Instruction | Description |
|---|---|
| SBI | Set bit in register |
| CBI | Clear bit in register |
| SBIS | Skip if bit in register is set (1) |
| SBIC | Skip if bit in register is clear (0) |
| IN | Read hardware register to the general-purpose register (R0-R31) |
| OUT | Write the general-purpose register to the hardware register |
| ANDI | Masks a bit |
| ORI | Sets a bit |
A common scenario for manual control of a GPIO pin is to first set whether the GPIO is an input or an output (using the correct DDRx register), and then either set it (SBI), reset it (CBI), check it (SBIS, SBIC), read the whole register (IN) or write the whole register (OUT).
IN and OUT instructions operate on whole, 8-bit registers rather than on single bits. Those are general-purpose instructions that cover the entire range of IO registers (0-63), in addition to the aforementioned DDRx, PORTx, and PINx registers. Operating on multiple bits (8 bits) is faster than setting or reading them individually.
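Whole-register access can be sketched as follows. This is only an illustration: the choice of pins (inputs on PD0..PD3 copied to outputs on PB0..PB3) is an assumption made for the example, not a wiring used elsewhere in this manual.

```asm
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PIND,  0x09

.org 0x0000
    rjmp start
start:
    ldi r16, 0x0F
    out DDRB, r16       ; PB0..PB3 become outputs with one instruction
loop:
    in   r17, PIND      ; read all eight Port D pins at once
    andi r17, 0x0F      ; keep only bits 0..3
    out  PORTB, r17     ; write the whole Port B register (bits 4..7 are written as 0)
    rjmp loop
```

Note that out always writes all 8 bits of the register, so any bit that must keep its previous state has to be read first and merged into the value before writing.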
Below are common scenarios implemented in assembler that will help you to understand the code and start programming.
In this scenario, we use GPIO as a digital output. The simplest is to use the built-in LED to get instantly observable results.
The built-in LED is connected to GPIO13 (D13) and is controlled via PortB (bit 5, zero-based indexing; see figure 10). The built-in LED is on in the HIGH (1) state and off in the LOW (0) state on GPIO13.
It is also convenient to declare a bit number representing the built-in LED position in PortB, so instead of using a number, we can use an identifier, such as .equ PB5,5.
This code flashes the built-in LED.
    .equ DDRB, 0x04
    .equ PORTB, 0x05
    .equ PB5, 5             ; PB5 is GPIO 13, and it is a built-in LED
    .org 0x0000
        rjmp RESET
Step 1 - configure GPIO13 (PortB, bit 5) as output, using DDRB register:
    RESET:
        ldi r16, 1 << PB5   ; Set bit 5
        out DDRB, r16       ; Set PB5 as output
Step 2 - switch the LED on and off in a loop by setting and clearing PortB's bit 5 directly with sbi and cbi:

    LOOP:
        sbi PORTB, PB5      ; Turn LED on (pin HIGH)
        rcall delay
        cbi PORTB, PB5      ; Turn LED off (pin LOW)
        rcall delay
        rjmp LOOP
This implementation of the delay is based on calculating the CPU cycles used to execute the following algorithm:
    delay:
        ldi r20, 43         ; Outer loop
    outer_loop:
        ldi r18, 250        ; Mid loop
    mid_loop:
        ldi r19, 250        ; Inner loop
    inner_loop:
        dec r19
        brne inner_loop
        dec r18
        brne mid_loop
        dec r20
        brne outer_loop
        ret
Instructions used in those loops are listed in table 5, along with the number of cycles each one takes:
| Instruction | Cycles |
|---|---|
| ldi | 1 |
| dec | 1 |
| brne | 2 (taken), 1 (not taken) |
| ret | 4 |
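The inner loop runs exactly 250 times. Thus, the exact number of cycles used is calculated as:
- 1 cycle to initialise the loop (ldi r19,250),
- 250 × 1 cycle for the decrement (dec r19),
- 249 × 2 cycles for the taken branch plus 1 cycle for the final, not-taken branch (brne).
The total for this inner loop is then 1 + 250 + 498 + 1 = 750 clock cycles of the ATmega328P MCU.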
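The mid-loop also runs 250 times. Each of the 250 mid-loop passes uses:
- 750 cycles for the inner loop (including its ldi r19,250 initialisation),
- 1 cycle for the decrement (dec r18),
- 2 cycles for the taken branch, or 1 cycle on the last pass (brne).
Together with the single ldi r18,250 that initialises the mid-loop, the total cost at this level is 1 + 250 × 753 − 1 = 188250 cycles.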
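The outer loop runs 43 times. Each pass executes the whole mid-loop, and the exact number of cycles used is:
- 1 cycle, once, to initialise the outer loop (ldi r20,43),
- 188250 cycles per pass for the mid-loop,
- 1 cycle per pass for the decrement (dec r20),
- 2 cycles per pass for the taken branch, or 1 cycle on the last pass (brne).
The final cost of the loops is 1 + 43 × 188253 − 1 = 8094879 cycles.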
An extra 4 cycles is for the final ret.
Thus, the total cost of the delay section is 8 094 883 clock cycles.
ATMEGA 328p runs at 16 MHz; thus, each cycle takes 1/16000000 of a second.
Overall, the algorithm's execution time is 8094883/16000000, which is about 0.5s (506ms, to be clear). Not perfect, but good enough for this approach. Still, implementing delays this way is straightforward but also troublesome, and there are better solutions, such as using timers.
This delay function works, but it has drawbacks. First of all, it is a blocking delay; second, it is energy-inefficient; and, most of all, it is tedious to get right: you need to analyse your algorithm instruction by instruction and calculate the total number of ticks.
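The same counting can be generalised to pick a different delay. The worked equation below is a sketch derived from the cycle counts above; the symbol k (the value loaded into r20) and the 1 s target are assumptions made for illustration. Each outer pass costs 188250 + 1 + 2 = 188253 cycles, the single ldi r20 and the last, not-taken brne cancel each other out, and ret adds 4 cycles:

$$
N(k) = 188253\,k + 4, \qquad t(k) = \frac{N(k)}{16\,000\,000\ \text{Hz}}
$$

$$
N(43) = 8\,094\,883 \;\Rightarrow\; t \approx 0.506\ \text{s}, \qquad N(85) = 16\,001\,509 \;\Rightarrow\; t \approx 1.000\ \text{s}
$$

Since r20 is an 8-bit register, k is limited to 1..255; longer delays need another loop level or, better, a timer.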
GPIOs may be used as inputs, e.g., to check the state of a button. This method is called active polling; we test the button's state in the loop. A common scenario is that a button shorts to GND, which requires a pull-up resistor (either external or internal). For the internal pull-up, it is necessary to explicitly enable it in assembler code. Arduino Uno is also able to use interrupts for this purpose, and we describe it in the following section.
Configuring GPIO as input with pull-up is pretty simple:
Reading the value of a GPIO is as simple as reading the corresponding bit in the PINx register: when the GPIO is HIGH, the bit is 1; when the GPIO is LOW, the bit is 0. When the GPIO value controls the algorithm flow, it is more convenient (and faster and more memory-efficient) to use conditional jumps based on the PINx bit value, such as the SBIC instruction.
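The two ways of testing an input bit mentioned above can be compared side by side. This is a minimal sketch, assuming a button on PD2 that shorts to GND; the labels via_register, via_skip, pressed and not_pressed are hypothetical, and only the first variant is actually reached when the code runs.

```asm
.equ PIND,  0x09
.equ PORTD, 0x0B
.equ PD2,   2

.org 0x0000
    rjmp start
start:
    sbi PORTD, PD2          ; enable the pull-up (pins are inputs after reset)

; Variant 1: copy the whole PIND register and mask the bit
via_register:
    in   r16, PIND
    andi r16, (1 << PD2)    ; Z flag is set when the bit is 0
    breq pressed            ; bit is 0 -> pin LOW -> button pressed
    rjmp not_pressed

; Variant 2: test the bit directly in the I/O register
via_skip:
    sbic PIND, PD2          ; skip the next instruction when the bit is 0
    rjmp not_pressed
    rjmp pressed

pressed:
    rjmp pressed            ; placeholder for the "pressed" action
not_pressed:
    rjmp not_pressed        ; placeholder for the "released" action
```

The second variant needs no general-purpose register and leaves the status flags untouched, which is why it is preferred when the pin state only steers the program flow.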
The example below shows an Arduino Uno with a button connected to GPIO 2, controlling the built-in LED connected to GPIO 13. On button press, the LED turns on; on release, it turns off. Contextual circuit schematic is presented in figure 11.
Declare ports: the output (LED) is on PortB, bit 5 (GPIO 13), and the input (button) is on PortD, bit 2 (GPIO 2), as presented in figure 10.

    ; output (built-in LED on GPIO13)
    .equ PINB, 0x03     ; Input Pins Address Port B
    .equ DDRB, 0x04     ; Data Direction Register Port B
    .equ PORTB, 0x05    ; Data Register Port B
    .equ PB5, 5         ; Pin 13 is Port B, Bit 5
    ; input (button connected to GPIO 2)
    .equ PIND, 0x09     ; Input Pins Address Port D
    .equ DDRD, 0x0A     ; Data Direction Register Port D
    .equ PORTD, 0x0B    ; Data Register Port D
    .equ PD2, 2         ; Pin 2 is Port D, Bit 2
    .section .text
    .org 0x0000
        rjmp main
Configure GPIO 13 as an output, GPIO 2 as an input, and enable the internal pull-up resistor on GPIO 2.
    main:
        sbi DDRB, PB5       ; Set PB5 (GPIO 13) as Output
        cbi DDRD, PD2       ; Set PD2 (GPIO 2) as Input
        sbi PORTD, PD2      ; Enable Internal Pull-up on PD2 (GPIO 2)
This section is a simple push-switch implementation. Instead of reading PD2 (GPIO 2) into a register, we use the sbic instruction, which tests a bit and skips the next instruction if the bit is clear: when PD2=0, the rjmp led_off is skipped, and execution continues with the section starting at the led_on label. Note that PD2=0 means that the button is pressed, not released.

    loop:
        sbic PIND, PD2      ; Skip next instruction if PD2 (GPIO 2) is LOW (Button Pressed)
        rjmp led_off        ; If High (Not Pressed), go to led_off
    led_on:
        sbi PORTB, PB5      ; Set Pin 13 High
        rjmp loop           ; Jump back to start of loop
    led_off:
        cbi PORTB, PB5      ; Set Pin 13 Low
        rjmp loop           ; Jump back to start of loop
Naturally, polling is simple, but inefficient in terms of resource utilisation. It is far better to implement interrupt-based monitoring of the GPIO input that we present in the following section.
The ATMega328P MCU has two options for handling GPIO changes using interrupts:
The GPIO-related interrupt system is controlled with a number of registers, presented in table 6.
| Register | Name | Description |
|---|---|---|
| EICRA | External Interrupt Control Register A | Configures the trigger condition (rising edge, falling edge, any logical change, or low level) for the dedicated hardware interrupts INT0 and INT1. |
| EIMSK | External Interrupt Mask Register | Used to explicitly enable or disable the INT0 and INT1 interrupts. |
| EIFR | External Interrupt Flag Register | Holds the hardware flags that indicate an INT0 or INT1 event has occurred. (Cleared automatically when the ISR runs, or manually by writing a logic '1' to the bit). |
| PCICR | Pin Change Interrupt Control Register | Enables pin change interrupts for entire banks/ports (Bank 0 for Port B, Bank 1 for Port C, Bank 2 for Port D). |
| PCMSK0 | Pin Change Mask Register 0 | Selects which individual pins on Port B (PCINT0 to PCINT7) are allowed to trigger a Pin Change Interrupt. |
| PCMSK1 | Pin Change Mask Register 1 | Selects which individual pins on Port C (PCINT8 to PCINT14) are allowed to trigger a Pin Change Interrupt. |
| PCMSK2 | Pin Change Mask Register 2 | Selects which individual pins on Port D (PCINT16 to PCINT23) are allowed to trigger a Pin Change Interrupt. |
| PCIFR | Pin Change Interrupt Flag Register | Holds the hardware flags indicating a pin change has occurred on Bank 0, 1, or 2. |
| SREG | Status Register | Bit 7 is the Global Interrupt Enable bit (I-bit). This is the master switch for all interrupts, controlled by the sei and cli assembly instructions. |
Below is an example code that handles a button press using the INT0 interrupt - we use dedicated interrupts and predefined pins to “save” on complex logic. A press toggles the LED. Note that, for simplicity, this code does not implement any debouncing mechanism: if you test it in a real scenario, it may occur that multiple interrupts are triggered during a single press, because of bouncing. We also use a button connected to GPIO 2. The corresponding schematic is presented in figure 11.
GPIO 2 (button) is controlled with PortD (bit 2). GPIO 13 (LED) is controlled with PortB (bit 5).
    .equ PINB, 0x03     ; Port B Input Pins Address
    .equ DDRB, 0x04     ; Port B Data Direction Register
    .equ DDRD, 0x0A     ; Port D Data Direction Register
    .equ PORTD, 0x0B    ; Port D Data Register (used for pull-ups)
    .equ EIMSK, 0x1D    ; External Interrupt Mask Register
    .equ EICRA, 0x69    ; External Interrupt Control Register A
    .equ PD2, 2         ; GPIO 2 (input)
    .equ PB5, 5         ; GPIO 13 (output, built-in LED)
    ; --- Bit Definitions for EICRA ---
    .equ ISC00, 0       ; Interrupt Sense Control 0 Bit 0
    .equ ISC01, 1       ; Interrupt Sense Control 0 Bit 1
Interrupts on the ATMega328 have fixed assignments in the so-called interrupt vector table - a relation between a reason (an interrupt) and a result (an instruction, usually a jump to the handling function), which is located at the beginning of the code. To learn more about interrupts, refer to the ATMega328P Datasheet - it lists all interrupts and the addresses of their handlers (interrupt vectors) in memory. We're using an interrupt triggered when a preconfigured change occurs on GPIO2 - it is INT0. When this interrupt is triggered, the ATMega328P will look at the interrupt vectors and execute the instruction at address 0x0004. Typically, this is a jump (rjmp) instruction to the function that handles this interrupt (the interrupt service routine, or ISR for short; here, its name is int0_isr).

    ; --- Interrupt Vector Table ---
    ; Note: avr-as uses byte addresses for .org.
    ; The ATmega328P word address for INT0 is 0x0002, which is byte address 0x0004.
    .section .text
    .org 0x0000
        rjmp reset          ; Reset vector
    .org 0x0004
        rjmp int0_isr       ; INT0 vector (triggered by D2)
    .org 0x0068             ; Bypass the interrupt vector table (26 vectors x 4 bytes = 0x68 bytes)
The main program configures the GPIOs (GPIO 13 as an output, GPIO 2 as an input with the internal pull-up) and configures INT0 to trigger on the falling edge: when the button is pressed, it shorts the pin to LOW (GND), causing a 1→0 change on press and the opposite on release. Then it enables INT0 in EIMSK (this alone does not enable interrupts as a whole) and executes the sei instruction, which enables interrupts globally. Finally, the loop does nothing; interrupts are handled asynchronously.

    ; --- Main Program ---
    reset:
        ; Configure GPIOs
        sbi DDRB, PB5       ; LED on PB5 (GPIO 13) as an output
        cbi DDRD, PD2       ; Button on PD2 (GPIO 2) as an input
        sbi PORTD, PD2      ; Enable internal pull-up for the button
        ; Configure INT0 to trigger on a falling edge
        ; ISC01 = 1, ISC00 = 0 (Value: 0x02).
        ; EICRA is in extended memory space, so we must use ldi/sts, not out.
        ldi r16, (1 << ISC01)
        sts EICRA, r16
        ; Enable INT0
        ; INT0 is bit 0 in EIMSK. This register is in standard I/O space.
        sbi EIMSK, 0
        ; Enable Global Interrupts
        sei
    loop:
        ; Do nothing, let the hardware interrupt handle everything
        rjmp loop
This function is called when interrupt INT0 occurs (on the falling edge of GPIO 2), and it simply toggles PB5 (GPIO 13, built-in LED). It uses an AVR trick that simplifies the code: the classical read→invert→write sequence is replaced by a single sbi instruction on the PINB register, because writing a 1 to a PINx bit toggles the corresponding PORTx bit.

    ; --- Interrupt Service Routine ---
    int0_isr:
        ; Toggle the LED on PB5.
        ; On the ATmega328P, writing a logic 1 to a PINx register toggles the corresponding PORTx bit.
        sbi PINB, PB5
        reti
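The second mechanism from table 6, the pin change interrupt, can be sketched in a similar way. The listing below is a hedged illustration, not part of the original lab code: it assumes the same button on GPIO 2 (PD2, which is PCINT18 in bank 2), and the label pcint2_isr is hypothetical. A pin change interrupt fires on both edges, so the handler reads the pin to tell a press from a release.

```asm
.equ DDRB,    0x04
.equ PORTB,   0x05
.equ PIND,    0x09
.equ PORTD,   0x0B
.equ PCICR,   0x68      ; extended memory space, accessed with sts
.equ PCMSK2,  0x6D      ; extended memory space, accessed with sts
.equ PCIE2,   2         ; bit in PCICR enabling bank 2 (Port D)
.equ PCINT18, 2         ; bit in PCMSK2 selecting PD2
.equ PD2,     2
.equ PB5,     5

.section .text
.org 0x0000
    rjmp reset          ; Reset vector
.org 0x0014             ; PCINT2 vector: word address 0x000A = byte address 0x0014
    rjmp pcint2_isr
.org 0x0068             ; first byte after the 26-entry vector table

reset:
    sbi DDRB, PB5       ; LED as an output
    sbi PORTD, PD2      ; pull-up for the button (PD2 is an input after reset)
    ldi r16, (1 << PCINT18)
    sts PCMSK2, r16     ; only PD2 may trigger the bank 2 interrupt
    ldi r16, (1 << PCIE2)
    sts PCICR, r16      ; enable pin change interrupts for bank 2
    sei
loop:
    rjmp loop

pcint2_isr:
    sbic PIND, PD2      ; skip the jump when the pin is LOW (pressed)
    rjmp released
    sbi PORTB, PB5      ; pressed: LED on
    reti
released:
    cbi PORTB, PB5      ; released: LED off
    reti
```

The handler uses only sbic, sbi and cbi, which do not modify the status register or any general-purpose register, so nothing needs to be saved on entry.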
The Arduino Uno has no direct debugging capabilities, such as step-by-step execution. To monitor program execution, tracing can be used. There is, however, no rich user interface, such as a display. One of the tracing methods is sending information via the serial port. It can then be visualised on a developer's computer using any serial port monitor tool.
UART uses two pins:
While it is possible to implement a full serial port protocol using GPIOs alone (so-called soft-serial), here we will use a hardware-implemented UART with several registers, as shown in table 7.
| Register | Address | Official Name | Common Name | Bits | Description |
|---|---|---|---|---|---|
| UDR0 | 0xC6 | USART I/O Data Register | Data register / TX-RX buffer | 7:0 | Write to transmit data, read to receive data |
| UCSR0A | 0xC0 | USART Control and Status Register A | Status register | RXC0, TXC0, UDRE0, FE0, DOR0, UPE0, U2X0, MPCM0 | Status flags (ready, complete, errors, speed mode) |
| UCSR0B | 0xC1 | USART Control and Status Register B | Control register | RXCIE0, TXCIE0, UDRIE0, RXEN0, TXEN0, UCSZ02, RXB80, TXB80 | Enable TX/RX, interrupts, 9-bit mode |
| UCSR0C | 0xC2 | USART Control and Status Register C | Configuration / Frame register | UMSEL01:0, UPM01:0, USBS0, UCSZ01:0, UCPOL0 | Frame format (mode, parity, stop bits, data size) |
| UBRR0L | 0xC4 | USART Baud Rate Register Low | Baud rate register (low) | 7:0 | Lower byte of baud rate divider |
| UBRR0H | 0xC5 | USART Baud Rate Register High | Baud rate register (high) | 3:0 | Upper byte of baud rate divider |
In the example below, we will use TX only to send data from the MCU to the developer's PC. Let's start with some declarations for registers used during serial transmission and flags:
    .equ UBRR0H, 0xC5
    .equ UBRR0L, 0xC4
    .equ UCSR0A, 0xC0
    .equ UCSR0B, 0xC1
    .equ UCSR0C, 0xC2
    .equ UDR0, 0xC6
    .equ TXEN0, 3       ; bit 3 of UCSR0B enables the UART transmitter
    .equ UDRE0, 5       ; bit 5 of UCSR0A indicates the transmit buffer is empty

Then let's define the message “Hello World”. The trailing bytes 13 and 10 are the Windows-standard end-of-line sequence (CR, LF), and the string is 0-terminated.

    .org 0x0000
        rjmp reset
    message:
        .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0
The following section initialises the serial port for 9600bps:
    reset:
        ldi r16, hi8(103)
        sts UBRR0H, r16
        ldi r16, lo8(103)
        sts UBRR0L, r16
The value 103 is loaded into the UBRR register: the high byte into UBRR0H and the low byte into UBRR0L. This baud rate divider can be calculated using the formula shown in figure 12.
Where Fcpu is 16MHz for regular Arduino Uno (AtMega 328P). Note that this calculation yields ~9615 bps, not exactly 9600 bps. A tolerance of up to 2% is acceptable (here, it is 0.16%).
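The numbers quoted above can be checked with a short worked calculation. It assumes the normal asynchronous mode (U2X0 = 0, the reset default) and uses the standard divider formula for that mode from the ATmega328P datasheet; it is a reconstruction, not a copy of figure 12.

$$
\text{UBRR} = \frac{F_{cpu}}{16 \cdot \text{baud}} - 1 = \frac{16\,000\,000}{16 \cdot 9600} - 1 \approx 103.17 \;\rightarrow\; 103
$$

$$
\text{baud}_{\text{actual}} = \frac{F_{cpu}}{16\,(\text{UBRR} + 1)} = \frac{16\,000\,000}{16 \cdot 104} \approx 9615.4\ \text{bps}, \qquad \text{error} = \frac{9615.4 - 9600}{9600} \approx 0.16\%
$$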
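The next step is to enable the UART transmitter:

    ldi r16, (1 << TXEN0)
    sts UCSR0B, r16

and to configure the frame format (8 data bits, no parity, 1 stop bit, 8N1 for short - the most common case):

    ldi r16, (1 << 2) | (1 << 1)    ; UCSZ01 (bit 2) and UCSZ00 (bit 1): 8 data bits
    sts UCSR0C, r16                 ; parity and stop-bit fields stay 0: no parity, 1 stop bit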
Now it is time to send the string to the transmitter, byte by byte. A pointer to the string is loaded into the Z register (ZH and ZL, respectively) using ldi. The string is processed character by character until a 0 (the end of the string) is encountered.

    main:
        ldi ZH, hi8(message)    ; Load high byte of message address into ZH (Z pointer → flash)
        ldi ZL, lo8(message)    ; Load low byte of message address into ZL
    send_loop:
        lpm r18, Z+             ; Load next byte from program memory (message) into r18, then increment pointer
        cpi r18, 0              ; Check end of string
        breq main               ; If the end of the string is reached, start sending the whole "Hello World" again
The next character can be loaded into the transmit buffer only after the previous one has been taken over by the transmitter. The transmitter is ready for the next byte only when bit UDRE0 in register UCSR0A is set (1). If it is not, one needs to wait until it is. Only then can the next byte (character, letter) be written to UDR0:

    wait_udre:
        lds r19, UCSR0A     ; Load Serial port status register into r19
        sbrs r19, UDRE0     ; Skip the next instruction if the buffer is ready to accept the next byte
        rjmp wait_udre      ; If not ready, keep waiting
        sts UDR0, r18       ; Write character from r18 to UART data register (start transmission)
        rjmp send_loop      ; And process next character
Timers are handy for measuring time, waiting for a delay, or executing delayed tasks either once or periodically. The last one is very helpful for generating a PWM signal (a square wave with a controllable duty cycle) and thus controlling the amount of energy delivered to the externally connected device via the GPIO, e.g., to control an LED's brightness. It is somewhat equivalent to an analogue output control.
The ATMega328P has 3 timers: one is high-precision (16-bit), and two are low-precision (8-bit). Details are presented in table 8.
The timer counts “ticks”, where a “tick” can come either directly as a clock cycle (16 MHz) or comes through a prescaler to “slow it down”. Timers 0 and 1 share a common prescaler, and Timer 2 has an independent prescaler with more granularity. See table 8 for a list of valid prescalers for each timer, and table 9 for the frequency and period values for each prescaler. The general formula for timer speed is given by the following equation (figure 13):
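The relation between the system clock, the prescaler and the tick can be written out as below. This is a reconstruction consistent with the values in table 9 rather than a copy of figure 13; the symbols f_tick, T_tick and N (the prescaler value) are introduced here for illustration.

$$
f_{tick} = \frac{F_{cpu}}{N}, \qquad T_{tick} = \frac{1}{f_{tick}} = \frac{N}{F_{cpu}}
$$

$$
N = 64: \quad f_{tick} = \frac{16\ \text{MHz}}{64} = 250\ \text{kHz}, \qquad T_{tick} = \frac{64}{16\ \text{MHz}} = 4\ \mu\text{s}
$$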
Additionally, Timer 2 has an extra feature: instead of using the internal clock, it can be clocked from an external 32.768 kHz crystal oscillator and thus can work as an RTC.
| Timer | Size | Channels & Pins (PWM) | Valid prescalers | Common Uses |
|---|---|---|---|---|
| Timer 0 | 8-bit | Ch A: Pin 6, Ch B: Pin 5 | 1, 8, 64, 256, 1024 | Used by Arduino for millis() and delay(). |
| Timer 1 | 16-bit | Ch A: Pin 9, Ch B: Pin 10 | 1, 8, 64, 256, 1024 | High precision, long intervals, Servo control. |
| Timer 2 | 8-bit | Ch A: Pin 11, Ch B: Pin 3 | 1, 8, 32, 64, 128, 256, 1024 | Audio (tone) generation, Real-Time Clocks. |
| Prescaler | Frequency | Period (Tick Speed) |
|---|---|---|
| 1 | 16 MHz | 0.0625 µs |
| 8 | 2 MHz | 0.5 µs |
| 32 (Timer 2 only) | 500 kHz | 2.0 µs |
| 64 | 250 kHz | 4.0 µs |
| 128 (Timer 2 only) | 125 kHz | 8.0 µs |
| 256 | 62.5 kHz | 16.0 µs |
| 1024 | 15.625 kHz | 64.0 µs |
The frequency is commonly represented as the number of ticks the timer counts per cycle and is referred to as the TOP value.
Each timer in the ATmega328P has 2 channels: A and B. Channels are hardwired to GPIO pins, and you cannot change their assignments. Channels share the same base frequency, but the duty cycle can be controlled separately for each channel.
For example, with a prescaler of 64 (a 250 kHz tick frequency), a 50 Hz PWM signal needs TOP = 250000/50 = 5000 ticks.
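The calculation above generalises to other prescalers. The sketch below follows the simplification used in this section (TOP taken as the number of ticks per PWM period); the symbol f_PWM for the desired PWM frequency is an assumption introduced for illustration.

$$
\text{TOP} = \frac{f_{tick}}{f_{PWM}} = \frac{F_{cpu}}{N \cdot f_{PWM}}
$$

$$
f_{PWM} = 50\ \text{Hz}: \quad N = 64 \Rightarrow \text{TOP} = 5000, \qquad N = 8 \Rightarrow \text{TOP} = 40000, \qquad N = 1 \Rightarrow \text{TOP} = 320000
$$

For the 16-bit Timer1, TOP must not exceed 65535, so N = 1 cannot be used for 50 Hz, while N = 8 fits and gives a finer duty-cycle resolution than N = 64.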
Each timer has a number of registers, nicknamed “The Big Five”. Timer applications go far beyond generating a PWM signal and thus have complex configuration settings, but here we focus only on the PWM application and the use of Timer1. Note, however, that the other timers (Timer0 and Timer2) have similar functions, composition and control; they differ, e.g., in the number of registers, because in Timer1 you need to use two 8-bit registers (the high part and the low part of the value) for each related setting, while in Timer0 and Timer2 you use just one 8-bit register. Table 10 lists the registers along with their purposes and meanings; they are explained further below.
| Register Name | Size | Full Name | Role | Meaning / Purpose |
|---|---|---|---|---|
| TCCR1A / TCCR1B | 8-bit (each) | Timer/Counter Control Register A & B | The Manager | Sets Mode, Pin behaviour, and Prescaler. |
| TCNT1 (H/L) | 16-bit | Timer/Counter Register 1 | The Stopwatch | Holds the actual live count (0 to TOP). |
| OCR1A (H/L) | 16-bit | Output Compare Register A | The Trigger | Defines the Duty Cycle (when the pin toggles). |
| ICR1 (H/L) | 16-bit | Input Capture Register 1 | The Ceiling | Defines the Frequency (the TOP value). |
| TIMSK1 / TIFR1 | 8-bit (each) | Timer Interrupt Mask & Flag Register | Notification | Handles Interrupts and status flags. |
The Manager
These registers control timer behaviour and functions (refer to table 11).
The Stopwatch
This register is mainly read: it holds the timer's current count. Note that it changes very quickly (and asynchronously with the main code). It is also possible to write to it to force a new position in the cycle, e.g., to perform a synchronisation.
As Timer1 is 16-bit, there are two registers, representing the upper (TCNT1H) and lower (TCNT1L) parts of the 16-bit value. Timer0 and Timer2, being 8-bit timers, have only a single Stopwatch register (TCNT0 and TCNT2, respectively).
The Trigger
Those registers store comparator values (values to compare against a Stopwatch). Again, for Timer0 and Timer2, there is one per timer, per channel (so 2 per timer: one for channel A and one for channel B); for Timer1, there are two per channel. E.g. for Timer1, channel A, register names are OCR1AH - the high part of a 16-bit value to compare Stopwatch against and OCR1AL to store the lower part. For channel B, those are OCR1BH and OCR1BL, respectively.
The Ceiling (TOP)
The TOP register (the Ceiling) defines the maximum “capacity” of the Stopwatch register and thus defines the frequency. In the mode used here, Timer1 takes TOP from its Input Capture Register (ICR1). The timer simply counts from 0 up to the TOP value, and when it reaches TOP, it resets to 0 on the next tick.
Again, there are two registers for Timer1 (ICR1H, ICR1L - the high and low parts, respectively). Timer0 and Timer2 have no Input Capture Register: their TOP is either fixed at 0xFF or taken from OCR0A/OCR2A, depending on the mode.
The Notification
These registers are to control the timer-based interrupt notification system. We do not use interrupts for PWM; therefore, this description is omitted.
| Register | Bit | Name | Value (Example) | Description |
|---|---|---|---|---|
| TCCR1A | 7 | COM1A1 | 1 | Compare Output Mode A bit 1: Set for Non-Inverting PWM. |
| | 6 | COM1A0 | 0 | Compare Output Mode A bit 0: Combined with bit 7 to control Pin 9. |
| | 5 | COM1B1 | 0 | Compare Output Mode B bit 1: Controls Pin 10 behaviour. |
| | 4 | COM1B0 | 0 | Compare Output Mode B bit 0: Combined with bit 5 to control Pin 10. |
| | 3 | - | 0 | Reserved: Always write to 0. |
| | 2 | - | 0 | Reserved: Always write to 0. |
| | 1 | WGM11 | 1 | Waveform Generation Mode bit 1: Part of Mode 14 selection. |
| | 0 | WGM10 | 0 | Waveform Generation Mode bit 0: Part of Mode 14 selection. |
| TCCR1B | 7 | ICNC1 | 0 | Input Capture Noise Canceler: 1 enables a noise filter (used for sensors). |
| | 6 | ICES1 | 0 | Input Capture Edge Select: Selects trigger edge for capture (rising/falling). |
| | 5 | - | 0 | Reserved: Always write to 0. |
| | 4 | WGM13 | 1 | Waveform Generation Mode bit 3: Part of Mode 14 selection. |
| | 3 | WGM12 | 1 | Waveform Generation Mode bit 2: Part of Mode 14 selection. |
| | 2 | CS12 | 0 | Clock Select bit 2: High bit of the Prescaler (gearbox). |
| | 1 | CS11 | 1 | Clock Select bit 1: Middle bit of the Prescaler. |
| | 0 | CS10 | 1 | Clock Select bit 0: Low bit of the Prescaler. |
Bits WGM13, WGM12, WGM11 and WGM10 are to be analysed together: they form a 4-bit value representing a mode. Mode 14 is Fast PWM, so binary representation is 1,1,1,0 (WGM13, WGM12, WGM11, WGM10 respectively).
Bits CS define prescaler value as presented in table 12.
| CS12 | CS11 | CS10 | Prescaler (Gear) | Ticks per second (at 16MHz) | Description |
|---|---|---|---|---|---|
| 0 | 0 | 0 | No Clock | 0 | Timer is stopped (Off). |
| 0 | 0 | 1 | clk/1 | 16,000,000 | No division. 1 tick = 1 CPU cycle. |
| 0 | 1 | 0 | clk/8 | 2,000,000 | Timer ticks once every 8 CPU cycles. |
| 0 | 1 | 1 | clk/64 | 250,000 | Our choice for 50Hz. |
| 1 | 0 | 0 | clk/256 | 62,500 | Used for medium-speed pulses. |
| 1 | 0 | 1 | clk/1024 | 15,625 | Used for very slow events or long delays. |
| 1 | 1 | 0 | External T1 | N/A | Timer ticks on a falling edge of Pin D5. |
| 1 | 1 | 1 | External T1 | N/A | Timer ticks on a rising edge of Pin D5. |
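Because the four WGM bits are split across two registers, it is easy to place one in the wrong byte. The following host-side sketch (the helper name `timer1_control` is hypothetical) assembles the TCCR1A and TCCR1B values from a mode number, the channel A output mode and a prescaler, using only the bit positions and clock-select codes from the tables above.

```python
# Clock-select codes (CS12:CS10) for the internal prescalers listed above.
CS_BITS = {0: 0b000, 1: 0b001, 8: 0b010, 64: 0b011, 256: 0b100, 1024: 0b101}


def timer1_control(wgm_mode, com1a, prescaler):
    """Return (TCCR1A, TCCR1B) for a 4-bit WGM mode, COM1A1:0 bits and prescaler."""
    tccr1a = (com1a << 6) | (wgm_mode & 0b0011)          # COM1A1:0, WGM11:10
    tccr1b = (((wgm_mode >> 2) & 0b11) << 3) | CS_BITS[prescaler]  # WGM13:12, CS12:10
    return tccr1a, tccr1b


# Mode 14 (Fast PWM, TOP in ICR1), non-inverting output on channel A, clk/64
a, b = timer1_control(14, 0b10, 64)
print(hex(a), hex(b))
```

For the example values in table 11, this gives 0x82 and 0x1B.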
To refer to the registers from the assembler code, it is necessary to use their addresses. It is, however, more convenient to use symbolic names (defined with .equ). A full list of timer-related registers is presented in table 13.
| Timer | Register | Address | Brief Description |
|---|---|---|---|
| Timer 0 (8-bit) | TCCR0A | 0x44 | Control Reg A: Sets PWM mode and Pin behaviour. |
| | TCCR0B | 0x45 | Control Reg B: Sets Prescaler (the gearbox). |
| | TCNT0 | 0x46 | Stopwatch: The actual 8-bit live count. |
| | OCR0A | 0x47 | Trigger A: Duty Cycle for Pin 6. |
| | OCR0B | 0x48 | Trigger B: Duty Cycle for Pin 5. |
| | TIMSK0 | 0x6E | Interrupt Mask: Enables timer-specific alarms. |
| | TIFR0 | 0x35 | Interrupt Flag: Shows if a timer event occurred. |
| Timer 1 (16-bit) | TCCR1A | 0x80 | Control Reg A: Mode and Pin behaviour (Ch A & B). |
| | TCCR1B | 0x81 | Control Reg B: Mode and Prescaler. |
| | TCCR1C | 0x82 | Control Reg C: Force Output Compare bits. |
| | TCNT1H | 0x85 | Stopwatch High: Bits 8-15 of the count. |
| | TCNT1L | 0x84 | Stopwatch Low: Bits 0-7 of the count. |
| | ICR1H | 0x87 | Ceiling High: Bits 8-15 of the frequency TOP. |
| | ICR1L | 0x86 | Ceiling Low: Bits 0-7 of the frequency TOP. |
| | OCR1AH | 0x89 | Trigger A High: Bits 8-15 of Duty Cycle Pin 9. |
| | OCR1AL | 0x88 | Trigger A Low: Bits 0-7 of Duty Cycle Pin 9. |
| | OCR1BH | 0x8B | Trigger B High: Bits 8-15 of Duty Cycle Pin 10. |
| | OCR1BL | 0x8A | Trigger B Low: Bits 0-7 of Duty Cycle Pin 10. |
| | TIMSK1 | 0x6F | Interrupt Mask: Enables Timer 1 alarms. |
| | TIFR1 | 0x36 | Interrupt Flag: Shows Timer 1 status/events. |
| Timer 2 (8-bit) | TCCR2A | 0xB0 | Control Reg A: Mode and Pin behaviour. |
| | TCCR2B | 0xB1 | Control Reg B: Prescaler and Mode bits. |
| | TCNT2 | 0xB2 | Stopwatch: The actual 8-bit live count. |
| | OCR2A | 0xB3 | Trigger A: Duty Cycle for Pin 11. |
| | OCR2B | 0xB4 | Trigger B: Duty Cycle for Pin 3. |
| | ASSR | 0xB6 | Asynchronous Status: Used for 32kHz watch crystals. |
| | TIMSK2 | 0x70 | Interrupt Mask: Enables Timer 2 alarms. |
| | TIFR2 | 0x37 | Interrupt Flag: Shows Timer 2 status/events. |
| System | GTCCR | 0x43 | General Timer Control: Syncs/Resets all timers. |
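To use timers for PWM generation, one must configure the following (in order): the output pin direction, the TOP value (frequency), the compare value (duty cycle), the operating mode and pin behaviour, and finally the prescaler, which starts the timer.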
Example for the use of timers
The example below implements a standard servo PWM signal (50Hz) with a 10% duty cycle:
The code contains only a minimal set of register declarations used to control Timer1 for PWM. Note that in the code below, the timer, once configured, generates a PWM signal independently of CPU work. In the final loop, the CPU is doing nothing, just the dummy loop. All logic is controlled solely by a timer, asynchronously and externally to the code. The configuration process is presented in the figure 14.
/*
 * ATmega328P 50Hz PWM via Timer 1
 * No includes - Manual Address Mapping
 */
/* Register Addresses */
.equ DDRB,   0x24   /* Port B Direction Register */
.equ TCCR1A, 0x80   /* Control Register A */
.equ TCCR1B, 0x81   /* Control Register B */
.equ ICR1H,  0x87   /* TOP Value (High) */
.equ ICR1L,  0x86   /* TOP Value (Low) */
.equ OCR1AH, 0x89   /* Duty Cycle (High) */
.equ OCR1AL, 0x88   /* Duty Cycle (Low) */
.org 0x0000
    rjmp reset
reset:
Configure PIN9 (Timer1, channel A) as output.
    ; Configure PIN 9 as output (Timer1, channel A)
    ldi r16, (1 << 1)
    sts DDRB, r16
Preconfigure the TOP (register) of Timer1 to count from 0 to 4999 (0x1387), so it provides 5000 ticks per 20ms (50Hz) with a prescaler of 64.
    ; Set frequency to 50Hz
    ; Prescaler is 64, ICR1 (TOP) is set to 4999d=0x1387
    ldi r16, 0x13       ; High byte of 4999
    sts ICR1H, r16
    ldi r16, 0x87       ; Low byte of 4999
    sts ICR1L, r16
Preconfigure the trigger (comparator) so it flips the output on GPIO 9 when the counter (TCNT1) reaches 500 (0x01F4), which is equivalent to 2ms (500 is 10% of 5000). Timer1 continuously compares the counter with this trigger, and when the level of 500 is reached, it switches the output from 1 to 0. The other switch (from 0 to 1) is handled automatically by Timer1 when the counter passes TOP and restarts from 0.
    ; Set triggers (comparators) to 10% of TOP
    ; 500d=0x01F4 to OCR1A
    ldi r16, 0x01       ; High byte of 500
    sts OCR1AH, r16
    ldi r16, 0xF4       ; Low byte of 500
    sts OCR1AL, r16
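To try other duty cycles, the compare value can be derived from TOP in the same way as the 10% example. This is a host-side sketch that follows the convention used in the text (500 ticks for 10% of 5000); the function names are hypothetical, and the 4 µs tick assumes a 16 MHz clock with a prescaler of 64.

```python
def compare_value(top, duty_percent):
    """Compare (OCR) value for a duty cycle, given the TOP register value."""
    period_ticks = top + 1              # TOP 4999 means 5000 ticks per period
    return period_ticks * duty_percent // 100


def pulse_ticks(pulse_us, tick_us=4):
    """Compare value for a pulse width given in microseconds."""
    return pulse_us // tick_us


ocr = compare_value(4999, 10)
print(ocr, hex(ocr >> 8), hex(ocr & 0xFF))   # high and low bytes for OCR1AH/OCR1AL
```

The 2 ms pulse from the example corresponds to `pulse_ticks(2000)`, which gives the same 500 ticks.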
Configure Timer1 to work in Mode 14 (Fast PWM, cyclical square wave with controllable duty cycle via triggers/comparators).
    ; Set timer to operate as Fast PWM (Mode 14):
    ; Mode 14 -> WGM = 1110b=14d
    ; COM1A1 = 1 (Clear Pin on Match - Non-Inverting)
    ldi r16, (1 << 7) | (1 << 1)
    sts TCCR1A, r16
Set the prescaler to 64; this automatically starts Timer1.
    ; Start timer with prescaler=64
    ; WGM13=1, WGM12=1, CS11=1, CS10=1
    ldi r16, (1 << 4) | (1 << 3) | (1 << 1) | (1 << 0)
    sts TCCR1B, r16
And then do nothing: this loop is a dummy; all work is handled by Timer1. CPU is ready to handle something else.
loop:
    rjmp loop           ; The CPU does nothing!
                        ; The Timer1 hardware toggles the pin forever.
                        ; It is done asynchronously to the main code!
When connecting an oscilloscope to GPIO pin 9, the result is as presented in figure 15.
Timers are not only used to generate a periodic signal but may also execute code at precisely timed intervals. This is done with interrupts: the timer hardware triggers a handler routine at predefined intervals.
The two most common cases are:
In the example provided below, Timer1 (16-bit), operating in CTC mode, will be used to run an interrupt that toggles the built-in LED every 1 second.
Timer1 runs here with a prescaler of 1024, giving 15625 ticks per second. Note that we count from 0 to 15624, and 15624 is represented in hex as 0x3D08 (the TOP_VAL constant in the code below).
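The same calculation shows which prescalers can produce a given interval on a 16-bit timer. This is a host-side sketch with a hypothetical helper name, assuming a 16 MHz clock.

```python
def ctc_compare(f_cpu_hz, prescaler, interval_s, max_value=0xFFFF):
    """Compare value so that a CTC cycle lasts interval_s, or None if it does not fit."""
    ticks = round(f_cpu_hz / prescaler * interval_s)
    value = ticks - 1                   # the counter runs from 0 to value inclusive
    return value if 0 <= value <= max_value else None


# Check every Timer1 prescaler for a 1-second interval
for prescaler in (1, 8, 64, 256, 1024):
    print(prescaler, ctc_compare(16_000_000, prescaler, 1))
```

Only the prescalers 256 and 1024 give a value that fits into 16 bits for a 1-second interval; 1024 yields the 15624 used here.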
To use interrupts, one should refer to the ATmega328P datasheet: it presents a list of all interrupts and the addresses of their handlers (interrupt vectors) in memory. We use the interrupt triggered when the counter of Timer1 reaches the TOP value: vector 12, address 0x0016 (in words), “TIMER1 COMPA: Timer/Counter1 compare match A”.
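Note that the GNU assembler treats .org as a byte offset, so the word address 0x0016 is written as the byte address 0x002C (.org 0x002C).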
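The relation between vector number, word address and byte address can be sketched as follows. This is a host-side illustration with a hypothetical function name; it assumes two words per vector and 26 vectors, as stated for the ATmega328P in this section.

```python
def vector_addresses(vector_no, words_per_vector=2):
    """Return (word_address, byte_address) of a 1-based interrupt vector."""
    word_addr = (vector_no - 1) * words_per_vector
    return word_addr, word_addr * 2     # one flash word is two bytes


word_addr, byte_addr = vector_addresses(12)      # TIMER1 COMPA
print(hex(word_addr), hex(byte_addr))

# First byte address after the whole table of 26 vectors
table_end_bytes = 26 * 2 * 2
print(hex(table_end_bytes))
```

Vector 12 maps to word address 0x16 and byte address 0x2C; the first free byte address after the table is 0x68.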
We also need to prepare and initialise a stack, growing from the end of the memory, towards lower addresses (refer to figure 9):
; --- Register Address Mapping ---
.equ SREG,   0x3F   ; Status Register
.equ SPH,    0x3E   ; Stack Pointer High
.equ SPL,    0x3D   ; Stack Pointer Low
.equ DDRB,   0x04   ; Data Direction Port B
.equ PINB,   0x03   ; Input Pins Port B (Toggle Shortcut)
.equ PB5,    5      ; Built-in LED (Digital 13 / PB5)
.equ RAMEND, 0x08FF ; End of SRAM
; --- Timer1 Register Mapping ---
.equ TCCR1A, 0x80   ; Timer1 Control A
.equ TCCR1B, 0x81   ; Timer1 Control B
.equ OCR1AH, 0x89   ; Output Compare High
.equ OCR1AL, 0x88   ; Output Compare Low
.equ TIMSK1, 0x6F   ; Timer1 Interrupt Mask
.equ TOP_VAL, 0x3D08 ; TOP register value 15624dec
This section is located in flash: the array of interrupt vectors starts at address 0x0000, where program execution begins on reset, and extends to word address 0x0033 (26 vectors of two words each, i.e. byte addresses 0x0000 to 0x0067). Here, the vector at word address 0x0016 is used (in bytes, the vector starts at address 0x002C).
.section .text
.org 0x0000
    rjmp RESET
.org 0x002C             ; Assembler treats .org as bytes, use 0x002C
    rjmp TIMER_ISR
.org 0x0068             ; Jump past vectors to start logic
                        ; (26 vectors * 4 bytes = 0x68 bytes)
Interrupt handling stores the return address on the stack, so the stack is obligatory. It starts at the end of the SRAM (0x08FF). It is also necessary to initialise the GPIO 13 (LED, PB5) pin as an output before controlling it from the ISR.
RESET:
    ; Prepare stack
    ldi r16, hi8(RAMEND)
    out SPH, r16
    ldi r16, lo8(RAMEND)
    out SPL, r16
    ; Configure PB5 (built-in LED, GPIO13) as Output
    sbi DDRB, PB5       ; PB5 is the bit constant defined above
Timer1 configuration is presented below: CTC operation mode, with the TOP register (OCR1A) set to 0x3D08. Control register TCCR1A has all bits set to low (default), and TCCR1B selects CTC mode (WGM12) and the prescaler equal to 1024 (CS bits set to 101 binary). Then enable interrupts with sei and fall into an idle loop. All actions are executed asynchronously.
    ; Set Timer1 in CTC Mode
    ; Load the compare value for 1 second
    ldi r16, hi8(TOP_VAL)
    sts OCR1AH, r16
    ldi r16, lo8(TOP_VAL)
    sts OCR1AL, r16
    ; TCCR1A: Default (0x00)
    ldi r16, 0x00
    sts TCCR1A, r16
    ; TCCR1B:
    ; Bit 3 (WGM12) = 1 (CTC Mode)
    ; Bit 2 (CS12) = 1 (Prescaler 1024)
    ; Bit 0 (CS10) = 1 (Prescaler 1024)
    ldi r16, (1 << 3) | (1 << 2) | (1 << 0)
    sts TCCR1B, r16
    ; Enable Compare Match A Interrupt
    ldi r16, (1 << 1)   ; OCIE1A bit
    sts TIMSK1, r16
    ; Enable Interrupts
    sei
LOOP:
    rjmp LOOP           ; It does nothing!
The TIMER_ISR routine is executed by Timer1. The pointer to this code is located in the interrupt vector table, at address 0x002C.
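The sbi instruction applied to the PINB register flips the GPIO output: on the ATmega328P, writing 1 to a PINx bit toggles the corresponding PORTx bit, so there is no need for the classical READ→COMPLEMENT→WRITE sequence.
; Interrupt subroutine, called by Timer1
TIMER_ISR:
    sbi PINB, PB5
    reti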
Reading from the analogue input is not as straightforward as with digital inputs.
The built-in ADC has 10-bit resolution and 6 input channels (A0-A5). It also uses a configurable reference voltage: the 5V power supply, the internal 1.1V source, or an external reference voltage connected to the Aref input pin.
Inputs are connected to the ADC through a multiplexer, so only one input can be serviced at a time (there is only one converter). Switching inputs may render the first reading invalid due to the measurement method.
The low-level ADC register-based operations use the following formula to obtain an ADC value (figure 16, based on the input value Vgpio and the reference value Vref).
Technically, inside the ADC there is a 14 pF sample-and-hold capacitor that is charged from the input. For this reason, measuring a high-impedance source can yield inaccurate readings, so the first ADC reading is commonly discarded, and the best practice is to take multiple measurements and calculate an average.
From the assembler developer's point of view, it is more important that the ADC reading is a value between 0 and 1023 (10-bit resolution). To convert the ADC reading to the measured voltage on the ADC input (Vinput), the following formula (figure 17) is valid:
Vref: reference voltage. One therefore needs to know the current ADC configuration: whether Vref is the 5V power supply, the internal 1.1V source, or an external voltage provided via the Aref pin, and pass the appropriate value to the equation.
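The formulas in figures 16 and 17 are not reproduced in this text. The sketch below assumes the standard relation for a 10-bit converter, ADC = Vin · 1024 / Vref, and its inverse; the function names are hypothetical, and the code runs on a PC, not on the MCU.

```python
def volts_to_adc(v_in, v_ref):
    """Expected 10-bit reading for an input voltage (saturates at 1023)."""
    return min(1023, int(v_in * 1024 / v_ref))


def adc_to_volts(adc_value, v_ref):
    """Input voltage that corresponds to a 10-bit ADC reading."""
    return adc_value * v_ref / 1024


print(adc_to_volts(512, 5.0))    # mid-scale reading with the 5V reference
print(adc_to_volts(512, 1.1))    # the same reading with the internal 1.1V reference
```

The same reading of 512 means 2.5 V with the 5V reference but only 0.55 V with the internal 1.1V reference, which is why the configured Vref must be known.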
Analogue reading uses a complex setup of ADC-related registers as presented in table 14. ADC has a number of registers mapped to a memory area and accessible using the lds and sts instructions.
| Register (Address) | Bit | Name | Description |
|---|---|---|---|
| ADMUX (0x7C) | 7 | REFS1 | Reference Selection Bit 1 |
| | 6 | REFS0 | Reference Selection Bit 0 (01 = AVcc) |
| | 5 | ADLAR | Left Adjust Result (1 = Left, 0 = Right) |
| | 4 | - | Reserved |
| | 3 | MUX3 | Analog Channel Selection Bit 3 |
| | 2 | MUX2 | Analog Channel Selection Bit 2 |
| | 1 | MUX1 | Analog Channel Selection Bit 1 |
| | 0 | MUX0 | Analog Channel Selection Bit 0 (0000 = A0) |
| ADCSRA (0x7A) | 7 | ADEN | ADC Enable (Must be 1) |
| | 6 | ADSC | ADC Start Conversion (Write 1 to start) |
| | 5 | ADATE | ADC Auto Trigger Enable |
| | 4 | ADIF | ADC Interrupt Flag |
| | 3 | ADIE | ADC Interrupt Enable |
| | 2 | ADPS2 | ADC Prescaler Select Bit 2 |
| | 1 | ADPS1 | ADC Prescaler Select Bit 1 |
| | 0 | ADPS0 | ADC Prescaler Select Bit 0 (111 = by 128) |
| ADCSRB (0x7B) | 7 | - | Reserved |
| | 6 | ACME | Analog Comparator Multiplexer Enable |
| | 5 | - | Reserved |
| | 4 | - | Reserved |
| | 3 | - | Reserved |
| | 2 | ADTS2 | ADC Auto Trigger Source Bit 2 |
| | 1 | ADTS1 | ADC Auto Trigger Source Bit 1 |
| | 0 | ADTS0 | ADC Auto Trigger Source Bit 0 |
| ADCH (0x79) | 15..8 | ADC[9:0] | 10-bit Result (ADCL first, then ADCH) |
| ADCL (0x78) | 7..0 | | |
| DIDR0 (0x7E) | 5:0 | ADC5D:ADC0D | Digital Input Disable (1 = Disable Buffer) |
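The register values for a given configuration can be assembled from the bit positions in the table. This is a host-side sketch with hypothetical helper names; apart from 01 = AVcc, the reference-selection codes are taken from the ATmega328P datasheet rather than from the table above, so treat them as an assumption to verify.

```python
# REFS1:REFS0 codes (assumed from the datasheet; only "avcc" appears in the table)
REFS = {"aref": 0b00, "avcc": 0b01, "internal_1v1": 0b11}
# ADPS2:ADPS0 codes for each division factor
ADPS = {2: 0b001, 4: 0b010, 8: 0b011, 16: 0b100, 32: 0b101, 64: 0b110, 128: 0b111}


def admux(reference, channel, left_adjust=False):
    """ADMUX value: reference in bits 7:6, ADLAR in bit 5, channel in bits 3:0."""
    return (REFS[reference] << 6) | (int(left_adjust) << 5) | (channel & 0x0F)


def adcsra_enable(prescaler):
    """ADCSRA value that enables the ADC (ADEN) with the given prescaler."""
    return (1 << 7) | ADPS[prescaler]


print(hex(admux("avcc", 0)), hex(adcsra_enable(128)))
```

For input A0 with the AVcc reference and a prescaler of 128, the values are 0x40 and 0x87.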
An algorithm for reading an analogue value from a selected input is implemented as follows:
The sample code below configures the ADC, reads from the A0 input, and stores the value as a 16-bit value in the adc_storage variable.
; --- Register Definitions (ATmega328P) ---
.equ ADCL,   0x78
.equ ADCH,   0x79
.equ ADCSRA, 0x7A
.equ ADCSRB, 0x7B
.equ ADMUX,  0x7C
.equ DIDR0,  0x7E
; --- Bit Definitions ---
.equ REFS0, 6           ; Reference selection bit 0
.equ ADEN,  7           ; ADC Enable
.equ ADSC,  6           ; ADC Start Conversion
.equ ADPS2, 2           ; Prescaler bit 2
.equ ADPS1, 1           ; Prescaler bit 1
.equ ADPS0, 0           ; Prescaler bit 0
; --- Data Segment ---
.section .data
.org 0x0100
adc_storage:
    .skip 2             ; Reserve 2 bytes in SRAM for the 10-bit result
                        ; (in GNU as, ".byte 2" would emit one byte of value 2)
Now the setup part: connect A0 to the ADC via a multiplexer, select the reference voltage as the power supply (5V), and set the conversion sampling speed using a prescaler (128, which gives 125kHz). Also, disable A0 as a digital GPIO to save energy and lower noise.
; --- Code Segment ---
.section .text
.global main
main:
    ; setup multiplexer to use A0 (0000)
    ; and AVcc (powering voltage +5V) as reference
    ; It is done with the ADMUX register
    ldi r24, (1 << REFS0)
    sts ADMUX, r24
    ; setup prescaler and enable ADC.
    ; prescaler is 128 (16MHz/128 = 125kHz)
    ldi r24, (1 << ADEN) | (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0)
    sts ADCSRA, r24
    ; and disable A0 GPIO as a digital input (analogue still works)
    ; good practice
    ldi r24, 0x01
    sts DIDR0, r24
Start conversion by setting the ADSC bit (6) of ADCSRA to 1. The ADC requires some time to sample the value and complete the conversion (it is based on a capacitor); when the result is ready to read, the ADSC bit is cleared by the ADC hardware. Here, we do not use any interrupts, just simple polling.
loop:
    ; start ADC conversion
    lds r24, ADCSRA
    ori r24, (1 << ADSC)
    sts ADCSRA, r24
wait_adc:
    ; poll ADSC bit
    ; when conversion is ready, ADC clears this bit
    lds r24, ADCSRA
    sbrc r24, ADSC
    rjmp wait_adc
The converted value is stored in the ADCL and ADCH registers. And it is crucial to keep the reading order: low byte first, then high.
    ; read conversion result
    ; IMPORTANT: Read Low Byte first to lock the values
    lds r18, ADCL       ; r18 = Low Byte
    lds r19, ADCH       ; r19 = High Byte
    ; save to memory
    ldi r26, lo8(adc_storage)
    ldi r27, hi8(adc_storage)
    st X+, r18          ; Store low byte
    st X, r19           ; Store high byte
    rjmp loop           ; Repeat indefinitely
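The two bytes stored in adc_storage form one 10-bit number. The sketch below shows, on the host side, how they combine for both settings of the ADLAR bit; the function name is hypothetical.

```python
def adc_result(adcl, adch, left_adjust=False):
    """Combine the ADCL and ADCH bytes into the 10-bit conversion result."""
    raw = (adch << 8) | adcl
    if left_adjust:
        return raw >> 6         # ADLAR = 1: result sits in the upper 10 bits
    return raw & 0x03FF         # ADLAR = 0: result sits in the lower 10 bits


print(adc_result(0xFF, 0x03))                     # right-adjusted full scale
print(adc_result(0xC0, 0xFF, left_adjust=True))   # left-adjusted full scale
```

Both calls describe the full-scale reading of 1023.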
Speed vs Quality
The ADC converts an analogue value to its digital representation using a capacitor. Charging and discharging of the capacitor require time and depend on the impedance of the analogue signal's input source. The general rule says that the faster the conversion, the lower the quality and the higher the error ratio. Conversion speed can be controlled using a prescaler value (bits ADPS2, ADPS1, and ADPS0 of the ADCSRA register). The prescaler divides the clock frequency (16MHz) to slow down the conversion process. Prescaler values and the related conversion speeds and times are presented in table 15.
| ADPS2 | ADPS1 | ADPS0 | Division Factor | ADC Clock (16 MHz) | Clock Period (1/f) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 2 | 8 MHz | 0.125 µs |
| 0 | 0 | 1 | 2 | 8 MHz | 0.125 µs |
| 0 | 1 | 0 | 4 | 4 MHz | 0.25 µs |
| 0 | 1 | 1 | 8 | 2 MHz | 0.5 µs |
| 1 | 0 | 0 | 16 | 1 MHz | 1.0 µs |
| 1 | 0 | 1 | 32 | 500 kHz | 2.0 µs |
| 1 | 1 | 0 | 64 | 250 kHz | 4.0 µs |
| 1 | 1 | 1 | 128 | 125 kHz | 8.0 µs |
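The datasheet recommends an ADC clock between 50kHz and 200kHz for full 10-bit resolution; with a 16MHz system clock, only the prescaler of 128 (125kHz) falls into this range. Faster conversions reduce the quality of the result.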
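To see what a prescaler means for the sampling rate, the conversion time can be estimated on a PC. This sketch assumes 13 ADC clock cycles per normal conversion, a figure taken from the ATmega328P datasheet and not from the table above; the function name is hypothetical.

```python
def adc_timing(f_cpu_hz, prescaler, cycles_per_conversion=13):
    """Return (ADC clock in Hz, time of one conversion in microseconds)."""
    f_adc = f_cpu_hz / prescaler
    t_conv_us = cycles_per_conversion / f_adc * 1e6
    return f_adc, t_conv_us


for prescaler in (16, 32, 64, 128):
    f_adc, t_conv_us = adc_timing(16_000_000, prescaler)
    print(prescaler, f_adc, round(t_conv_us, 1), round(1e6 / t_conv_us))
```

With a prescaler of 128, one conversion takes about 104 µs, i.e. roughly 9600 readings per second.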
While programming in pure assembler, you can freely define your rules for calling conventions. But if your code is supposed to be modular and eventually used by others (e.g. a call from C++), it is good to follow standards. For AVR GCC, there is an Application Binary Interface (ABI) standard. All registers are 8-bit; to pass and return data larger than a byte (16, 32, or 64 bits), groups of registers must be used. Below there is a comprehensive guide on how to pass arguments.
General-purpose registers (R0 to R31) are divided into sections to prevent data loss and misunderstandings.
The following rules should be applied to keep consistency with the AVR GCC ABI (also refer to figure 18):
Example functions: convert unsigned int to ASCII
The function below fulfils the AVR GCC ABI standards and converts an unsigned int value (16-bit) to an ASCII array of characters. Note that the array is provided externally to the function, so it must be allocated at the caller level.
/*
 * void uint16_to_ascii(uint16_t value, char* str)
 * r25:r24 = value (unsigned)
 * r23:r22 = str (pointer), at least 6/8 bytes!
 */
.global uint16_to_ascii
uint16_to_ascii:
    movw r30, r22       ; Move destination pointer to Z register (r31:r30)
                        ; It also copies r23 to r31
    ldi r19, 0          ; Zero byte used as the string terminator
    push r19            ; Push null-terminator (0) onto stack first
/* Uncomment this section to have CR LF added.
 * The minimum buffer size then needs to be 8, not 6 bytes.
 *  ldi r19, 10
 *  push r19            ; Push 10 (LF)
 *  ldi r19, 13
 *  push r19            ; Push 13 (CR)
 */
.L_divide_loop:
    ; Divide r25:r24 by 10 using shift-and-subtract
    ldi r18, 0          ; r18 will hold the remainder (the digit)
    ldi r20, 16         ; Loop for 16 bits
.L_div_bit_loop:
    lsl r24             ; Shift value left (MSB goes into Carry)
    rol r25
    rol r18             ; Rotate Carry into remainder
    cpi r18, 10         ; Compare remainder with 10
    brlo .L_skip_sub
    subi r18, 10        ; Remainder = Remainder - 10
    inc r24             ; Set the lowest bit of the quotient
.L_skip_sub:
    dec r20
    brne .L_div_bit_loop
    ; r18 now has the digit (0-9)
    subi r18, -'0'      ; Convert to ASCII (add 0x30)
    push r18            ; Save digit on stack to reverse order
    ; Check if the quotient (r25:r24) is zero
    sbiw r24, 0
    brne .L_divide_loop ; If quotient != 0, get the next digit
.L_pop_and_store:
    pop r18             ; Pull digit (or null terminator) from stack
    st Z+, r18          ; Store into the array and increment pointer
    tst r18             ; Check if we just stored the 0 terminator
    brne .L_pop_and_store ; If not zero, keep popping
    ret                 ; Return to caller
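The shift-and-subtract division is the least obvious part of the function. The host-side model below mirrors the loop instruction by instruction, so the algorithm can be checked against ordinary division; it is a sketch for understanding, not a replacement for the assembler routine, and the Python names are hypothetical.

```python
def div10_shift_subtract(value):
    """Model of the 16-step loop: returns (quotient, remainder) of value / 10."""
    quotient = value & 0xFFFF           # r25:r24
    remainder = 0                       # r18
    for _ in range(16):                 # r20 counts the 16 bits
        carry = (quotient >> 15) & 1    # lsl/rol push the MSB into Carry
        quotient = (quotient << 1) & 0xFFFF
        remainder = (remainder << 1) | carry
        if remainder >= 10:             # cpi/brlo
            remainder -= 10             # subi r18, 10
            quotient |= 1               # inc r24 (bit 0 is 0 after the shift)
    return quotient, remainder


def uint16_to_ascii(value):
    """Collect digits last-to-first, then reverse them (the stack does this)."""
    digits = []
    while True:
        value, digit = div10_shift_subtract(value)
        digits.append(chr(ord("0") + digit))
        if value == 0:
            break
    return "".join(reversed(digits))


print(uint16_to_ascii(12345))
```

The largest input, 65535, produces five digits, which together with the terminating zero explains the minimum buffer size of 6 bytes.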
Sample code using the function above (note, the function is NOT included here; you have to add it yourself) is presented below:
.section .data
; 1. Declaration of a buffer for a string in RAM
my_string_buffer:
    .skip 6             ; Reserve 6 bytes (".byte 6" would emit a single byte)
.section .text
.global main
main:
    ; 2. Load constant value (e.g., 12345) into r25:r24
    ldi r24, lo8(12345)
    ldi r25, hi8(12345)
    ; 3. Load buffer pointer into r23:r22
    ldi r22, lo8(my_string_buffer)
    ldi r23, hi8(my_string_buffer)
    ; Call the function
    call uint16_to_ascii
loop:
    rjmp loop
Example function: send string buffer (SRAM) to serial port
The function below sends a null-terminated string to the serial port at 9600 bps, 8N1.
It does not send “end of line” (codes 13,10).
It works ONLY with ASCII strings stored in SRAM (defined in .data section). To make it work with strings stored in flash, see the warning note below the code.
/*
 * void uart_send_string(char* str)
 * r25:r24 = str (pointer), to SRAM!
 */
; Register Addresses (ATmega328P)
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UBRR0L, 0xC4
.equ UBRR0H, 0xC5
.equ UDR0,   0xC6
; Bit Positions
.equ TXEN0,  3
.equ UDRE0,  5
.equ UCSZ01, 2
.equ UCSZ00, 1
uart_send_string:
    ; Argument 1 (Pointer to buffer) is passed in r25:r24
    ; Explicitly loading parts into the Z register (r31:r30)
    mov r30, r24
    mov r31, r25
    ; --- UART Initialization ---
    ; 1. Set Baud Rate: 9600 bps @ 16MHz (UBRR = 103)
    ldi r18, lo8(103)
    sts UBRR0L, r18
    ldi r18, hi8(103)
    sts UBRR0H, r18
    ; 2. Configure UART for 8N1 (8 data bits, no parity, 1 stop bit)
    ldi r18, (1<<UCSZ01) | (1<<UCSZ00)
    sts UCSR0C, r18
    ; 3. Enable UART Transmitter
    ldi r18, (1<<TXEN0)
    sts UCSR0B, r18
.L_send_loop:
    ld r18, Z+          ; Load byte from string and increment pointer
    tst r18             ; Check if we hit the 0 (null terminator)
    breq .L_done        ; End of string reached
.L_wait_tx:
    ; Check the UART Status Register A
    lds r19, UCSR0A
    sbrs r19, UDRE0     ; Skip if UDRE0 is 1 (Buffer is ready)
    rjmp .L_wait_tx     ; Wait until buffer is empty
    ; Move character to Data Register to start transmission
    sts UDR0, r18
    rjmp .L_send_loop
.L_done:
    ret
Warning: to send a string stored in flash (program memory) instead of SRAM, change the load instruction: ld r18, Z+ → lpm r18, Z+.
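The UBRR constant of 103 used in the function follows from the clock and the baud rate. The host-side sketch below assumes the normal-speed asynchronous mode (divider of 16) described in the ATmega328P datasheet; the function names are hypothetical.

```python
def ubrr_for(f_cpu_hz, baud):
    """Nearest UBRR value for normal-speed asynchronous mode (divider 16)."""
    return round(f_cpu_hz / (16 * baud)) - 1


def actual_baud(f_cpu_hz, ubrr):
    """Baud rate really produced by a given UBRR value."""
    return f_cpu_hz / (16 * (ubrr + 1))


ubrr = ubrr_for(16_000_000, 9600)
error_percent = (actual_baud(16_000_000, ubrr) / 9600 - 1) * 100
print(ubrr, round(actual_baud(16_000_000, ubrr)), round(error_percent, 2))
```

The resulting rate is slightly above 9600 bps (an error of about 0.16%), which asynchronous receivers tolerate easily.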
Delay (in ms)
This function implements the classical, blocking “delay” function, using timers. It uses Timer0 with a prescaler of 64. Then Timer0 increments at 16MHz/64=250kHz → each Timer0 tick lasts 4us. So 1ms then requires 1000/4=250 ticks. Then it simply repeats, waiting for 1ms, the number of times specified by the 16-bit argument (R25:R24).
/*
 * void delay_ms(uint16_t ms)
 * r25:r24 = delay in ms (16-bit)
 */
; Register Addresses (ATmega328P, data-space addresses for lds/sts)
.equ TCCR0A, 0x44
.equ TCCR0B, 0x45
.equ TCNT0,  0x46
.equ TIFR0,  0x35
; Bit Positions
.equ OCF0A, 1           ; Output Compare Flag 0 A
delay_ms:
    ; If the input is 0, return immediately
    sbiw r24, 0
    breq .L_passed
    ; --- Timer0 Setup ---
    ; Mode: Normal. Prescaler: 64
    ldi r18, 0
    sts TCCR0A, r18     ; Make sure Normal mode is selected
    ldi r18, (1<<1) | (1<<0)
    sts TCCR0B, r18
.L_outer_loop:
    ldi r18, 0
    sts TCNT0, r18      ; Reset timer count to 0
    ; Clear the Compare Match Flag by writing a 1 to it (AVR quirk)
    ldi r18, (1<<OCF0A)
    sts TIFR0, r18      ; sts, because 0x35 is a data-space address
.L_wait_1ms:
    lds r18, TCNT0
    cpi r18, 250        ; 250 ticks * 4us = 1000us = 1ms
    brlo .L_wait_1ms    ; Keep polling until 250 reached
    sbiw r24, 1         ; Decrement ms counter (r25:r24)
    brne .L_outer_loop  ; If not zero, run another 1ms
    ; --- Cleanup ---
    ldi r18, 0
    sts TCCR0B, r18     ; Stop the timer to save power
.L_passed:
    ret
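The choice of the prescaler of 64 is not arbitrary: an 8-bit timer can count a millisecond exactly only if the number of ticks per millisecond is a whole number no larger than 255. The host-side sketch below checks this for the Timer0 prescalers, assuming a 16 MHz clock; the function name is hypothetical.

```python
def ticks_per_ms(f_cpu_hz, prescaler):
    """Whole number of timer ticks in 1 ms, or None if it is not an integer."""
    if f_cpu_hz % (prescaler * 1000) != 0:
        return None
    return f_cpu_hz // (prescaler * 1000)


usable = []
for prescaler in (1, 8, 64, 256, 1024):
    ticks = ticks_per_ms(16_000_000, prescaler)
    if ticks is not None and ticks <= 255:      # must fit the 8-bit counter
        usable.append((prescaler, ticks))

print(usable)
```

Under these assumptions, 64 is the only prescaler that qualifies, with 250 ticks per millisecond.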
Each laboratory node is equipped with an Arduino Uno R3 development board, based on the ATmega328P MCU. It also has two extension boards:
There are 10 laboratory nodes. They can be used independently, but for collaboration, nodes are interconnected symmetrically, with GPIOs described in the hardware reference section below.
The table 16 lists all hardware components and details. Note that some elements are accessible, but their use is not supported via the remote lab, e.g., buttons and a buzzer.
The node is depicted in the figure 19 and its interface visual schematic is presented in the figure 20. The schematic presents only components used in scenarios and accessible via the VREL NextGen environment (controllable and observable via video stream), omitting unused components such as buttons, a buzzer, and a potentiometer.
| Component ID | Component | Hardware Details (controller) | Control method | GPIOs (as mapped to the Arduino Uno) | Remarks |
|---|---|---|---|---|---|
| D1 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO13 | |
| D2 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO12 | |
| D3 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO11 | |
| D4 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO10 | shared with interconnection with another module |
| LED4 | 4x 7-segment display (+DP) | indirect, via two 74HC595 registers | serial load to 2 registers, daisy-chained | GPIO8 - serial input of the controller (SER_PIN); GPIO7 - shift clock (CLK_PIN), on the rising edge the next bit is written and the data are shifted; GPIO4 - latch (LAT_PIN), stores the data in the internal output buffer of the 74HC595 (one digit at a time) | |
To display a digit in the 4x7seg. display, there are two definitions needed: the shape of a digit (or other symbol), and its position (1,2,3,4: a binary mask).
The 7-segment display is of the common-anode type (a zero turns a segment on), and thus the 0..9 digit definitions are declared below:
; Common Anode 7-segment masks (Active LOW)
; Segments: DP,g,f,e,d,c,b,a (Bit 7 -> Bit 0)
; Indices:    0     1     2     3     4     5     6     7     8     9
segment_masks:
    .byte 0xC0, 0xF9, 0xA4, 0xB0, 0x99, 0x92, 0x82, 0xF8, 0x80, 0x90
In a common-anode configuration, the active signal to turn on a segment is LOW (0), and to turn it off, it is HIGH (1). The state of a single digit is represented by an 8-bit mask: 7 segments to build the symbol and a DP (decimal point). For example, a digit 7 is represented by bits corresponding to segments “a”, “b”, and “c” set to 0 (to turn segments “a”, “b”, and “c” on) and the remaining bits set to 1 (to turn them off), so the corresponding binary value looks as follows: 11111000b, hence the hexadecimal value is 0xF8 (as in the code above). The MSB bit represents DP, and the LSB segment “a”. This definition affects how one loads data into the shift register: starting from MSB towards LSB, because of the way the register is built and connected to the segments - refer to the function display_digit below.
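The masks can be derived mechanically from the list of lit segments, which is handy when adding new symbols. The sketch below is a host-side helper with hypothetical names; the segment sets for the digits follow the usual 7-segment shapes, and the resulting values can be compared with the table in the code above.

```python
# Bit position of each segment, as described above (bit 7 = DP, bit 0 = a)
SEGMENT_BIT = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 6, "dp": 7}

# Segments that are lit for the digits 0..9
DIGIT_SEGMENTS = ["abcdef", "bc", "abdeg", "abcdg", "bcfg",
                  "acdfg", "acdefg", "abc", "abcdefg", "abcdfg"]


def common_anode_mask(lit_segments):
    """Active-LOW mask: start with all segments off (1) and clear the lit ones."""
    mask = 0xFF
    for segment in lit_segments:
        mask &= ~(1 << SEGMENT_BIT[segment]) & 0xFF
    return mask


print([hex(common_anode_mask(s)) for s in DIGIT_SEGMENTS])
print(hex(common_anode_mask("g")))          # a minus sign
print(hex(common_anode_mask(["a", "dp"])))  # multi-letter names need a list
```

The first line reproduces the ten values of segment_masks; a minus sign (segment “g” only) would be 0xBF.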
The way the display works is similar to a typical dot-matrix display: instead of controlling 32 independent digital lines, one for each LED in the display (8 per digit, 4 digits), we use a digit selector (lines 0,1,2,3) and common symbol lines (lines DP,g,f,e,d,c,b,a).
This way, the display “flashes”: to display more than one digit, you need to iterate over the digit lines rapidly and set the appropriate symbol definition for each. However, the human eye is slow enough not to notice it, and thus we see all 4 digits at once, although in reality they are displayed one by one.
The schematic in Figure 19 shows the idea of controlling a single digit over a serial data pin (SER_PIN): you inject data bit by bit, starting from the most significant bit of the symbol representing a digit (8 bits), followed by 8 bits of the digit selector - the digit is selected by lines 1,2,3,4, so only the 0001b, 0010b, 0100b and 1000b combinations are used. A 0→1→0 pulse on the clock (CLK_PIN) writes the data bit to the left register and shifts the contents right (including passing from the left register to the right one). This way, after 16 cycles (8+8), the left register holds the line that selects the digit (1,2,3,4), and the right register holds the combination representing the symbol at this position.
When the binary combinations in both registers (line and symbol) are ready, a 0→1→0 pulse on LAT_PIN copies the register contents to the internal output buffer, and the display instantly lights up according to the symbol definition loaded into the right register (only the current digit; the others are off at this time).
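The 16 clock pulses can be modelled on a PC to see where each byte ends up. The sketch below treats the two daisy-chained registers as one 16-bit value; the assumption that the low byte models the first (left) register and the high byte the second (right) one is mine, and the function name is hypothetical.

```python
def shift_out(segment_mask, digit_mask):
    """Clock 8 symbol bits, then 8 digit-select bits, MSB first, into the chain."""
    chain = 0
    for byte in (segment_mask, digit_mask):
        for bit_no in range(7, -1, -1):             # MSB first
            bit = (byte >> bit_no) & 1              # level on the serial data pin
            chain = ((chain << 1) | bit) & 0xFFFF   # one clock pulse
    first_register = chain & 0xFF       # the register the data enter first
    second_register = chain >> 8        # the register fed by the first one
    return first_register, second_register


print([hex(v) for v in shift_out(0xF8, 0x01)])
```

For the digit 7 (0xF8) at the first position (0x01), the first register ends up with the digit selector and the second with the symbol, matching the description above.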
Display single digit: function definition
To handle display, a sample function that displays a digit in a selected position is presented below. Note that it does not check parameters and thus assumes that the digit position is a number between 0 and 3, and that a digit to display is 0..9. Going beyond these limits causes unpredictable behaviour and usually an MCU program crash.
; Pin definitions using direct I/O addresses for ATmega328P
.equ SER_PORT, 0x05     ; PORTB I/O address
.equ SER_PIN,  0
.equ CLK_PORT, 0x0B     ; PORTD I/O address
.equ CLK_PIN,  7
.equ LAT_PORT, 0x0B     ; PORTD I/O address
.equ LAT_PIN,  4
.global display_digit
; void display_digit(uint8_t pos, uint8_t number);
; r24 = position (0 to 3)
; r22 = number (0 to 9)
display_digit:
    push r16
    push r17
    push r18
    push zl
    push zh
    ; 1. Load Segment Mask (for U3) from flash
    ldi zl, lo8(segment_masks)
    ldi zh, hi8(segment_masks)
    add zl, r22         ; Add number index to Z pointer
    adc zh, r1          ; r1 is assumed to be 0 (gcc standard)
    lpm r16, Z          ; r16 now holds segment data
    ; 2. Load Digit Select Mask (for U2) from flash
    ldi zl, lo8(digit_masks)
    ldi zh, hi8(digit_masks)
    add zl, r24         ; Add position index to Z pointer
    adc zh, r1
    lpm r17, Z          ; r17 now holds digit select data
    ; 3. Shift out Segment Data (r16) -> Ends up in U3
    ldi r18, 8          ; Loop counter for 8 bits
shift_segments:
    lsl r16             ; Shift MSB into Carry flag
    brcs set_ser_seg    ; If Carry is 1, branch to set SER high
    cbi SER_PORT, SER_PIN ; Clear SER low
    rjmp clock_seg
set_ser_seg:
    sbi SER_PORT, SER_PIN ; Set SER high
clock_seg:
    ; Pulse SRCLK
    sbi CLK_PORT, CLK_PIN
    cbi CLK_PORT, CLK_PIN
    dec r18
    brne shift_segments
    ; 4. Shift out Digit Select Data (r17) -> Ends up in U2
    ldi r18, 8          ; Loop counter for 8 bits
shift_digits:
    lsl r17             ; Shift MSB into Carry flag
    brcs set_ser_dig
    cbi SER_PORT, SER_PIN
    rjmp clock_dig
set_ser_dig:
    sbi SER_PORT, SER_PIN
clock_dig:
    ; Pulse SRCLK
    sbi CLK_PORT, CLK_PIN
    cbi CLK_PORT, CLK_PIN
    dec r18
    brne shift_digits
    ; 5. Pulse Latch (RCLK) to update the output displays
    sbi LAT_PORT, LAT_PIN
    cbi LAT_PORT, LAT_PIN
    pop zh
    pop zl
    pop r18
    pop r17
    pop r16
    ret
; ---------------------------------------------------------
; Data stored in Program Memory (Flash)
; ---------------------------------------------------------
.section .progmem.data, "a", @progbits
; Common Anode 7-segment masks (Active LOW)
; Segments: DP,g,f,e,d,c,b,a (Bit 7 -> Bit 0)
; Indices:    0     1     2     3     4     5     6     7     8     9
segment_masks:
    .byte 0xC0, 0xF9, 0xA4, 0xB0, 0x99, 0x92, 0x82, 0xF8, 0x80, 0x90
; Digit select masks (Assuming active high on QA-QD for digits 1-4)
digit_masks:
    .byte 0x01, 0x02, 0x04, 0x08
The segment_masks table also makes it easy to present characters other than digits: add further entries to it. Think of segment_masks as a font definition that describes how each symbol looks.
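To illustrate the "font" idea, here is a minimal C sketch of how a new table entry could be derived. It assumes the bit layout given in the comment above segment_masks (bit 0 = a up to bit 7 = DP) and a common-anode, active-LOW display. The names `glyph_mask` and `SEG_*` are hypothetical helpers and are not part of the lab code.

```c
#include <stdint.h>

/* Segment bit positions, following the comment above segment_masks:
   bit 0 = a, bit 1 = b, ..., bit 6 = g, bit 7 = DP. */
enum {
    SEG_A = 1 << 0, SEG_B = 1 << 1, SEG_C = 1 << 2, SEG_D = 1 << 3,
    SEG_E = 1 << 4, SEG_F = 1 << 5, SEG_G = 1 << 6, SEG_DP = 1 << 7
};

/* Common-anode displays are active LOW, so a lit segment is a 0 bit.
   Inverting the set of lit segments gives the byte for the table. */
uint8_t glyph_mask(uint8_t lit_segments)
{
    return (uint8_t)~lit_segments;
}
```

For example, the letter "A" lights segments a, b, c, e, f and g, which gives the byte 0x88. The value could be appended to segment_masks as index 10.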
Display single digit: how to use it to display a number?
Sample code that uses the function declared above to display 1975 is presented below. Note that the MCU runs at full speed here, constantly updating the display. This is not necessary (a minimum comfortable LED display refresh rate is around 10 Hz), but for the sake of simplicity we do not present a slower solution here. It is common to use timers to refresh the display periodically.
    .equ SREG, 0x3F        ; Status Register
    .equ SPH, 0x3E         ; Stack Pointer High
    .equ SPL, 0x3D         ; Stack Pointer Low
    .equ SER_PORT, 0x05    ; PORTB I/O address
    .equ PINB, 0x03        ; Input Pins Port B (Toggle Shortcut)
    .equ SER_PIN, 0        ; GPIO8
    .equ DDRD, 0x0A        ; Data Direction Port D
    .equ DDRB, 0x04        ; Data Direction Port B
    .equ CLK_PORT, 0x0B    ; PORTD I/O address
    .equ CLK_PIN, 7        ; GPIO7
    .equ LAT_PORT, 0x0B    ; PORTD I/O address
    .equ LAT_PIN, 4        ; GPIO4
    .equ RAMEND, 0x08FF

    .global display_digit

    ; ---------------------------------------------------------
    ; Program code stored in Program Memory (Flash)
    ; ---------------------------------------------------------
    .section .text
    .org 0x0000
        rjmp RESET

    RESET:
        ; Prepare stack
        ldi r16, hi8(RAMEND)
        out SPH, r16
        ldi r16, lo8(RAMEND)
        out SPL, r16

        ; display_digit expects r1 = 0 (gcc convention);
        ; registers are not guaranteed to be zero after reset
        clr r1

        ; Initialise display control outputs
        sbi DDRB, SER_PIN    ; Set PB0 as output
        sbi DDRD, CLK_PIN    ; Set PD7 as output
        sbi DDRD, LAT_PIN    ; Set PD4 as output

        clr r25              ; High bytes of the argument registers
        clr r23

    ; --- Main Loop, displays in sequence 1->9->7->5 ---
    LOOP:
        ldi r24, 0
        ldi r22, 1
        call display_digit   ; Display 1
        ldi r24, 1
        ldi r22, 9
        call display_digit   ; Display 9
        ldi r24, 2
        ldi r22, 7
        call display_digit   ; Display 7
        ldi r24, 3
        ldi r22, 5
        call display_digit   ; Display 5
        rjmp LOOP

    ; void display_digit(uint8_t pos, uint8_t number);
    ; r24 = position (0 to 3)
    ; r22 = number (0 to 9)
    ; .... here comes the body of the display_digit function
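For the timer-based refresh mentioned earlier, the compare value can be estimated with a few lines of arithmetic. The C sketch below is only an illustration and rests on assumptions that are not part of the lab code: a timer running in CTC mode that raises one compare-match interrupt per digit, and a 16 MHz system clock as on a typical Arduino Uno. The function name `ctc_compare_value` is hypothetical.

```c
#include <stdint.h>

/* In CTC mode a compare match occurs every (OCR + 1) timer ticks, so
   OCR = f_cpu / (prescaler * tick_rate) - 1.
   One interrupt shows one digit, hence tick_rate = refresh_hz * digits. */
uint32_t ctc_compare_value(uint32_t f_cpu_hz, uint32_t prescaler,
                           uint32_t refresh_hz, uint32_t digits)
{
    uint32_t tick_hz = refresh_hz * digits;
    return f_cpu_hz / (prescaler * tick_hz) - 1u;
}
```

With a 16 MHz clock, a prescaler of 64 and a 100 Hz refresh of four digits, the result is 624, which needs a 16-bit timer. A prescaler of 1024 with a 50 Hz refresh gives 77, which fits an 8-bit timer.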
The main loop above displays fixed (constant) digits. A common scenario, however, is that the number is stored in a register or in a memory variable.
Convert number to digits: function definition
To display a number on this kind of display, you need to convert it into an array of bytes, each representing one digit. The function below does the trick.
    ; void convert_to_digits(uint16_t value, uint8_t* array);
    ; Inputs:
    ;   r25:r24 = Value to convert (up to 9999)
    ;   r23:r22 = Pointer to SRAM array (4 bytes long)
    convert_to_digits:
        ; Save registers we are about to use
        push r26
        push r27
        push r18
        push r19
        push r20

        ; Move the SRAM pointer from r23:r22 into the X pointer (r27:r26)
        movw r26, r22

        ; ---------------------------------------------------
        ; 1. Thousands Digit (Subtract 1000 = 0x03E8)
        ; ---------------------------------------------------
        clr r18          ; Clear digit counter
        ldi r19, 0x03    ; High byte of 1000
        ldi r20, 0xE8    ; Low byte of 1000
    loop_1000:
        cp r24, r20      ; Compare value low byte with 1000 low byte
        cpc r25, r19     ; Compare value high byte with 1000 high byte
        brlo done_1000   ; If value < 1000, branch out
        sub r24, r20     ; Subtract 1000 low byte
        sbc r25, r19     ; Subtract 1000 high byte (with carry)
        inc r18          ; Increment thousands digit
        rjmp loop_1000
    done_1000:
        st X+, r18       ; Store thousands digit in array[0] and increment X

        ; ---------------------------------------------------
        ; 2. Hundreds Digit (Subtract 100 = 0x0064)
        ; ---------------------------------------------------
        clr r18          ; Reset digit counter
        ldi r19, 0x00    ; High byte of 100
        ldi r20, 0x64    ; Low byte of 100
    loop_100:
        cp r24, r20
        cpc r25, r19
        brlo done_100
        sub r24, r20
        sbc r25, r19
        inc r18
        rjmp loop_100
    done_100:
        st X+, r18       ; Store hundreds digit in array[1] and increment X

        ; ---------------------------------------------------
        ; 3. Tens Digit (Subtract 10 = 0x000A)
        ; ---------------------------------------------------
        clr r18          ; Reset digit counter
        ldi r19, 0x00    ; High byte of 10
        ldi r20, 0x0A    ; Low byte of 10
    loop_10:
        cp r24, r20
        cpc r25, r19
        brlo done_10
        sub r24, r20
        sbc r25, r19
        inc r18
        rjmp loop_10
    done_10:
        st X+, r18       ; Store tens digit in array[2] and increment X

        ; ---------------------------------------------------
        ; 4. Ones Digit (The Remainder)
        ; ---------------------------------------------------
        ; Whatever is left in r24 is the ones digit (0-9)
        st X, r24        ; Store ones digit in array[3] (no need to increment X)

        ; Restore registers and return
        pop r20
        pop r19
        pop r18
        pop r27
        pop r26
        ret
Note that this function operates on a buffer located in memory, which can be declared, for example, as follows:
    .section .bss    ; .bss is for uninitialized variables in SRAM

    ; Reserve 4 bytes in SRAM to hold the 4 converted digits
    display_array:
        .space 4
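The repeated-subtraction idea behind convert_to_digits can be checked on a PC before running it on the MCU. The following C sketch mirrors the same steps (subtract 1000, then 100, then 10, and keep the remainder). It is only a model of the algorithm, not the lab implementation, and the name `convert_to_digits_model` is hypothetical.

```c
#include <stdint.h>

/* Model of the assembler routine: count how many times each decimal
   weight can be subtracted; the count is the digit for that position. */
void convert_to_digits_model(uint16_t value, uint8_t digits[4])
{
    static const uint16_t weights[3] = { 1000, 100, 10 };

    for (int i = 0; i < 3; i++) {
        uint8_t count = 0;
        while (value >= weights[i]) {   /* same test as cp/cpc + brlo */
            value -= weights[i];        /* same step as sub/sbc */
            count++;
        }
        digits[i] = count;
    }
    digits[3] = (uint8_t)value;         /* the remainder is the ones digit */
}
```

Calling it with 1975 fills the array with 1, 9, 7, 5. As in the assembler version, values above 9999 are outside the intended range: the thousands counter would then exceed 9.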
Devices (laboratory nodes) are interconnected in pairs, so it is possible to work in groups and implement scenarios involving more than one device:
Interconnections are symmetrical, so that device 1 can send data to device 2 and vice versa (similar to serial communication). Note that analogue inputs are also involved in the interconnection interface. See image 21 for details.
The in-series resistors protect the Arduino boards' outputs from excessive current when both pins are configured as outputs with opposite logic states.
The capacitors on the analogue lines filter the PWM signal, providing a stable voltage for the analogue-to-digital converter to measure.
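As a rough illustration of what the filtered PWM line delivers to the ADC, the C sketch below estimates the average voltage and the expected conversion result. It assumes 8-bit fast PWM in non-inverting mode on OC0A, a 10-bit ADC, and the same 5 V for the supply and the ADC reference; none of these settings are prescribed by the lab description, and both function names are hypothetical.

```c
#include <stdint.h>

/* Fast PWM, non-inverting: duty cycle = (OCR0A + 1) / 256,
   so the filtered (average) voltage is Vcc * (OCR0A + 1) / 256. */
uint16_t pwm_average_mv(uint8_t ocr0a, uint16_t vcc_mv)
{
    return (uint16_t)(((uint32_t)vcc_mv * (ocr0a + 1u)) / 256u);
}

/* 10-bit ADC: code = Vin * 1024 / Vref, limited to the maximum of 1023. */
uint16_t adc_expected_code(uint16_t vin_mv, uint16_t vref_mv)
{
    uint32_t code = ((uint32_t)vin_mv * 1024u) / vref_mv;
    return (uint16_t)(code > 1023u ? 1023u : code);
}
```

Under these assumptions, OCR0A = 127 gives about 2500 mV, which the partner device should read as an ADC value close to 512. Real readings will differ slightly because of ripple and component tolerances.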
| Arduino Uno pin name | AVR pin name | Alternate function | Comment |
|---|---|---|---|
| D2 | PD2 | INT0 | Interrupt input |
| D5 | PD5 | T1 | Timer/counter input |
| D6 | PD6 | OC0A | PWM output to generate analogue voltage |
| D9 | PB1 | OC1A | Digital output / Timer output |
| D10 | PB2 | OC1B | Digital output / Timer output |
| A5 | PC5 | ADC5 | Analogue input |
Such a connection makes it possible to implement a variety of scenarios:
Below are hands-on lab scenarios intended for use with the VREL NextGen system (access via a browser; no need to install the toolchain or any other software).
In this section, we show examples of programs written purely in assembler or combined with other programming languages, including C++ and C#. We assume that the reader is familiar with the coursebook and with the instructions and directives used to write assembler programs. We describe the use of the integrated development environment (Visual Studio) and methods for assembling programs from the command line only. We also show how to create static and dynamic libraries written in assembler for use in assembler programs or with other compilers.
In the following chapter, we explain how to write, assemble, link and execute programs written in assembly language for x64 processors. We assume that the reader is familiar with the most important processor instructions and MASM directives.
Creating a project in VS with MASM source. Assembling, debugging, disassembly window, register view, memory view - data section,
[piotr] TO BE DONE
It is possible to use command-line MASM tools to assemble, link, and create libraries written in assembly language. You can use any editor to create the assembler source code and then translate it into machine code. The required tools are part of the Visual Studio Community installation when the “Desktop development with C++” option is selected. For the default VS installation, you can find them in the following folder (the exact path depends on the installed version).
C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\14.50.35717\bin\Hostx64\x64
To use statically linked Windows libraries, you need lib files. The essential library is kernel32.lib, but for other Windows functions, you will also need additional libraries. All are available in the following folder (the exact path depends on the installed version).
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64
The ML64.exe program is used to assemble the source file. It has many options, which you can list by executing:
ML64.exe /?
After assembling, ML64 can call the linker automatically. An example command that assembles and links the file named source.asm can look like this:
ml64 /Fl /Zi /Zd source.asm /link /entry:main
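The options used are as follows: /Fl generates a listing file, /Zi adds symbolic debug information, /Zd adds line-number debug information, /link passes the remaining options to the linker, and /entry:main is a linker option that sets main as the program entry point.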
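Unsurprisingly, the first code example is “Hello, World!”. This program uses three system functions: GetStdHandle, which returns the handle of the console; WriteConsoleA, which writes the text to the console; and ExitProcess, which ends the program and returns to the operating system.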
The functions are reached through the library file kernel32.lib (the import library for kernel32.dll), which is linked statically. We use the “includelib” directive to tell the linker where to search for the functions. To tell the assembler the names of the functions, we declare them with a set of “extern” directives. Each statement of the program is explained in the comments.
    option casemap:none              ; identifiers are case-sensitive
    includelib kernel32.lib          ; statically linked library with system functions

    EXTERN GetStdHandle:PROC         ; declaration of system functions for use
    EXTERN WriteConsoleA:PROC
    EXTERN ExitProcess:PROC

    STD_OUTPUT_HANDLE equ -11        ; STD_OUTPUT_HANDLE constant

    ; In the data section of our program, there is a string to be displayed
    .data
    message db "Hello, World!", 13, 10
    msgLen  equ $ - message          ; constant calculation with string length

    ; In the code section of our program, there are instructions for execution
    .code
    main PROC                        ; main function - entry point
        sub rsp, 28h                 ; shadow space + align

        ; HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE)
        mov ecx, STD_OUTPUT_HANDLE
        call GetStdHandle            ; this function returns the handle of the console window

        ; WriteConsoleA(hConsole, message, msgLen, &written, NULL)
        mov rcx, rax                 ; console window handle
        lea rdx, message             ; pointer to the buffer
        mov r8d, msgLen              ; length
        lea r9, written              ; pointer to a variable receiving the number of chars actually written
        mov qword ptr [rsp+20h], 0   ; 5th argument (lpReserved = NULL)
        call WriteConsoleA           ; this function displays text in the console

        ; ExitProcess(0)
        xor ecx, ecx                 ; value to be returned
        call ExitProcess             ; return to operating system
    main ENDP                        ; end of the main function

    ; In the uninitialised data section of our program, there is a "written" variable
    .data?
    written dq ?                     ; variable which holds the number of written chars

    END                              ; end of source file
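The value 28h in "sub rsp, 28h" follows from the Windows x64 calling convention: 32 bytes of shadow space, 8 bytes for each argument beyond the fourth, and padding so that rsp is 16-byte aligned at every call. The C sketch below reproduces this arithmetic. It assumes a procedure that pushes no registers and needs no local variables, and the name `win64_frame_size` is hypothetical.

```c
#include <stdint.h>

/* Stack space to reserve in a procedure that calls other functions,
   given the largest number of arguments passed in any of those calls. */
uint32_t win64_frame_size(uint32_t max_call_args)
{
    uint32_t stack_args = max_call_args > 4u ? max_call_args - 4u : 0u;
    uint32_t size = 32u + 8u * stack_args;   /* shadow space + stack arguments */

    /* On entry rsp is 8 bytes off a 16-byte boundary (the return address
       was pushed), so size + 8 must be a multiple of 16. */
    if ((size + 8u) % 16u != 0u)
        size += 8u;
    return size;
}
```

For the program above, the largest call is WriteConsoleA with five arguments, so the result is 40 bytes (28h). The fifth argument is then stored just above the shadow space, at [rsp+20h].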
To create the static library, the assembler program shouldn't have the main procedure defined. All other procedures will be made available for other programs by default. If there is a need to hide a procedure from visibility, it is possible to mark it as PRIVATE. The first step is to assemble the source file with MASM.
ml64 /c source.asm
The second step is to create the lib file with the lib tool.
lib source.obj
This will create the source.lib file, which can be imported into the program, where we can use all available procedures.
The example for the library will be the program containing the function “print_int”, which displays the integer number provided as an argument via the rcx register.
NASM