BOOK

 
 Multiasm Project Logo

Project Information

This content was implemented under the following project:

  • Cooperation Partnerships in higher education, 2023, MultiASM: A novel approach for energy-efficient, high performance and compact programming for next-generation EU software engineers: 2023-1-PL01-KA220-HED-000152401.

Consortium Partners

  • Silesian University of Technology, Gliwice, Poland (Coordinator),
  • Riga Technical University, Riga, Latvia,
  • Western Norway University, Forde, Norway,
  • ITT Group, Tallinn, Estonia.

 Consortium Partner's Logos

Erasmus+ Disclaimer
This project has been co-funded by the European Union.
Views and opinions expressed are, however, those of the author or authors only and do not necessarily reflect those of the European Union or the Foundation for the Development of the Education System. Neither the European Union nor the entity providing the grant can be held responsible for them.

Copyright Notice
This content was created by the MultiASM Consortium 2023–2026.
The content is copyrighted and distributed under CC BY-NC Creative Commons Licence and is free for non-commercial use.

CC BY-NC

In case of commercial use, please get in touch with MultiASM Consortium representative.

Introduction

This manual is intended to help students bootstrap into assembler programming across a variety of applications. It presents practical exercises in a hands-on lab format, often also covering toolchain configuration. Some sections present details for hardware, such as remote IoT and remote ARM laboratories. Others assume the student owns or has access to a PC and can install software.

ARM and Mobiles

ARM processors are omnipresent, ranging from simple IoT devices to laptops, notebooks, and workstations.
For this reason, we had to select one technology to use for a practical introduction and experimentation.
To present both hardware interfacing and programming, the obvious choice is the Raspberry Pi. The following chapters present laboratory details and scenarios.

Follow the links below to the lab descriptions and scenarios:

SUT's ARM laboratory

RTU's ARM laboratory

Programming in Assembler for Embedded Systems

Assembler programming for embedded systems uses both on-site programming of devices connected directly to the development platform (usually via USB) and an integrated solution for IoT laboratories: VREL NextGen Software for remote experimentation.

Local development

Local development requires installing the development toolchain. A common scenario is to use Visual Studio Code, a compiler and, usually, a plugin dedicated to a selected platform, e.g. AVR Assembler Toolbox.

Remote development

Remote development uses a ready-made development platform accessible only via a web browser. The device is observable only via a live video stream, which introduces limitations to consider, such as latency and the lack of physical access to the device (e.g., pushing a reset button is impossible).
Users connect to the system using a web browser and develop software in the browser, compile it and inject it into the microcontroller, all remotely.

The following chapters present additional information on using the VREL NextGen remote labs system for assembler programming.

VREL NextGen Management and IoT Developer Software

VREL NextGen software is a web-based, integrated solution for both IoT software developers (Users/Students) and system administrators (Administrators, Super Administrators). It can be used in one of the three aforementioned roles.
There are many public and private instances serving the internal purposes of the Consortium HE and SME Members. Using the system requires registration with a valid email address. The front page is presented in figure 2.

Figure 2: VREL NextGen software front page

In the following chapters, there is a manual on how to use the system:

User's Guide

Students book a device (or multiple devices) exclusively. Each device has specific hardware and programming features, which are provided in the documentation.
There is usually a time limit for device bookings, e.g., 2 hours per booking. Students author the code in the web-based editor; depending on the platform, this may also require authoring some configuration files (e.g., platformio.ini, makefile, etc.). Refer to the technical specification for the particular laboratory nodes - it is highly contextual.
Once the code is ready, it can be compiled, and if the compilation is successful, it can be uploaded to the device.
Results can be observed via the web camera in near real time. Some nodes will also provide other interaction capabilities, usually via a bottom-right part of the screen, where documentation is integrated.

Several instances of this software are deployed across consortium partners (details are on the IOT-OPEN.EU and IOT-OPEN.EU Reloaded Main Page). A good one to start with is the instance hosted at SUT and shared with TalTech, ITsilesia, and ITT Group: SUT's VREL NextGen.

How to Start

Students need to create an account, as in virtually any other web application (figure 3):

Figure 3: VREL NextGen lab, account creation page

Once the account is created, check your mailbox for an activation link. Activate your account and log in to the system.

Devices' Availability

Devices are booked exclusively.

A limited number of devices is available for everyone.

Other devices are provided solely for consortium members.

If you are a consortium student using this laboratory equipment during your regular course and you can see only the public devices, contact the supervisor of your labs: you were most likely not added correctly to the students' group.

Booking a device

The device booking process is straightforward. You can book a device for immediate use or for a future time slot. The process is described below:

  1. Log in to the system.
  2. Click the Bookings button (figure 4).
  3. Select the device from the list. Note that you may have access to multiple laboratories, devices, and technologies. Each device offers specific features, so check the documentation and the scenarios you will implement carefully to determine the hardware components necessary for your lab work. In case of doubt, contact the supervisor. A sample list of ESP32 laboratory devices is presented in figure 5.
  4. Select the date and time (figure 6); you can move among dates and book a device in advance! Respect other students and refrain from booking more than is necessary or instructed.
  5. Once booked, switch back to “My Devices” and select a device (figure 7) - you can work only during the booked period, and you will be logged off automatically!
Device booking process step 1 - go to the list of available devices
Figure 4: Devices menu - empty booking list
Device booking process step 2 - choose a device to book
Figure 5: Available devices
Device booking process step 3 - choose booking time
Figure 6: Book the device for a specific period
Figure 7: List of bookings
When the code editing icon (<>) is greyed out, as in figure 7, your booking period has either already passed or has not started yet.

Introduction to the Arduino Uno programming in Assembler

The following chapter assumes that you are familiar with basic assembler operations for AVR microcontrollers. Below, we explain the most important construction elements and assembler instructions for manipulating the Arduino Uno's (figure 8) GPIOs, based on the ATmega328P microcontroller.

Figure 8: Arduino Uno development board

Template for the assembler code

Using plain assembler (not C++ + assembler) requires a specific construction of the application where the program is located (loaded) into memory exactly at 0x0000.

    .org 0x0000
    rjmp start
 
start:
...

It is common practice to use rjmp (relative jump), which makes it easier to place data before the start of the code. It is good “embedded” practice to keep it even if it does not really jump anywhere, as in this example. Omitting it may cause problems later, when you decide to declare data, use interrupts, and so on.

Memory Map

The location of the code (Flash) and data (SRAM) is assigned within the addressing space, and this also affects how assembler source code is constructed. The following image presents the ATmega328P memory map. When using fully manual memory control, e.g., when the source code does not use .section, it is necessary to explicitly tell the assembler where to place the code, variables, and other memory-related components. Details are presented in figure 9 and discussed in the following subsection.

Figure 9: Arduino Uno (ATmega328P) memory map

Source code needs explicit declarations to tell the AVR-GCC toolchain how to handle the contents: whether it is code or data, whether a variable is read-only, whether it should be preserved across updates, and so on. There are two possible approaches: one is to use section declarations ('.section'), the other is to handle addresses manually.
There are five .section declarations, as presented in table 1. Each of them does the job of .org <address> in a more elegant way: e.g., .section .data is equivalent to .org 0x800100, so one does not need to remember the addresses.

Table 1: Sections reference in AVR-GCC and ATmega328P
Section | Content | Location | Volatile?
.text | Instructions / Code | Flash | No
.data | Initialized globals | SRAM (from Flash) | Yes
.bss | Zeroed/Uninitialized globals | SRAM | Yes
.rodata | Constants / Strings | Flash | No
.eeprom | Long-term storage | EEPROM | No

We mentioned before that .section .data is equivalent to .org 0x800100. Why .org 0x800100 instead of just .org 0x0100?
It is because Flash, EEPROM, and SRAM all start at 0x0000 (see figure 9), so you need to tell the linker (via the source code) which memory block you are referring to. Writing .org 0x0100 may be misleading: the toolchain will assume the location is in Flash rather than SRAM.
For this reason, the AVR-GCC toolchain (assembler and linker) handles the Harvard architecture of the Arduino Uno (ATmega328P) with virtual memory offsets: addresses from 0x000000 are Flash, addresses from 0x800000 are the data space (with the internal SRAM starting at 0x800100), and addresses from 0x810000 are EEPROM. Details are presented in table 2.

Table 2: ATmega328P AVR-GCC Virtual Memory Offsets and their real mapping in hardware
Memory Type | GCC Internal Offset | Hardware Address
Flash | 0x000000 | 0x0000
SRAM | 0x800000 | 0x0000
SRAM (Internal) | 0x800100 | 0x0100
EEPROM | 0x810000 | 0x0000
In some of the following chapters, we present “naked” code that uses .org 0x0100 for simplicity. This is done only when the code contains no variables and everything is stored in Flash.

To summarise briefly, the most common scenario is that code is intended to land in Flash memory, while variables are in SRAM. Appropriate '.org' directives ensure the correct placement of the content that follows. It is possible to write code without using sections, but that makes the code unnecessarily complicated. Whenever you declare variables, it is advisable to use sections to make the code cleaner and easier to understand. If your code is as simple as setting a GPIO output and uses no variables (everything is in Flash), you may omit the .section declarations.

The sample code below declares a 16-bit value named 'analogue_value' stored in SRAM (RAM). Note use of .section:

  • .section .data enforces that the following declarations fit in RAM (SRAM),
  • .section .text ensures that instructions following the declaration are located in Flash (non-volatile) memory.
.section .data
.org 0x100              ; Set SRAM start address manually
analogue_value:
    .skip 2             ; 16-bit variable
 
.section .text
.org 0x0000
    rjmp main
 
main:
    ; sample values to store
    ldi r24, 0xFF       
    ldi r25, 0x03       
 
    ; store it to SRAM 
    sts analogue_value, r24
    sts analogue_value + 1, r25
 
loop:
    rjmp loop            ; Dummy loop
If you declare, e.g., a string in the .data section in C++ code, the C++ startup code copies all pre-initialised variables from Flash to SRAM on boot. Here, however, we use pure assembler, and this process is not triggered; as a result, your string variable in SRAM will contain garbage rather than the string you declared, even if everything looks OK at the source code level. Because of this, and because RAM is very limited, keep strings and pre-initialised variables in Flash and treat ALL variables declared in .data as uninitialised.
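As a minimal sketch of this advice, the fragment below keeps a constant string in Flash and reads its first byte with the lpm instruction. The label greeting and the choice of r16 are illustrative assumptions, not part of the original examples; r30 and r31 are the low and high bytes of the Z pointer, and the syntax assumes the GNU assembler (avr-as).

```asm
.section .text
.org 0x0000
    rjmp main

greeting:                       ; hypothetical constant, stored in Flash
    .asciz "Hi!"                ; 4 bytes, including the terminating 0
    .balign 2                   ; keep the following instructions word-aligned

main:
    ldi r30, lo8(greeting)      ; ZL: low byte of the Flash byte address
    ldi r31, hi8(greeting)      ; ZH: high byte of the Flash byte address
    lpm r16, Z                  ; r16 = first character, read directly from Flash

loop:
    rjmp loop                   ; nothing more to do
```

Because the string never leaves Flash, no start-up copy to SRAM is needed and no RAM is consumed.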

GPIO and Ports

The Arduino Uno exposes a number of GPIOs that can serve as binary inputs and outputs or analogue inputs, and many of them provide advanced, hardware-accelerated functions such as UART, SPI, I2C, PWM, and ADC. In fact, not all pins on the development board are truly “general-purpose”: there is no internal multiplexer, so functions such as UART, I2C, SPI, PWM and ADC are bound to particular GPIOs and cannot be reassigned.

At the hardware level, GPIO pins are grouped into 3 “ports” (figure 10), and this is how you access them:

  • PortB, with GPIOs from D8 to D13,
  • PortC, with GPIOs from A0 to A5,
  • PortD, with GPIOs from D0 to D7.

A bit in the port corresponds to a single GPIO pin, e.g. bit 5 (the 6th bit, counting from zero) of PortB corresponds to GPIO D13, which is also connected to the built-in LED.

Figure 10: Arduino ports
Some GPIOs have extra features (as presented in figure 10), such as hardware-accelerated PWM, I2C, Serial or SPI. PWM is useful for simulating an analogue output, e.g., to control LED brightness, as we show in the following sections.

IO Registers
Each Port has three 8-bit registers assigned (9 in total):

  • DDRx (Data Direction Register): there are 3 of these registers, one per Port (B, C, D): DDRB, DDRC and DDRD. These registers configure each GPIO as Input (0) or Output (1). Configuration is done “per bit”, so it is equivalent to controlling each GPIO individually.
  • PORTx (Port Data Register): there are also 3 of these registers: PORTB, PORTC and PORTD. Their effect depends on the value of the corresponding bit in the DDRx register, that is, on whether the pin is configured as input or output:
    • If a specific GPIO pin (represented as a bit in the related DDRx register) is set as output, then PORTx bit directly affects the GPIO output: 1 is HIGH (+5V), while 0 is LOW (0V).
    • If a specific GPIO pin is set to input, PORTx value controls the internal pull-up resistor: 1 enables pull-up, 0 disables it.
  • PINx (Pin Value Register) represents the current input state of the GPIO.

Core I/O registers and their IDs
To operate on I/O registers, the developer must either include a library with definitions or (when programming in pure assembler) declare them on their own.
Table 3 lists the I/O registers used to control GPIO (Ports B, C and D) and their addresses:

Table 3: I/O registers and their addresses (IDs)
Name | Address (I/O) | Description
PINB | 0x03 | Input pins register (Port B)
DDRB | 0x04 | Data direction register (Port B)
PORTB | 0x05 | Output register/pull-up enable (Port B)
PINC | 0x06 | Input pins register (Port C)
DDRC | 0x07 | Data direction register (Port C)
PORTC | 0x08 | Output register/pull-up enable (Port C)
PIND | 0x09 | Input pins register (Port D)
DDRD | 0x0A | Data direction register (Port D)
PORTD | 0x0B | Output register/pull-up enable (Port D)

The easiest approach is to declare constants (converted to values at assembly time) and insert them before the code starts (note that they do not occupy memory, so they do not disturb code placement or execution):

; I/O registers
.equ PINB,  0x03
.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PINC,  0x06
.equ DDRC,  0x07
.equ PORTC, 0x08
.equ PIND,  0x09
.equ DDRD,  0x0A
.equ PORTD, 0x0B
 
; your code starts here
    .org 0x0000
    rjmp start
 
start:
...
.equ is converted into a value and substituted in the code at assembly time; thus, it does not exist in the final binary.
Depending on the assembler you use, there are two syntax standards: the GNU assembler (avr-as) uses .equ PINB, 0x03, while the Atmel/Microchip AVR assembler uses .equ PINB = 0x03.

Below are sections representing common usage scenarios for GPIO control.

GPIO Control Assembler Instructions

There is a set of assembler instructions that operate on Ports (I/O registers), as shown in table 4. Those instructions help to control each GPIO pin. They are handy for manually setting GPIO outputs to HIGH (1) or resetting them to LOW (0) each individually, or in a group (a single Port), all at once. They also help to check input values when GPIOs are configured as inputs. Some applications, however, use hardware acceleration beyond manual switching on and off: for example, a PWM signal can be generated using separate hardware-based mechanisms, as described further, which are far more precise than manually enabling and disabling a bit in a loop and do not load the CPU.

Assembler-level operations on ports are much faster than digitalRead, digitalWrite, and other Arduino functions in C++, roughly 50 times faster.
Table 4: Common GPIO-related I/O instructions
Instruction | Description
SBI | Set bit in I/O register
CBI | Clear bit in I/O register
SBIS | Skip next instruction if bit in I/O register is set (1)
SBIC | Skip next instruction if bit in I/O register is clear (0)
IN | Read an I/O register into a general-purpose register (R0-R31)
OUT | Write a general-purpose register to an I/O register
ANDI | Logical AND of a register (R16-R31) with a constant; used to mask bits
ORI | Logical OR of a register (R16-R31) with a constant; used to set bits

A common scenario for manual control of a GPIO pin is to first configure the GPIO as input or output (using the correct DDRx register), and then set it (SBI), clear it (CBI), test it (SBIS, SBIC), read the whole register (IN) or write the whole register (OUT).

IN and OUT instructions operate on whole, 8-bit registers rather than on single bits. Those are general-purpose instructions that cover the entire range of IO registers (0-63), in addition to the aforementioned DDRx, PORTx, and PINx registers. Operating on multiple bits (8 bits) is faster than setting or reading them individually.
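The whole-register approach can be sketched as follows. This is only an illustration under an assumed wiring: four inputs on D4 to D7 (PortD, bits 4-7) are mirrored to four outputs on D8 to D11 (PortB, bits 0-3). The register choice (r16, r17) is arbitrary, and the syntax assumes the GNU assembler.

```asm
.equ PIND,  0x09
.equ DDRB,  0x04
.equ PORTB, 0x05

    .org 0x0000
    rjmp start

start:
    ldi  r16, 0x0F          ; bits 0-3 set
    out  DDRB, r16          ; PB0..PB3 (D8..D11) become outputs in a single write
                            ; Port D pins are inputs by default after reset
loop:
    in   r17, PIND          ; read all eight Port D pins at once
    andi r17, 0xF0          ; mask: keep only PD4..PD7
    swap r17                ; exchange nibbles, so PD4..PD7 land in bits 0..3
    out  PORTB, r17         ; drive PB0..PB3 with the values just read
    rjmp loop
```

One in and one out replace four separate bit tests and four separate bit writes.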

Code Examples

Below are common scenarios implemented in assembler that will help you to understand the code and start programming.

Use GPIO As Digital Output

In this scenario, we use GPIO as a digital output. The simplest approach is to use the built-in LED to get instantly observable results.
The built-in LED is connected to GPIO13 (D13) and is controlled via PortB (bit 5, zero-based indexing; see figure 10). On the Arduino Uno, the built-in LED is on when GPIO13 is HIGH (1) and off when it is LOW (0). It is also convenient to declare a bit number representing the built-in LED position in PortB, so instead of using a number, we can use an identifier, such as .equ PB5,5.

This code flashes the built-in LED.

.equ DDRB,  0x04
.equ PORTB, 0x05
.equ PB5, 5                 ; PB5 is GPIO 13, and it is a built-in LED
    .org 0x0000
    rjmp RESET

Step 1 - configure GPIO13 (PortB, bit 5) as output, using DDRB register:

RESET:
    ldi r16, 1 << PB5        ; Set bit 5
    out DDRB, r16            ; Set PB5 as output

Step 2 - switch the LED on and off in a loop by directly setting and clearing PortB's bit 5 with sbi and cbi:

LOOP:
    sbi PORTB, PB5           ; Turn LED on (GPIO13 HIGH)
    rcall delay
    cbi PORTB, PB5           ; Turn LED off (GPIO13 LOW)
    rcall delay
    rjmp LOOP

This implementation of the delay is based on calculating the CPU cycles used to execute the following algorithm:

delay:
    ldi r20, 43     ; Outer loop
outer_loop:
    ldi r18, 250    ; Mid loop
mid_loop:
    ldi r19, 250    ; Inner loop
inner_loop:
    dec r19
    brne inner_loop
    dec r18
    brne mid_loop
    dec r20
    brne outer_loop
    ret

Instructions used in those loops are listed in table 5, along with the number of cycles used:

Table 5: Selected AVR instruction timings
Instruction | Cycles
ldi | 1
dec | 1
brne | 2 (taken), 1 (not taken)
ret | 4

The inner loop runs exactly 250 times. Thus, the exact number of cycles used is calculated as:

  • 1×1 (loop init, ldi r19,250),
  • 250×1 (250 executions of dec r19),
  • 249×2 + 1×1 = 499 (249 executions of brne with a jump + 1 without a jump).

The total for the inner loop is then 750 clock cycles of the ATmega328P MCU.

The mid-loop also runs 250 times. In total, it uses:

  • 1×1 (ldi r18,250 for mid-loop init),
  • 250×750 (inner loop execution cost, as counted above, because the inner loop is nested inside the mid-loop),
  • 250×1 (250 executions of dec r18),
  • 249×2 + 1×1 = 499 (249 executions of brne with a jump + 1 without a jump).

Thus, at the mid-loop level, the algorithm consumes 188250 cycles.

The outer loop runs 43 times. It executes the mid-loop 43 times, and the exact number of cycles used is:

  • 1×1 (ldi r20,43 initialises the outer loop),
  • 43×188250 (the mid-loop executed 43 times),
  • 43×1 (cost of dec r20 is 1 cycle),
  • 42×2 + 1×1 = 85 (42 executions of brne with a jump + 1 without a jump).

The final cost of the loops is 8094879 cycles.
An extra 4 cycles are needed for the final ret.

Thus, the total cost of the delay section is 8 094 883 clock cycles.

The ATmega328P runs at 16 MHz; thus, each cycle takes 1/16000000 of a second.
Overall, the algorithm's execution time is 8094883/16000000 s, which is about 0.5 s (506 ms, to be precise). Not perfect, but good enough for this purpose.

This kind of delay implementation is straightforward, but it has drawbacks. First, it is a blocking delay; second, it is energy-inefficient; and, most of all, it is laborious: you need to analyse the algorithm instruction by instruction and calculate the total number of cycles. There are better solutions, such as using timers.
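The cycle counts above can be condensed into a closed form, which makes it easier to pick loop counters for other delays. The derivation below is a sketch based on the timings from table 5; the symbols n, m and k (inner, mid and outer loop counters) are introduced here only for illustration.

```latex
\begin{align*}
C_{\text{inner}}(n) &= 1 + n + \bigl(2(n-1) + 1\bigr) = 3n \\
C_{\text{mid}}(m, n) &= 1 + m\,C_{\text{inner}}(n) + m + \bigl(2(m-1) + 1\bigr) = 3m(n+1) \\
C_{\text{outer}}(k, m, n) &= 1 + k\,C_{\text{mid}}(m, n) + k + \bigl(2(k-1) + 1\bigr) = 3k\bigl(m(n+1) + 1\bigr) \\
C_{\text{delay}} &= C_{\text{outer}}(43, 250, 250) + 4 = 3 \cdot 43 \cdot 62751 + 4 = 8\,094\,883 \\
t_{\text{delay}} &= \frac{8\,094\,883}{16\,000\,000\ \text{Hz}} \approx 0.506\ \text{s}
\end{align*}
```

With m = n = 250, each outer pass costs 188253 cycles, so k ≈ t × 16000000 / 188253; for example, a delay of about 1 s needs k = 85.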

Use GPIO as Digital Input - polling

GPIOs may be used as inputs, e.g., to check the state of a button. This method is called active polling; we test the button's state in the loop. A common scenario is that a button shorts to GND, which requires a pull-up resistor (either external or internal). For the internal pull-up, it is necessary to explicitly enable it in assembler code. Arduino Uno is also able to use interrupts for this purpose, and we describe it in the following section.

Configuring GPIO as input with pull-up is pretty simple:

  • configure GPIO as input using the DDRx register (clear the bit that represents a particular GPIO so it becomes an input),
  • enable pull-up: set a bit representing a particular GPIO in the PORTx register.

Reading the value of a GPIO is as simple as reading the corresponding bit in the PINx register: when the GPIO is HIGH, the bit is 1; when the GPIO is LOW, the bit is 0. When the GPIO value controls the algorithm flow, it is more convenient (and faster and more memory-efficient) to use conditional jumps based on the PINx bit value, such as the SBIC instruction.

The example below shows an Arduino Uno with a button connected to GPIO 2, controlling the built-in LED connected to GPIO 13. On button press, the LED turns on; on release, it turns off. Contextual circuit schematic is presented in figure 11.

Figure 11: Circuit for GPIO input handling

Declare ports: the output (LED) is on PortB, bit 5 (GPIO 13), and the input (button) is on PortD, bit 2 (GPIO 2), as presented in figure 10.

; output (built-in LED on GPIO13)
.equ PINB,   0x03   ; Input Pins Address Port B
.equ DDRB,   0x04   ; Data Direction Register Port B
.equ PORTB,  0x05   ; Data Register Port B
.equ PB5,    5      ; Pin 13 is Port B, Bit 5
; input (button connected to GPIO 2)
.equ PIND,   0x09   ; Input Pins Address Port D
.equ DDRD,   0x0A   ; Data Direction Register Port D
.equ PORTD,  0x0B   ; Data Register Port D
.equ PD2,    2      ; Pin 2 is Port D, Bit 2
 
.section .text
.org 0x0000         
    rjmp main

Configure GPIO 13 as an output, GPIO 2 as an input, and enable the internal pull-up resistor on GPIO 2.

main:
    sbi   DDRB, PB5     ; Set PB5 (GPIO 13) as Output
    cbi   DDRD, PD2     ; Set PD2 (GPIO 2) as Input
    sbi   PORTD, PD2    ; Enable Internal Pull-up on PD2 (GPIO 2)

This section is a simple push-switch implementation. Instead of reading the whole PIND register, we use the sbic instruction, which tests a single bit and skips the next instruction if that bit is clear: when PD2=0, the rjmp led_off is skipped and execution continues at the led_on label. Note that PD2=0 means that the button is pressed, not released.

loop:
 
    sbic  PIND, PD2     ; Skip next instruction if PD2 (GPIO 2) is LOW (Button Pressed)
    rjmp  led_off       ; If High (Not Pressed), go to led_off
 
led_on:
    sbi   PORTB, PB5    ; Set Pin 13 High
    rjmp  loop          ; Jump back to start of loop
 
led_off:
    cbi   PORTB, PB5    ; Set Pin 13 Low
    rjmp  loop          ; Jump back to start of loop

Naturally, polling is simple, but inefficient in terms of resource utilisation. It is far better to implement interrupt-based monitoring of the GPIO input that we present in the following section.

Use GPIO as Digital Input - interrupts

The ATMega328P MCU has two options for handling GPIO changes using interrupts:

  • Dedicated interrupts to GPIO pins 2 (PD2) and 3 (PD3), which are bound to INT0 and INT1, respectively. Here, it is possible to configure interrupts to trigger on a specific edge: rising or falling.
  • General pin change interrupts that are available for most GPIOs - here, those are triggered on any pin change, so logic to distinguish between rising and falling edge is up to the developer. It also requires more complex logic when monitoring multiple inputs.

The GPIO-related interrupt system is controlled with a number of registers, presented in table 6.

Table 6: ATmega328P interrupt system control registers
Register | Name | Description
EICRA | External Interrupt Control Register A | Configures the trigger condition (rising edge, falling edge, any logical change, or low level) for the dedicated hardware interrupts INT0 and INT1.
EIMSK | External Interrupt Mask Register | Used to explicitly enable or disable the INT0 and INT1 interrupts.
EIFR | External Interrupt Flag Register | Holds the hardware flags that indicate an INT0 or INT1 event has occurred. (Cleared automatically when the ISR runs, or manually by writing a logic '1' to the bit).
PCICR | Pin Change Interrupt Control Register | Enables pin change interrupts for entire banks/ports (Bank 0 for Port B, Bank 1 for Port C, Bank 2 for Port D).
PCMSK0 | Pin Change Mask Register 0 | Selects which individual pins on Port B (PCINT0 to PCINT7) are allowed to trigger a Pin Change Interrupt.
PCMSK1 | Pin Change Mask Register 1 | Selects which individual pins on Port C (PCINT8 to PCINT14) are allowed to trigger a Pin Change Interrupt.
PCMSK2 | Pin Change Mask Register 2 | Selects which individual pins on Port D (PCINT16 to PCINT23) are allowed to trigger a Pin Change Interrupt.
PCIFR | Pin Change Interrupt Flag Register | Holds the hardware flags indicating a pin change has occurred on Bank 0, 1, or 2.
SREG | Status Register | Bit 7 is the Global Interrupt Enable bit (I-bit). This is the master switch for all interrupts, controlled by the sei and cli assembly instructions.
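For comparison with the dedicated INT0/INT1 interrupts, the sketch below shows how the second option, a pin change interrupt, might be set up for the same button on GPIO 2 (PD2, which is PCINT18 in bank 2). It is an illustration rather than a reference implementation: the register addresses (PCICR at 0x68, PCMSK2 at 0x6D) and the PCINT2 vector location are taken from the ATmega328P datasheet as we understand it and should be verified there, and the label pcint2_isr is our own. Since the interrupt fires on any change, the ISR reads PIND to tell a press from a release.

```asm
.equ PIND,    0x09
.equ DDRB,    0x04
.equ PORTB,   0x05
.equ DDRD,    0x0A
.equ PORTD,   0x0B
.equ PCICR,   0x68      ; memory-mapped register: use sts, not out
.equ PCMSK2,  0x6D      ; memory-mapped register: use sts, not out
.equ PCIE2,   2         ; PCICR bit enabling bank 2 (Port D)
.equ PCINT18, 2         ; PCMSK2 bit for PD2 (GPIO 2)

.section .text
.org 0x0000
    rjmp reset
.org 0x0014             ; PCINT2 vector: word address 0x000A = byte address 0x0014
    rjmp pcint2_isr
.org 0x0068             ; first byte after the vector table (26 vectors x 4 bytes)

reset:
    sbi DDRB, 5         ; LED on PB5 (GPIO 13) as output
    cbi DDRD, 2         ; button on PD2 (GPIO 2) as input
    sbi PORTD, 2        ; enable the internal pull-up
    ldi r16, (1 << PCINT18)
    sts PCMSK2, r16     ; only PD2 may trigger the bank 2 interrupt
    ldi r16, (1 << PCIE2)
    sts PCICR, r16      ; enable pin change interrupts for bank 2
    sei                 ; enable interrupts globally
loop:
    rjmp loop

pcint2_isr:
    sbic PIND, 2        ; skip the jump if PD2 is LOW (button pressed)
    rjmp released
    sbi PORTB, 5        ; press (falling edge): LED on
    reti
released:
    cbi PORTB, 5        ; release (rising edge): LED off
    reti
```

The ISR uses only sbic, sbi and cbi, which do not modify SREG or any general-purpose register, so nothing needs to be saved on entry.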

Below is an example code that handles a button press using the INT0 interrupt - we use dedicated interrupts and predefined pins to “save” on complex logic. A press toggles the LED. Note that, for simplicity, this code does not implement any debouncing mechanism: if you test it in a real scenario, it may occur that multiple interrupts are triggered during a single press, because of bouncing. We also use a button connected to GPIO 2. The corresponding schematic is presented in figure 11.

GPIO 2 (button) is controlled with PortD (bit 2). GPIO 13 (LED) is controlled with PortB (bit 5).

.equ PINB,  0x03    ; Port B Input Pins Address
.equ DDRB,  0x04    ; Port B Data Direction Register
.equ DDRD,  0x0A    ; Port D Data Direction Register
.equ PORTD, 0x0B    ; Port D Data Register (used for pull-ups)
.equ EIMSK, 0x1D    ; External Interrupt Mask Register
.equ EICRA, 0x69    ; External Interrupt Control Register A
.equ PD2,   2       ; GPIO 2 (input)
.equ PB5,   5       ; GPIO 13 (output, built-in LED)
; --- Bit Definitions for EICRA ---
.equ ISC00, 0   ; Interrupt Sense Control 0 Bit 0
.equ ISC01, 1   ; Interrupt Sense Control 0 Bit 1

Interrupts on the ATmega328P have fixed assignments in the so-called interrupt vector table: a mapping between a cause (an interrupt) and a result (an instruction, usually a jump to the handling function), located at the beginning of the code. To learn more about interrupts, refer to the ATmega328P Datasheet: it lists all interrupts and the memory locations of their handler addresses (interrupt vectors). We use an interrupt triggered when a preconfigured change occurs on GPIO 2: INT0. When this interrupt is triggered, the ATmega328P executes the instruction at the INT0 vector, byte address 0x0004. Typically, this is a jump (rjmp) to the function that handles the interrupt (interrupt service routine, ISR for short; here, it is named int0_isr).

; --- Interrupt Vector Table ---
; Note: avr-as uses byte addresses for .org. 
; The ATmega328P word address for INT0 is 0x0002, which is byte address 0x0004.
.section .text
.org 0x0000
    rjmp reset      ; Reset vector
 
.org 0x0004
    rjmp int0_isr   ; INT0 vector (triggered by D2)
 
.org 0x0068         ; Bypass the interrupt vector table (26 vectors x 4 bytes, byte addressing)

The main program configures the GPIOs (GPIO 13 as output, GPIO 2 as input with internal pull-up) and configures INT0 to trigger on the falling edge: the button shorts the pin to GND, causing a 1→0 change on press and the opposite on release. Then it enables INT0 in EIMSK (this alone does not enable interrupts globally) and executes the sei instruction, which enables interrupts globally. Finally, the loop does nothing; interrupts are handled asynchronously.

; --- Main Program ---
reset:
    ; Configure GPIOs
    sbi DDRB,  PB5  ; LED on PB5 (GPIO 13) as an output
    cbi DDRD,  PD2  ; Button on PD2 (GPIO 2) as Input
    sbi PORTD, PD2  ; Enable internal pull-up for the button
 
    ; Configure INT0 to trigger on a Falling Edge
    ; ISC01 = 1, ISC00 = 0 (Value: 0x02). 
    ; EICRA is in extended memory space, so we must use ldi/sts, not out.
    ldi r16, (1 << ISC01)
    sts EICRA, r16
 
    ; Enable INT0
    ; INT0 is bit 0 in EIMSK. This register is in standard I/O space.
    sbi EIMSK, 0
 
    ; Enable Global Interrupts
    sei
 
loop:
    ; Do nothing, let the hardware interrupt handle everything
    rjmp loop

The ISR is called when interrupt INT0 occurs (on the falling edge of GPIO 2), and it simply toggles PB5 (GPIO 13, built-in LED). It uses an AVR trick that simplifies the code: the classical read→invert→write sequence is replaced by a single sbi instruction on the PINB register, because writing 1 to a PINx bit toggles the corresponding PORTx bit.

; --- Interrupt Service Routine ---
int0_isr:
    ; Toggle the LED on PB5.
    ; On the ATmega328P, writing a logic 1 to a PINx register toggles the corresponding PORTx bit.
    sbi PINB, PB5
 
    reti

Use Serial Port for Tracing

The Arduino Uno has no direct debugging capabilities, such as step-by-step execution. To monitor program execution, tracing can be used instead. There is no rich user interface, such as a display, so one of the tracing methods is sending information via the serial port. It can then be viewed on the developer's computer using any serial port monitor tool.

UART uses two pins:

  • TX (PortD, pin 1) - data from MCU to the external world,
  • RX (PortD, pin 0) - data from the external world to the MCU.

While it is possible to implement a full serial port protocol using GPIOs alone (so-called soft-serial), here we will use the hardware-implemented UART with several registers, as shown in table 7.

Table 7: Serial port (UART) related registers
Register | Address | Official Name | Common Name | Bits | Description
UDR0 | 0xC6 | USART I/O Data Register | Data register / TX-RX buffer | 7:0 | Write to transmit data, read to receive data
UCSR0A | 0xC0 | USART Control and Status Register A | Status register | RXC0, TXC0, UDRE0, FE0, DOR0, UPE0, U2X0, MPCM0 | Status flags (ready, complete, errors, speed mode)
UCSR0B | 0xC1 | USART Control and Status Register B | Control register | RXCIE0, TXCIE0, UDRIE0, RXEN0, TXEN0, UCSZ02, RXB80, TXB80 | Enable TX/RX, interrupts, 9-bit mode
UCSR0C | 0xC2 | USART Control and Status Register C | Configuration / Frame register | UMSEL01:0, UPM01:0, USBS0, UCSZ01:0, UCPOL0 | Frame format (mode, parity, stop bits, data size)
UBRR0L | 0xC4 | USART Baud Rate Register Low | Baud rate register (low) | 7:0 | Lower byte of baud rate divider
UBRR0H | 0xC5 | USART Baud Rate Register High | Baud rate register (high) | 3:0 | Upper byte of baud rate divider

In the example below, we will use TX only to send data from the MCU to the developer's PC. Let's start with some declarations for registers used during serial transmission and flags:

.equ UBRR0H, 0xC5
.equ UBRR0L, 0xC4
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UDR0,   0xC6
 
.equ TXEN0, 3      ; bit 3 of UCSR0B enables the UART transmitter
.equ UDRE0, 5      ; bit 5 indicates the transmit buffer is empty

Then let's define the message “Hello World”. The trailing bytes 13 and 10 (CR, LF) are the Windows-standard end-of-line sequence, and the string is 0-terminated.

.org 0x0000
    rjmp reset
message:
    .byte 'H','e','l','l','o',' ','W','o','r','l','d',13,10,0

The following section initialises the serial port for 9600bps:

ldi r16, hi8(103)
sts UBRR0H, r16
ldi r16, lo8(103)
sts UBRR0L, r16

The value 103 is loaded into the UBRR0 register pair: the high byte into UBRR0H and the low byte into UBRR0L. This baud rate divider can be calculated using the formula shown in figure 12.

Figure 12: UART prescaler equation

Where Fcpu is 16MHz for a regular Arduino Uno (ATmega328P). Note that this calculation yields ~9615 bps, not exactly 9600 bps. A tolerance of up to 2% is acceptable (here, it is 0.16%).
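
The arithmetic behind the value 103 can be checked on the developer's PC. The snippet below is a minimal sketch in Python, assuming the normal-speed asynchronous formula from the ATmega328P datasheet, UBRR = Fcpu / (16 × baud) − 1; the function names are illustrative only and are not part of any toolchain.

```python
F_CPU = 16_000_000  # Arduino Uno clock in Hz


def ubrr_for(baud, f_cpu=F_CPU):
    """Divider for normal-speed asynchronous mode (U2X0 = 0), rounded to nearest."""
    return round(f_cpu / (16 * baud)) - 1


def actual_baud(ubrr, f_cpu=F_CPU):
    """Baud rate the hardware really produces for a given divider."""
    return f_cpu / (16 * (ubrr + 1))


def error_percent(baud, f_cpu=F_CPU):
    """Relative deviation of the real baud rate from the requested one."""
    return (actual_baud(ubrr_for(baud, f_cpu), f_cpu) - baud) / baud * 100


ubrr = ubrr_for(9600)
print(ubrr, ubrr >> 8, ubrr & 0xFF)     # divider, high byte, low byte
print(round(actual_baud(ubrr), 1))      # real baud rate in bps
print(round(error_percent(9600), 2))    # deviation in percent
```

The high and low bytes printed in the first line correspond to the values the assembler produces with hi8(103) and lo8(103).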

Next step is to enable UART:

ldi r16, (1 << TXEN0)
sts UCSR0B, r16

and configure frame format (8 bits, no parity, 1 stop bit, in short 8N1 - the most common case):

ldi r16, (1 << 2) | (1 << 1)   ; UCSZ01 | UCSZ00 = 8 data bits; parity off, 1 stop bit
sts UCSR0C, r16

Now it is time to send the string to the transmitter, byte by byte. The pointer to the string is loaded into the Z register (ZH and ZL, respectively) using ldi. The string is processed character by character until a 0 (the end of the string) is encountered.

main:
    ldi ZH, hi8(message)     ; Load high byte of message address into ZH (Z pointer → flash)
    ldi ZL, lo8(message)     ; Load low byte of message address into ZL
 
send_loop:
    lpm r18, Z+              ; Load next byte from program memory (message) into r18, then increment pointer
    cpi r18, 0               ; Check end of string
    breq main                ; If the end of the string is reached, start sending the whole "Hello World" again

The next character can be loaded into the sending buffer only after the previous one has left it. The transmitter is ready for the next byte only when bit UDRE0 in register UCSR0A is set (1). If not, one needs to wait until the buffer is empty. Then the next byte (character, letter) can be written to UDR0:

wait_udre:
    lds r19, UCSR0A          ; Load Serial port status register into r19
    sbrs r19, UDRE0          ; Check if buffer is ready to accept next byte
    rjmp wait_udre           ; If not ready, keep waiting
 
    sts UDR0, r18            ; Write character from r18 to UART data register (start transmission)
    rjmp send_loop           ; And process next character

Use of Timers to Generate PWM

Timers are handy for measuring time, waiting for a delay, or executing delayed tasks either once or periodically. The last one is very helpful for generating a PWM signal (a square wave with a controllable duty cycle) and thus controlling the amount of energy delivered to the externally connected device via the GPIO, e.g., to control an LED's brightness. It is roughly equivalent to an analogue output control.

The ATmega328P has 3 timers: one is high-precision (16-bit), and two are low-precision (8-bit). Details are presented in table 8.
The timer counts “ticks”, where a tick is either a single clock cycle (16 MHz) or a clock cycle divided by a prescaler, which slows the timer down. Timers 0 and 1 share a common prescaler, and Timer 2 has an independent prescaler with more granularity. See table 8 for a list of valid prescalers for each timer, and table 9 for the frequency and period values for each prescaler. The general formula for timer speed is given by the following equation (figure 13):

Figure 13: Timer speed formula, based on prescaler value, for Arduino Uno working at 16MHz

Additionally, Timer 2 has an extra feature: instead of using the internal clock, it can be clocked from an external 32.768 kHz crystal oscillator and thus can work as an RTC.

Table 8: ATmega328P timers
| Timer | Size | Channels & Pins (PWM) | Valid prescalers | Common Uses |
|---|---|---|---|---|
| Timer 0 | 8-bit | Ch A: Pin 6, Ch B: Pin 5 | 1, 8, 64, 256, 1024 | Used by Arduino for millis() and delay(). |
| Timer 1 | 16-bit | Ch A: Pin 9, Ch B: Pin 10 | 1, 8, 64, 256, 1024 | High precision, long intervals, Servo control. |
| Timer 2 | 8-bit | Ch A: Pin 11, Ch B: Pin 3 | 1, 8, 32, 64, 128, 256, 1024 | Audio (tone) generation, Real-Time Clocks. |

Table 9: Prescalers and related frequencies and periods for a typical 16MHz clock
| Prescaler | Frequency | Period (Tick Speed) |
|---|---|---|
| 1 | 16 MHz | 0.0625 µs |
| 8 | 2 MHz | 0.5 µs |
| 32 (Timer 2 only) | 500 kHz | 2.0 µs |
| 64 | 250 kHz | 4.0 µs |
| 128 (Timer 2 only) | 125 kHz | 8.0 µs |
| 256 | 62.5 kHz | 16.0 µs |
| 1024 | 15.625 kHz | 64.0 µs |
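
The values in table 9 follow directly from dividing the clock by the prescaler. The snippet below is a minimal Python sketch of that relationship (tick frequency = Fcpu / prescaler); the helper names, including the overflow-period helper for a free-running counter, are illustrative assumptions and not part of the original material.

```python
F_CPU = 16_000_000  # Hz

# Prescalers listed in table 8
PRESCALERS_T0_T1 = (1, 8, 64, 256, 1024)
PRESCALERS_T2 = (1, 8, 32, 64, 128, 256, 1024)


def tick_frequency(prescaler, f_cpu=F_CPU):
    """Timer tick rate in Hz."""
    return f_cpu / prescaler


def tick_period_us(prescaler, f_cpu=F_CPU):
    """Duration of one timer tick in microseconds."""
    return prescaler / f_cpu * 1_000_000


def overflow_period_ms(prescaler, bits, f_cpu=F_CPU):
    """Time for a free-running counter of the given width to wrap around."""
    return (2 ** bits) * prescaler / f_cpu * 1000


for n in PRESCALERS_T2:
    print(n, tick_frequency(n), tick_period_us(n))
```

For example, overflow_period_ms(1024, 16) gives the longest interval the 16-bit Timer1 can count before wrapping, roughly 4.2 seconds.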

The output frequency is commonly expressed as the number of ticks the timer counts per cycle; the corresponding counter limit is referred to as the TOP value.

Each timer in the ATmega328P has 2 channels: A and B. Channels are hardwired to GPIO pins, and you cannot change their assignments. Channels share the same base frequency, but the duty cycle can be controlled separately for each channel.

A common use of timers is generating a signal to control standard analogue servomotors. They operate at 50Hz (20ms period), and a common configuration for the Arduino Uno is to use Timer 1 (due to its 16-bit resolution, which improves duty-cycle accuracy). At 16MHz, a prescaler of 64 is used, so the timer runs at 250000 ticks per second.
To get 50Hz, we calculate the number of ticks per period as:
250000/50 = 5000 ticks (the counter runs from 0 to 4999).
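
To see which prescalers could produce a given frequency on a 16-bit timer, the calculation can be repeated for every option. This is a minimal Python sketch under the assumption that the TOP register holds the number of ticks per period minus one; the function names are hypothetical.

```python
F_CPU = 16_000_000
PRESCALERS = (1, 8, 64, 256, 1024)   # Timer1 options from table 8


def top_for(freq_hz, prescaler, f_cpu=F_CPU):
    """TOP register value: ticks per period minus one (the count starts at 0)."""
    ticks = f_cpu // (prescaler * freq_hz)
    return ticks - 1


def usable_prescalers(freq_hz, bits=16, f_cpu=F_CPU):
    """Prescalers whose TOP value fits into a counter of the given width."""
    limit = 2 ** bits - 1
    return [(n, top_for(freq_hz, n, f_cpu))
            for n in PRESCALERS
            if 0 < top_for(freq_hz, n, f_cpu) <= limit]


print(usable_prescalers(50))
```

For 50Hz, the prescaler of 1 does not fit into 16 bits, while 8, 64 and 256 divide evenly; a smaller prescaler gives finer duty-cycle steps, and the text here continues with 64.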

Each timer has a set of registers, nicknamed here “The Big Five”. Timer applications go far beyond generating a PWM signal and thus have complex configuration settings, but here we focus only on the PWM application and the use of Timer1. The other timers (Timer0 and Timer2) have similar functions, composition and control, but differ, e.g., in the number of registers: in Timer1, each 16-bit setting needs two 8-bit registers (high and low parts of the value), while in Timer0 and Timer2 a single 8-bit register is enough. Table 10 lists the registers along with their purposes and meanings; they are further explained below.

Table 10: ATMega328P Timer1 registers (The Big Five)
| Register Name | Size | Full Name | Role | Meaning / Purpose |
|---|---|---|---|---|
| TCCR1A / TCCR1B | 8-bit (each) | Timer/Counter Control Register A & B | The Manager | Sets Mode, Pin behaviour, and Prescaler. |
| TCNT1 (H/L) | 16-bit | Timer/Counter Register 1 | The Stopwatch | Holds the actual live count (0 to TOP). |
| OCR1A (H/L) | 16-bit | Output Compare Register A | The Trigger | Defines the Duty Cycle (when the pin toggles). |
| ICR1 (H/L) | 16-bit | Input Capture Register 1 | The Ceiling | Defines the Frequency (the TOP value). |
| TIMSK1 / TIFR1 | 8-bit (each) | Timer Interrupt Mask & Flag Register | Notification | Handles Interrupts and status flags. |

The Manager
Those registers control timer behaviour and functions (refer to table 11):

  • Mode Selection (WGM): Decides if the timer is a simple counter or a PWM generator.
  • Pin Behaviour (COM): Decides if the physical pin (Pin 9/10) should turn ON or OFF when the timer hits a certain number.
  • Prescaler (CS): Sets the “gearbox” speed (1, 8, 64, 256, or 1024).

The Stopwatch
This register holds the timer's current “value”. Note that it may change very quickly (and asynchronously with the main code). It is mainly read, but it is also possible to write to it to force a cycle change, e.g., to perform a synchronisation.
As Timer1 is 16-bit, there are two registers, representing the upper (TCNT1H) and lower (TCNT1L) parts of the 16-bit value. Timer0 and Timer2, being 8-bit timers, have only a single Stopwatch register (TCNT0 and TCNT2, respectively).

The Trigger
Those registers store comparator values (values to compare against a Stopwatch). Again, for Timer0 and Timer2, there is one per timer, per channel (so 2 per timer: one for channel A and one for channel B); for Timer1, there are two per channel. E.g. for Timer1, channel A, register names are OCR1AH - the high part of a 16-bit value to compare Stopwatch against and OCR1AL to store the lower part. For channel B, those are OCR1BH and OCR1BL, respectively.

The Ceiling (TOP)
The TOP registers (also referred to as the Input Capture Register or Ceiling) define the maximum “capacity” of the Stopwatch register and thus, define the frequency. The timer simply counts from 0 up to the TOP value, and when it reaches TOP, it resets to 0 on the next tick.

Note that counting is zero-based; thus, if the desired number of ticks until overflow is, e.g., 5000, the value of TOP is 4999, not 5000.

Again, there are two registers for Timer1 (ICR1H, ICR1L - the high and low parts, respectively). Timer0 and Timer2 have no input capture register: their TOP is either fixed at 0xFF or, in selected modes, taken from OCR0A or OCR2A.

The Notification
These registers are to control the timer-based interrupt notification system. We do not use interrupts for PWM; therefore, this description is omitted.

Table 11: The Manager register details - how to control the Timer1
| Register | Bit | Name | Value (Example) | Description |
|---|---|---|---|---|
| TCCR1A | 7 | COM1A1 | 1 | Compare Output Mode A bit 1: Set for Non-Inverting PWM. |
| TCCR1A | 6 | COM1A0 | 0 | Compare Output Mode A bit 0: Combined with bit 7 to control Pin 9. |
| TCCR1A | 5 | COM1B1 | 0 | Compare Output Mode B bit 1: Controls Pin 10 behaviour. |
| TCCR1A | 4 | COM1B0 | 0 | Compare Output Mode B bit 0: Combined with bit 5 to control Pin 10. |
| TCCR1A | 3 | - | 0 | Reserved: Always write to 0. |
| TCCR1A | 2 | - | 0 | Reserved: Always write to 0. |
| TCCR1A | 1 | WGM11 | 1 | Waveform Generation Mode bit 1: Part of Mode 14 selection. |
| TCCR1A | 0 | WGM10 | 0 | Waveform Generation Mode bit 0: Part of Mode 14 selection. |
| TCCR1B | 7 | ICNC1 | 0 | Input Capture Noise Canceler: 1 enables a noise filter (used for sensors). |
| TCCR1B | 6 | ICES1 | 0 | Input Capture Edge Select: Selects trigger edge for capture (rising/falling). |
| TCCR1B | 5 | - | 0 | Reserved: Always write to 0. |
| TCCR1B | 4 | WGM13 | 1 | Waveform Generation Mode bit 3: Part of Mode 14 selection. |
| TCCR1B | 3 | WGM12 | 1 | Waveform Generation Mode bit 2: Part of Mode 14 selection. |
| TCCR1B | 2 | CS12 | 0 | Clock Select bit 2: High bit of the Prescaler (gearbox). |
| TCCR1B | 1 | CS11 | 1 | Clock Select bit 1: Middle bit of the Prescaler. |
| TCCR1B | 0 | CS10 | 1 | Clock Select bit 0: Low bit of the Prescaler. |

Bits WGM13, WGM12, WGM11 and WGM10 are to be analysed together: they form a 4-bit value representing a mode. Mode 14 is Fast PWM, so binary representation is 1,1,1,0 (WGM13, WGM12, WGM11, WGM10 respectively).
Bits CS define prescaler value as presented in table 12.

Table 12: CS bits and their meaning for prescaler definition for Timer1
| CS12 | CS11 | CS10 | Prescaler (Gear) | Ticks per second (at 16MHz) | Description |
|---|---|---|---|---|---|
| 0 | 0 | 0 | No Clock | 0 | Timer is stopped (Off). |
| 0 | 0 | 1 | clk/1 | 16,000,000 | No division. 1 tick = 1 CPU cycle. |
| 0 | 1 | 0 | clk/8 | 2,000,000 | Timer ticks once every 8 CPU cycles. |
| 0 | 1 | 1 | clk/64 | 250,000 | Our choice for 50Hz. |
| 1 | 0 | 0 | clk/256 | 62,500 | Used for medium-speed pulses. |
| 1 | 0 | 1 | clk/1024 | 15,625 | Used for very slow events or long delays. |
| 1 | 1 | 0 | External T1 | N/A | Timer ticks on a falling edge of Pin D5. |
| 1 | 1 | 1 | External T1 | N/A | Timer ticks on a rising edge of Pin D5. |
When writing to a pair of timer registers that represent the upper and lower parts of a 16-bit value (e.g. ICR1H and ICR1L), it is obligatory to write the high part first, then the low part: the high byte is buffered and transferred together with the low byte.
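
The way the WGM, COM and CS bits are spread over the two control registers can be modelled on the PC. The snippet below is a minimal Python sketch using the bit positions from table 11 and the CS codes from table 12; the function timer1_control is a hypothetical helper, not a toolchain API.

```python
# Bit positions from table 11
COM1A1, WGM11, WGM10 = 7, 1, 0          # located in TCCR1A
WGM13, WGM12 = 4, 3                      # located in TCCR1B
# CS12:CS11:CS10 codes from table 12
CS_BITS = {1: 0b001, 8: 0b010, 64: 0b011, 256: 0b100, 1024: 0b101}


def timer1_control(mode, prescaler, non_inverting_a=True):
    """Return (TCCR1A, TCCR1B) for a 4-bit WGM mode and a prescaler."""
    wgm = [(mode >> i) & 1 for i in range(4)]   # wgm[0] = WGM10 ... wgm[3] = WGM13
    tccr1a = (wgm[1] << WGM11) | (wgm[0] << WGM10)
    if non_inverting_a:
        tccr1a |= 1 << COM1A1                   # clear pin on compare match
    tccr1b = (wgm[3] << WGM13) | (wgm[2] << WGM12) | CS_BITS[prescaler]
    return tccr1a, tccr1b


a, b = timer1_control(mode=14, prescaler=64)
print(hex(a), hex(b))
```

Note how the four mode bits are split: the two lower bits land in TCCR1A and the two upper bits in TCCR1B, which is why both registers must be written to select Mode 14.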

To refer to the registers from assembler code, it is necessary to use their addresses. It is, however, more convenient to use symbolic names. A full list of timer-related registers is presented in table 13.

Table 13: Timer-related registers
| Timer | Register | Address | Brief Description |
|---|---|---|---|
| Timer 0 (8-bit) | TCCR0A | 0x44 | Control Reg A: Sets PWM mode and Pin behaviour. |
| Timer 0 (8-bit) | TCCR0B | 0x45 | Control Reg B: Sets Prescaler (the gearbox). |
| Timer 0 (8-bit) | TCNT0 | 0x46 | Stopwatch: The actual 8-bit live count. |
| Timer 0 (8-bit) | OCR0A | 0x47 | Trigger A: Duty Cycle for Pin 6. |
| Timer 0 (8-bit) | OCR0B | 0x48 | Trigger B: Duty Cycle for Pin 5. |
| Timer 0 (8-bit) | TIMSK0 | 0x6E | Interrupt Mask: Enables timer-specific alarms. |
| Timer 0 (8-bit) | TIFR0 | 0x35 | Interrupt Flag: Shows if a timer event occurred. |
| Timer 1 (16-bit) | TCCR1A | 0x80 | Control Reg A: Mode and Pin behaviour (Ch A & B). |
| Timer 1 (16-bit) | TCCR1B | 0x81 | Control Reg B: Mode and Prescaler. |
| Timer 1 (16-bit) | TCCR1C | 0x82 | Control Reg C: Force Output Compare bits. |
| Timer 1 (16-bit) | TCNT1H | 0x85 | Stopwatch High: Bits 8-15 of the count. |
| Timer 1 (16-bit) | TCNT1L | 0x84 | Stopwatch Low: Bits 0-7 of the count. |
| Timer 1 (16-bit) | ICR1H | 0x87 | Ceiling High: Bits 8-15 of the frequency TOP. |
| Timer 1 (16-bit) | ICR1L | 0x86 | Ceiling Low: Bits 0-7 of the frequency TOP. |
| Timer 1 (16-bit) | OCR1AH | 0x89 | Trigger A High: Bits 8-15 of Duty Cycle Pin 9. |
| Timer 1 (16-bit) | OCR1AL | 0x88 | Trigger A Low: Bits 0-7 of Duty Cycle Pin 9. |
| Timer 1 (16-bit) | OCR1BH | 0x8B | Trigger B High: Bits 8-15 of Duty Cycle Pin 10. |
| Timer 1 (16-bit) | OCR1BL | 0x8A | Trigger B Low: Bits 0-7 of Duty Cycle Pin 10. |
| Timer 1 (16-bit) | TIMSK1 | 0x6F | Interrupt Mask: Enables Timer 1 alarms. |
| Timer 1 (16-bit) | TIFR1 | 0x36 | Interrupt Flag: Shows Timer 1 status/events. |
| Timer 2 (8-bit) | TCCR2A | 0xB0 | Control Reg A: Mode and Pin behaviour. |
| Timer 2 (8-bit) | TCCR2B | 0xB1 | Control Reg B: Prescaler and Mode bits. |
| Timer 2 (8-bit) | TCNT2 | 0xB2 | Stopwatch: The actual 8-bit live count. |
| Timer 2 (8-bit) | OCR2A | 0xB3 | Trigger A: Duty Cycle for Pin 11. |
| Timer 2 (8-bit) | OCR2B | 0xB4 | Trigger B: Duty Cycle for Pin 3. |
| Timer 2 (8-bit) | ASSR | 0xB6 | Asynchronous Status: Used for 32kHz watch crystals. |
| Timer 2 (8-bit) | TIMSK2 | 0x70 | Interrupt Mask: Enables Timer 2 alarms. |
| Timer 2 (8-bit) | TIFR2 | 0x37 | Interrupt Flag: Shows Timer 2 status/events. |
| System | GTCCR | 0x43 | General Timer Control: Syncs/Resets all timers. |

To use timers for PWM generation, one must complete the following steps (in order):

  • Decide which timer to use.
  • Calculate all timings.
  • Set the PWM pin as an output.
  • Configure the frequency (TOP registers).
  • Configure the duty cycle (Trigger registers).
  • Set the mode for the timer (Waveform = Fast PWM, Mode 14).
  • Select the prescaler (CS bits), which starts the timer.

Example for the use of timers

The example below implements a standard servo PWM signal (50Hz) with a 10% duty cycle:

  • Use Timer1: it is 16-bit → required for high precision here.
  • Timings:
    • Prescaler 64 that is 16MHz/64=250000 ticks per second.
    • 50Hz gives 250000/50=5000 ticks per single PWM period, so the counter runs from 0 to 4999 and TOP=4999, where 4999dec=1387hex.
    • Duty cycle 10% of 5000 is 500 ticks, so the output changes at 500, where 500dec=01F4hex.
  • Use channel A of Timer1, so output pin is 9.
  • Fast PWM mode is 14 → WGM 14dec=1110bin.
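
The numbers in the list above can be reproduced, including the split into high and low bytes that the assembler code needs. This is a minimal Python sketch; pwm_registers and split16 are hypothetical helper names, and the compare value follows the convention used in this example (duty cycle in ticks, without any further offset).

```python
F_CPU = 16_000_000


def split16(value):
    """Split a 16-bit value into (high byte, low byte)."""
    return value >> 8, value & 0xFF


def pwm_registers(freq_hz, duty_percent, prescaler, f_cpu=F_CPU):
    """Return ((ICR1H, ICR1L), (OCR1AH, OCR1AL)) for Fast PWM with TOP in ICR1."""
    ticks = f_cpu // (prescaler * freq_hz)      # ticks per PWM period
    top = ticks - 1                             # counter runs 0..TOP
    compare = ticks * duty_percent // 100       # tick at which the pin is switched
    return split16(top), split16(compare)


icr, ocr = pwm_registers(50, 10, 64)
print([hex(b) for b in icr], [hex(b) for b in ocr])
```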

The code contains only a minimal set of register declarations used to control Timer1 for PWM. Note that in the code below, the timer, once configured, generates a PWM signal independently of the CPU. In the final loop, the CPU does nothing useful; it just spins. All logic is handled solely by the timer, asynchronously to the code. The configuration process is presented in figure 14.

Figure 14: Timer configuration steps
/* 
 * ATmega328P 50Hz PWM via Timer 1
 * No includes - Manual Address Mapping
 */
 
/* Register Addresses */
.equ DDRB,    0x24      /* Port B Direction Register */
.equ TCCR1A,  0x80      /* Control Register A */
.equ TCCR1B,  0x81      /* Control Register B */
.equ ICR1H,   0x87      /* TOP Value (High) */
.equ ICR1L,   0x86      /* TOP Value (Low) */
.equ OCR1AH,  0x89      /* Duty Cycle (High) */
.equ OCR1AL,  0x88      /* Duty Cycle (Low) */
 
.org 0x0000
    rjmp reset
 
reset:

Configure PIN9 (Timer1, channel A) as output.

    ; Configure PIN 9 as output (Timer1, channel A)
    ldi r16, (1 << 1)
    sts DDRB, r16

Preconfigure the TOP (register) of Timer1 to count from 0 to 4999 (0x1387), so it provides 5000 ticks per 20ms (50Hz) with a prescaler of 64.

    ; Set frequency to 50Hz
    ; Prescaler is 64, ICR1 (TOP) is set to 4999d=0x1387
    ldi r16, 0x13       ; High byte of 4999
    sts ICR1H, r16
    ldi r16, 0x87       ; Low byte of 4999
    sts ICR1L, r16

Preconfigure the trigger (comparator) so it flips the output on GPIO 9 when the counter (TCNT1) reaches 500 (0x01F4), which is equivalent to 2ms (500 is 10% of 5000). Timer1 continuously compares the counter with this trigger, and when the level of 500 is reached, it switches the output from 1 to 0. The other switch (from 0 to 1) is handled automatically by Timer1 when the counter wraps from TOP back to 0.

    ; Set triggers (comparators) to 10% of TOP
    ; 500d=0x01F4 to OCR1A
    ldi r16, 0x01       ; High byte of 500
    sts OCR1AH, r16
    ldi r16, 0xF4       ; Low byte of 500
    sts OCR1AL, r16

Configure Timer1 to work in Mode 14 (Fast PWM, cyclical square wave with controllable duty cycle via triggers/comparators).

    ; Set timer to operate as Fast PWM (Mode 14): 
    ; Mode 14 -> WGM = 1110b=14d
    ; COM1A1 = 1 (Clear Pin on Match - Non-Inverting)
    ldi r16, (1 << 7) | (1 << 1)
    sts TCCR1A, r16

Set prescaler to 64 - it automatically starts Timer1

    ; Start timer with prescaler=64
    ; WGM13=1, WGM12=1, CS11=1, CS10=1
    ldi r16, (1 << 4) | (1 << 3) | (1 << 1) | (1 << 0)
    sts TCCR1B, r16

And then do nothing: this loop is a dummy; all work is handled by Timer1. CPU is ready to handle something else.

loop:
    rjmp loop           ; The CPU does nothing! 
                        ; The Timer1 hardware toggles the pin forever.
                        ; It is done asynchronously to the main code!

When connecting an oscilloscope to GPIO pin 9, the result is as presented in figure 15.

Figure 15: PWM signal observed on GPIO 9
The names of registers TCCR1A and TCCR1B may be confusing: A and B may suggest that they refer to channel A and channel B. This is NOT the case: both registers control the whole Timer1.

Use of Timers to Periodically Call an Action (Interrupts)

Timers are not only used to generate a periodic signal but can also trigger code at precisely timed intervals. This is done with interrupts: the timer hardware raises an interrupt, and the CPU runs the handler code.
The two most common cases are:

  • Overflow: the timer ticks up to its limit (255 for 8-bit Timer0 and Timer2, 65535 for 16-bit Timer1), then overflows and starts counting from 0 again. The interrupt function is called when the overflow occurs. The periods achievable in this mode are limited to preset values (table 8) because the counter's tick rate is 16 MHz divided by the prescaler; thus, the available prescalers determine the available rates.
  • CTC (Clear Timer on Compare Match) provides more precise control. It works on the same basis as the PWM generation presented above: the number of timer ticks is compared to the TOP value, and when the threshold is reached, the counter is cleared and an interrupt is executed. The prescaler still defines the base rate, but the count can be cut off at any point by the TOP value.

In the example provided below, Timer1 (16-bit), operating in CTC mode, will be used to run an interrupt that toggles the built-in LED every 1 second.
Timer1 runs here with a prescaler of 1024, giving 15625 ticks per second. Note that we count from 0 to 15624, and 15624 is 0x3D08 in hex (TOP_VAL in the code below).

To use interrupts, one should refer to the ATmega328P datasheet: it lists all interrupts and the addresses of their handlers (interrupt vectors) in memory. We use the interrupt triggered when the Timer1 counter reaches the TOP value: interrupt 12, address 0x0016 (in words), “COMPA Timer/Counter1 compare match A”.

Addresses of the interrupts are given in words (16 bits), and thus the location of the interrupt vector that the ATMega328P MCU will execute when the interrupt is triggered is 2*0x0016=0x002C (and thus, the instruction below is .org 0x002C).
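
The word-to-byte conversion can be generalised to any vector number. This is a minimal Python sketch, assuming the ATmega328P layout in which every vector slot occupies two 16-bit words and vector 1 is RESET at address 0x0000; the function names are illustrative.

```python
VECTOR_SIZE_WORDS = 2   # each ATmega328P vector slot holds two 16-bit words


def vector_word_address(vector_number):
    """Word address of a vector; vector 1 is RESET at 0x0000."""
    return (vector_number - 1) * VECTOR_SIZE_WORDS


def vector_byte_address(vector_number):
    """Byte address, as used with .org in the listings of this chapter."""
    return 2 * vector_word_address(vector_number)


print(hex(vector_word_address(12)), hex(vector_byte_address(12)))
```

Calling vector_byte_address(27) gives the first byte address behind the 26-entry table, which is where ordinary code may safely begin.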

We also need to prepare and initialise a stack, growing from the end of the memory, towards lower addresses (refer to figure 9):

; --- Register Address Mapping ---
.equ SREG,    0x3F      ; Status Register
.equ SPH,     0x3E      ; Stack Pointer High
.equ SPL,     0x3D      ; Stack Pointer Low
 
.equ DDRB,    0x04      ; Data Direction Port B
.equ PINB,    0x03      ; Input Pins Port B (Toggle Shortcut)
.equ PB5,     5         ; Built-in LED (Digital 13 / PB5)
 
.equ RAMEND,  0x08FF    ; End of SRAM
; --- Timer1 Register Mapping ---
.equ TCCR1A,  0x80      ; Timer1 Control A
.equ TCCR1B,  0x81      ; Timer1 Control B
.equ OCR1AH,  0x89      ; Output Compare High
.equ OCR1AL,  0x88      ; Output Compare Low
.equ TIMSK1,  0x6F      ; Timer1 Interrupt Mask
.equ TOP_VAL, 0x3D08    ; TOP register value 15624dec

This section is located in flash: the array of interrupt vectors starts at address 0x0000, where program execution begins on reset, and extends to word address 0x0033 (26 vectors, 2 words each), that is, byte address 0x0067. Here, the vector at word address 0x0016 is used (in bytes, it starts at address 0x002C).

.section .text
.org 0x0000
    rjmp RESET          
.org 0x002C             ; Assembler treats .org as bytes, use 0x002C
    rjmp TIMER_ISR
.org 0x0068             ; First byte after the vector table (26 vectors x 4 bytes)

Interrupt handling stores the return address on the stack, so the stack must be initialised. It starts at the end of the SRAM (0x08FF). It is also necessary to configure the GPIO 13 (LED, PB5) pin as an output before controlling it from the ISR.

RESET:
    ; Prepare stack
    ldi r16, hi8(RAMEND)
    out SPH, r16
    ldi r16, lo8(RAMEND)
    out SPL, r16
 
    ; Configure PB5 (built-in LED, GPIO13) as Output
    sbi DDRB, PB5       ; PB5 is the symbol defined above (LED_PIN was undefined)

Timer1 configuration is presented below: CTC operation mode, with the TOP register (OCR1A) set to 0x3D08. Control register TCCR1A has all bits cleared (default), and TCCR1B selects CTC mode (WGM12) and a prescaler of 1024 (CS bits set to 101 binary). The compare match A interrupt is enabled in TIMSK1, then interrupts are enabled globally with sei, and the code falls into an idle loop. All actions are executed asynchronously.

    ; Set Timer1 in CTC Mode
    ; Load the compare value for 1 second
    ldi r16, hi8(TOP_VAL)
    sts OCR1AH, r16
    ldi r16, lo8(TOP_VAL)
    sts OCR1AL, r16
 
    ; TCCR1A: Default (0x00)
    ldi r16, 0x00
    sts TCCR1A, r16
 
    ; TCCR1B: 
    ; Bit 3 (WGM12) = 1 (CTC Mode)
    ; Bit 2 (CS12)  = 1 (Prescaler 1024)
    ; Bit 0 (CS10)  = 1 (Prescaler 1024)
    ldi r16, (1 << 3) | (1 << 2) | (1 << 0)
    sts TCCR1B, r16
 
    ; Enable Compare Match A Interrupt
    ldi r16, (1 << 1)   ; OCIE1A bit
    sts TIMSK1, r16
 
    ; Enable Interrupts
    sei
 
LOOP:
    rjmp LOOP          ; It does nothing!

The TIMER_ISR routine is executed by Timer1. The pointer to this code is located in the interrupt vector table, at address 0x002C.

Note: to flip the bit, we use an AVR trick: an sbi instruction on the PINB register toggles the GPIO output, without the need for the classical READ→COMPLEMENT→WRITE sequence.
; Interrupt subroutine, called by Timer1
TIMER_ISR:
    sbi PINB, PB5
    reti

Reading analogue values

Reading an analogue input is less straightforward than reading a digital one. The built-in ADC has 10-bit resolution and 6 input channels (A0-A5). Its reference voltage is configurable: the 5V supply (AVcc), the internal 1.1V source, or an external reference connected to the Aref pin.
Inputs are connected to the ADC through a multiplexer, so only one input can be converted at a time (there is a single converter behind the six inputs). After switching inputs, the first reading may be invalid due to the measurement method.
The ADC produces its value according to the formula in figure 16, based on the input voltage Vgpio and the reference voltage Vref.

Figure 16: ADC value calculation based on the input voltage and reference voltage

Technically, the ADC input contains a sample-and-hold capacitor (about 14 pF) that is charged from the input and then discharged. For this reason, measuring a high-impedance source can yield inaccurate readings, so the first ADC reading is commonly discarded, and the best practice is to take multiple measurements and calculate an average.
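
The "discard the first reading, then average" practice is independent of the language used. The snippet below is a minimal Python sketch of the logic only; read_adc is a hypothetical callable standing in for one complete conversion, and the sample data are made up for the demonstration.

```python
def averaged_reading(read_adc, samples=8):
    """Discard one reading, then return the integer mean of `samples` readings.

    `read_adc` is a hypothetical callable returning one raw 10-bit value.
    """
    read_adc()                                  # first reading is thrown away
    total = sum(read_adc() for _ in range(samples))
    return total // samples


# Stand-in for real hardware: a fixed sequence of raw readings
fake = iter([1023, 510, 512, 514, 512])
print(averaged_reading(lambda: next(fake), samples=4))
```

Choosing a power of two for the number of samples is convenient on an 8-bit MCU, because the division can then be done with shifts.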

From the assembler developer's point of view, it is more important that the ADC reading is a value between 0 and 1023 (10-bit resolution). To convert the ADC reading to the voltage measured on the ADC input (Vinput), the following formula (figure 17) is valid:

Figure 17: Conversion formula from ADC reading to value represented in Volts
Note that the formula uses Vref, the reference voltage. One needs to know the current ADC configuration: whether Vref is the 5V supply, the internal 1.1V source, or an external voltage provided via the Aref pin, and pass the appropriate value to the equation.
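
Both conversions can be tried out on the PC. This is a minimal Python sketch, assuming the datasheet form of the relationship, ADC = Vin × 1024 / Vref (some references divide by 1023 instead, so the figures in this chapter may differ slightly); the function names are illustrative.

```python
def adc_from_voltage(v_in, v_ref=5.0):
    """Expected raw reading for an input voltage (clamped to the 10-bit range)."""
    return max(0, min(1023, int(v_in * 1024 / v_ref)))


def millivolts_from_adc(raw, v_ref_mv=5000):
    """Integer-only conversion, convenient to port to 8-bit code."""
    return raw * v_ref_mv // 1024


print(adc_from_voltage(2.5))              # mid-scale input with a 5 V reference
print(millivolts_from_adc(512))           # back to millivolts
print(millivolts_from_adc(1023, 1100))    # full scale with the internal 1.1 V reference
```

Working in millivolts keeps the whole calculation in integers, which avoids floating-point arithmetic on the MCU.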

Analogue reading uses a complex setup of ADC-related registers, as presented in table 14. The ADC registers are mapped to a memory area and are accessible using the lds and sts instructions.

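Table 14: ADC-related registers used for reading the analogue values of GPIOs
| Register (Address) | Bit | Name | Description |
|---|---|---|---|
| ADMUX (0x7C) | 7 | REFS1 | Reference Selection Bit 1 |
| ADMUX (0x7C) | 6 | REFS0 | Reference Selection Bit 0 (01 = AVcc) |
| ADMUX (0x7C) | 5 | ADLAR | Left Adjust Result (1 = Left, 0 = Right) |
| ADMUX (0x7C) | 4 | - | Reserved |
| ADMUX (0x7C) | 3 | MUX3 | Analog Channel Selection Bit 3 |
| ADMUX (0x7C) | 2 | MUX2 | Analog Channel Selection Bit 2 |
| ADMUX (0x7C) | 1 | MUX1 | Analog Channel Selection Bit 1 |
| ADMUX (0x7C) | 0 | MUX0 | Analog Channel Selection Bit 0 (0000 = A0) |
| ADCSRA (0x7A) | 7 | ADEN | ADC Enable (Must be 1) |
| ADCSRA (0x7A) | 6 | ADSC | ADC Start Conversion (Write 1 to start) |
| ADCSRA (0x7A) | 5 | ADATE | ADC Auto Trigger Enable |
| ADCSRA (0x7A) | 4 | ADIF | ADC Interrupt Flag |
| ADCSRA (0x7A) | 3 | ADIE | ADC Interrupt Enable |
| ADCSRA (0x7A) | 2 | ADPS2 | ADC Prescaler Select Bit 2 |
| ADCSRA (0x7A) | 1 | ADPS1 | ADC Prescaler Select Bit 1 |
| ADCSRA (0x7A) | 0 | ADPS0 | ADC Prescaler Select Bit 0 (111 = by 128) |
| ADCSRB (0x7B) | 7 | - | Reserved |
| ADCSRB (0x7B) | 6 | ACME | Analog Comparator Multiplexer Enable |
| ADCSRB (0x7B) | 5 | - | Reserved |
| ADCSRB (0x7B) | 4 | - | Reserved |
| ADCSRB (0x7B) | 3 | - | Reserved |
| ADCSRB (0x7B) | 2 | ADTS2 | ADC Auto Trigger Source Bit 2 |
| ADCSRB (0x7B) | 1 | ADTS1 | ADC Auto Trigger Source Bit 1 |
| ADCSRB (0x7B) | 0 | ADTS0 | ADC Auto Trigger Source Bit 0 |
| ADCH (0x79) | 15..8 | ADC[9:0] | 10-bit Result (ADCL first, then ADCH) |
| ADCL (0x78) | 7..0 | ADC[9:0] | 10-bit Result (ADCL first, then ADCH) |
| DIDR0 (0x7E) | 5:0 | ADC5D:ADC0D | Digital Input Disable (1 = Disable Buffer) |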

An algorithm for reading an analogue value from a selected input is implemented as follows:

  • configure the reference voltage and select the channel, using the ADMUX register,
  • set the prescaler for the sampling frequency and enable the ADC,
  • disable the digital input buffer of the GPIO: this disables only the “digital” part of the input; the analogue part still works.

The last step is optional but highly recommended: if the input value is around 2.5V (midway between LOW and HIGH) and the digital input buffer is still active (in parallel with the analogue path), the Arduino may draw considerably more current, and the analogue signal noise may increase due to frequent switching between LOW and HIGH.

The sample code below configures the ADC, reads from the A0 input, and stores the value as a 16-bit value in the adc_storage variable.

; --- Register Definitions (ATmega328P) ---
.equ ADCL,   0x78
.equ ADCH,   0x79
.equ ADCSRA, 0x7A
.equ ADCSRB, 0x7B
.equ ADMUX,  0x7C
.equ DIDR0,  0x7E
 
; --- Bit Definitions ---
.equ REFS0,  6      ; Reference selection bit 0
.equ ADEN,   7      ; ADC Enable
.equ ADSC,   6      ; ADC Start Conversion
.equ ADPS2,  2      ; Prescaler bit 2
.equ ADPS1,  1      ; Prescaler bit 1
.equ ADPS0,  0      ; Prescaler bit 0
 
; --- Data Segment ---
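.section .data
adc_storage: .space 2   ; Reserve 2 bytes in SRAM for the 10-bit result
                        ; (in GNU as, ".byte 2" would store one byte of value 2;
                        ; the linker places this section in SRAM, no .org needed)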

Now the setup part: connect A0 to the ADC via a multiplexer, select the reference voltage as the power supply (5V), and set the conversion sampling speed using a prescaler (128, which gives 125kHz). Also, disable A0 as a digital GPIO to save energy and lower noise.

; --- Code Segment ---
.section .text
.global main
 
main:
    ; setup multiplexer to use A0 (0000) 
    ; and AVcc (powering voltage +5V) as reference
    ; It is done with the ADMUX register
    ldi r24, (1 << REFS0)
    sts ADMUX, r24
 
    ; setup prescaler and enable ADC.
    ; prescaler is 128 (16MHz/128 = 125kHz)
    ldi r24, (1 << ADEN) | (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0)
    sts ADCSRA, r24
 
    ; and disable A0 GPIO as a digital input (analogue still works)
    ; good practice
    ldi r24, 0x01
    sts DIDR0, r24

Start the conversion by setting the ADSC bit (6) of ADCSRA to 1. The ADC requires some time to sample the input and complete the conversion (it is based on a capacitor); when the result is ready to read, the ADSC bit is cleared by the ADC hardware. Here, we do not use any interrupts, just simple polling.

loop:
    ; start ADC conversion
    lds r24, ADCSRA
    ori r24, (1 << ADSC)
    sts ADCSRA, r24
 
wait_adc:
    ; poll ADSC bit
    ; when conversion is ready, ADC clears this bit
    lds r24, ADCSRA
    sbrc r24, ADSC
    rjmp wait_adc

The converted value is stored in the ADCL and ADCH registers. And it is crucial to keep the reading order: low byte first, then high.

    ; read conversion result
    ; IMPORTANT: Read Low Byte first to lock the values
    lds r18, ADCL       ; r18 = Low Byte
    lds r19, ADCH       ; r19 = High Byte
 
    ; save to memory
    ldi r26, lo8(adc_storage)
    ldi r27, hi8(adc_storage)
    st X+, r18          ; Store low byte
    st X, r19           ; Store high byte
 
    rjmp loop           ; Repeat indefinitely
You always need to read the low byte of the result (ADCL), then the high (ADCH), not the opposite!

Speed vs Quality
The ADC converts an analogue value to its digital representation using a capacitor. Charging and discharging the capacitor requires time and depends on the impedance of the analogue signal's source. As a general rule, the faster the conversion, the lower the quality and the higher the error ratio. Conversion speed can be controlled using a prescaler value (bits ADPS2, ADPS1, and ADPS0 of the ADCSRA register). The prescaler divides the clock frequency (16MHz) to slow down the conversion process. The prescaler values and the related ADC clock frequencies and periods are presented in table 15.

Table 15: Prescaler values and related conversion times
| ADPS2 | ADPS1 | ADPS0 | Division Factor | ADC Clock (at 16 MHz) | Clock Period (1/f) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 2 | 8 MHz | 0.125 µs |
| 0 | 0 | 1 | 2 | 8 MHz | 0.125 µs |
| 0 | 1 | 0 | 4 | 4 MHz | 0.25 µs |
| 0 | 1 | 1 | 8 | 2 MHz | 0.5 µs |
| 1 | 0 | 0 | 16 | 1 MHz | 1.0 µs |
| 1 | 0 | 1 | 32 | 500 kHz | 2.0 µs |
| 1 | 1 | 0 | 64 | 250 kHz | 4.0 µs |
| 1 | 1 | 1 | 128 | 125 kHz | 8.0 µs |

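The datasheet recommends an ADC clock between 50kHz and 200kHz for full 10-bit resolution; at 16MHz, only the prescaler of 128 (125kHz) falls within this range. Faster clocks reduce conversion accuracy.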
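
The ADC clock period in table 15 is not yet the time of a full conversion. The snippet below is a minimal Python sketch that assumes 13 ADC clock cycles per normal conversion, as stated in the ATmega328P datasheet (the first conversion after enabling the ADC takes longer); the function names are illustrative.

```python
F_CPU = 16_000_000
CYCLES_PER_CONVERSION = 13     # normal single conversion, per the datasheet


def adc_clock_hz(prescaler, f_cpu=F_CPU):
    """ADC clock frequency for a given division factor."""
    return f_cpu / prescaler


def conversion_time_us(prescaler, f_cpu=F_CPU):
    """Duration of one normal conversion in microseconds."""
    return CYCLES_PER_CONVERSION * prescaler / f_cpu * 1_000_000


def samples_per_second(prescaler, f_cpu=F_CPU):
    """Upper bound on back-to-back conversions per second."""
    return adc_clock_hz(prescaler, f_cpu) / CYCLES_PER_CONVERSION


for n in (2, 4, 8, 16, 32, 64, 128):
    print(n, adc_clock_hz(n), round(conversion_time_us(n), 1), round(samples_per_second(n)))
```

With the prescaler of 128 used in the example above, one conversion takes about 104 µs, which bounds the polling loop to fewer than ten thousand readings per second.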

Function Call Standards

While programming in pure assembler, you can freely define your own calling conventions. But if your code is supposed to be modular and eventually used by others (e.g. called from C++), it is good to follow standards. For AVR GCC, there is an Application Binary Interface (ABI) standard. All general-purpose registers are 8-bit; to pass and return data larger than a byte (16, 32, or 64 bits), groups of registers must be used. The guide below describes how to pass arguments.

General-purpose registers (R0 to R31) are divided into sections to prevent data loss and misunderstandings.

The following rules should be applied to keep consistency with the AVR GCC ABI (also refer to figure 18):

  1. From the caller's point of view, the following registers can be modified in the callee's code, so the caller cannot assume they are persistent across the call:
    1. R0,
    2. R1-this one is expected to be always 0 on return, see below,
    3. R18-R27,
    4. R30 and R31.
  2. Registers that are not expected to be modified in the callee's code are:
    1. R2-R17,
    2. R28 and R29 - Frame Pointer register (Y).
  3. Arguments passed from the caller to the callee are stored in registers starting from R25 downwards, down to R8:
    1. 8-bit values always start in an even register, e.g., the first byte-size argument goes to R24 (R25 stays unused).
    2. 16-bit values are passed as pairs: the high byte goes to the higher-numbered register, which must be odd, while the low byte goes to the lower-numbered register, which must be even, e.g., R25:R24.
    3. 32-bit values are passed in 4 consecutive registers, with the MSB in an odd register and the LSB in an even one, e.g. a long can be passed as R19:R18:R17:R16, where the MSB goes to R19 and the LSB to R16.
    4. If you run out of registers, the remaining arguments are passed on the stack.
  4. Returning the values from the function (callee) to the caller should follow similar rules to those of passing arguments to the callee, regarding the use of multiple registers to represent data bigger than 8-bit.
    1. The key difference is that the returning value must always use at least R24:
      1. byte is returned in R24,
      2. 16-bit value is returned in a pair of R25(MSB):R24(LSB),
      3. 32-bit value is returned in R25(MSB):R24:R23:R22(LSB).
    2. Additionally, R1 MUST be zeroed prior to return.
  5. Pointers are treated as any other 16-bit value.
Figure 18: AVR GCC ABI calling standard
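The argument-passing rules above can be tried out on a PC with a small C model. This is only a sketch: the helper name assign_arg_registers is hypothetical (it is not part of any toolchain), and it assumes fixed-size arguments only (no variadic functions). It starts just above R25, rounds each argument size up to an even number of bytes, and moves downwards. Once an argument would reach below R8, it and all following arguments go to the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for each argument size (in bytes) compute the
 * lowest register number holding it, or -1 if it is passed on the stack. */
void assign_arg_registers(const int *sizes, size_t count, int *low_reg)
{
    int next = 26;       /* allocation starts just above R25 */
    int on_stack = 0;    /* once set, all later arguments use the stack */
    size_t i;

    for (i = 0; i < count; i++) {
        int rounded;

        assert(sizes[i] > 0);
        rounded = (sizes[i] + 1) & ~1;       /* round size up to even */
        if (!on_stack && next - rounded >= 8) {
            next -= rounded;
            low_reg[i] = next;   /* low byte here, higher bytes above it */
        } else {
            on_stack = 1;
            low_reg[i] = -1;
        }
    }
}
```

For two 16-bit arguments (sizes 2 and 2) the model returns 24 and 22, i.e. R25:R24 and R23:R22, which is the layout used by the example functions in this chapter.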

Example functions: convert unsigned int to ASCII
The function below fulfils the AVR GCC ABI standards and converts an unsigned int value (16-bit) to an ASCII array of characters. Note that the array is provided externally to the function, so it must be allocated at the caller level.

/*
 * void uint16_to_ascii(uint16_t value, char* str)
 * r25:r24 = value (unsigned)
 * r23:r22 = str (pointer), at least 6/8 bytes!
 */
 
.global uint16_to_ascii
 
uint16_to_ascii:
    movw r30, r22           ; Move destination pointer to Z register (r31:r30)
                            ; It also copies r23 to r31
 
    ldi  r19, 0             ; Digit counter
    push r19                ; Push null-terminator (0) onto stack first
/*   Uncomment this section to have CR LF added.
 *   The minimum buffer size then needs to be 8, not 6 bytes. 
 *   ldi  r19, 10
 *   push r19                ; Push 10 (LF)
 *   ldi  r19, 13
 *   push r19                ; Push 13 (CR)
 */
 
.L_divide_loop:
    ; Divide r25:r24 by 10 using shift-and-subtract
    ldi  r18, 0             ; r18 will hold the remainder (the digit)
    ldi  r20, 16            ; Loop for 16 bits
 
.L_div_bit_loop:
    lsl  r24                ; Shift value left (MSB goes into Carry)
    rol  r25
    rol  r18                ; Rotate Carry into remainder
 
    cpi  r18, 10            ; Compare remainder with 10
    brlo .L_skip_sub
    subi r18, 10            ; Remainder = Remainder - 10
    inc  r24                ; Set the lowest bit of the quotient
.L_skip_sub:
    dec  r20
    brne .L_div_bit_loop
 
    ; r18 now has the digit (0-9)
    subi r18, -'0'          ; Convert to ASCII (add 0x30)
    push r18                ; Save digit on stack to reverse order
    ; Check if the quotient (r25:r24) is zero
    sbiw r24, 0
    brne .L_divide_loop     ; If quotient != 0, get the next digit
 
.L_pop_and_store:
    pop  r18                ; Pull digit (or null terminator) from stack
    st   Z+, r18            ; Store into the array and increment pointer
    tst  r18                ; Check if we just stored the 0 terminator
    brne .L_pop_and_store   ; If not zero, keep popping
 
    ret                     ; Return to caller

Sample code using the function above (note, the function is NOT included here; you have to add it yourself) is presented below:

.section .data
; 1. Declaration of a 6-byte buffer for a string in RAM
;    (.space reserves bytes; in GNU as, ".byte 6" would emit a single byte with value 6)
my_string_buffer: .space 6
 
.section .text
.global main
 
main:
    ; 2. Load constant value (e.g., 12345) into r25:r24
    ldi r24, lo8(12345)
    ldi r25, hi8(12345)
 
    ; 3. Load buffer pointer into r23:r22
    ldi r22, lo8(my_string_buffer)
    ldi r23, hi8(my_string_buffer)
 
    ; Call the function
    call uint16_to_ascii
 
loop:
    rjmp loop
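The digit extraction performed by uint16_to_ascii can be checked on a PC with a C model. The sketch below is not the AVR code itself: the names div10_step and uint16_to_ascii_model are hypothetical, and a small local array stands in for the hardware stack. It mirrors the same two ideas: a 16-step shift-and-subtract division by 10, and pushing digits so that they come out in reverse order.

```c
#include <assert.h>
#include <stdint.h>

/* Divide *value by 10 in place and return the remainder, one bit per
 * iteration, mirroring the lsl/rol/rol, cpi, subi, inc sequence. */
uint8_t div10_step(uint16_t *value)
{
    uint16_t v = *value;
    uint8_t rem = 0;
    int i;

    for (i = 0; i < 16; i++) {
        rem = (uint8_t)((rem << 1) | (v >> 15)); /* MSB of v enters rem */
        v = (uint16_t)(v << 1);
        if (rem >= 10) {
            rem = (uint8_t)(rem - 10);
            v |= 1;                  /* set the new quotient bit */
        }
    }
    *value = v;                      /* v now holds the quotient */
    return rem;
}

/* Host-side model of uint16_to_ascii: str must hold at least 6 bytes. */
void uint16_to_ascii_model(uint16_t value, char *str)
{
    char stack[6];                   /* 5 digits + terminator */
    int top = 0;

    assert(str != 0);
    stack[top++] = '\0';             /* terminator is pushed first */
    do {
        stack[top++] = (char)('0' + div10_step(&value));
    } while (value != 0);

    do {                             /* pop until the terminator is stored */
        *str = stack[--top];
    } while (*str++ != '\0');
}
```

Because the terminator is pushed first, it is popped last, so the same loop that stores the digits also ends the string.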

Example function: send string buffer (SRAM) to serial port
The function below sends a null-terminated string to the serial port at 9600 bps, 8N1. It does not send “end of line” (codes 13,10).
It works ONLY with ASCII strings stored in SRAM (defined in .data section). To make it work with strings stored in flash, see the warning note below the code.

/*
 * void uart_send_string(char* str)
 * r25:r24 = str (pointer), to SRAM!
 */
 
; Register Addresses (ATmega328P)
.equ UCSR0A, 0xC0
.equ UCSR0B, 0xC1
.equ UCSR0C, 0xC2
.equ UBRR0L, 0xC4
.equ UBRR0H, 0xC5
.equ UDR0,   0xC6
 
; Bit Positions
.equ TXEN0,  3
.equ UDRE0,  5
.equ UCSZ01, 2
.equ UCSZ00, 1
 
uart_send_string:
    ; Argument 1 (Pointer to buffer) is passed in r25:r24
    ; Explicitly loading parts into the Z register (r31:r30)
    mov  r30, r24           
    mov  r31, r25           
 
    ; --- UART Initialization --- 
    ; 1. Set Baud Rate: 9600 bps @ 16MHz (UBRR = 103)
    ldi  r18, lo8(103)      
    sts  UBRR0L, r18
    ldi  r18, hi8(103)      
    sts  UBRR0H, r18
 
    ; 2. Configure UART for 8N1 (8 data bits, no parity, 1 stop bit)
    ldi  r18, (1<<UCSZ01) | (1<<UCSZ00)
    sts  UCSR0C, r18
 
    ; 3. Enable UART Transmitter
    ldi  r18, (1<<TXEN0)
    sts  UCSR0B, r18
 
.L_send_loop:
    ld   r18, Z+            ; Load byte from string and increment pointer
    tst  r18                ; Check if we hit the 0 (null terminator)
    breq .L_done            ; End of string reached
 
.L_wait_tx:
    ; Check the UART Status Register A
    lds  r19, UCSR0A        
    sbrs r19, UDRE0         ; Skip if UDRE0 is 1 (Buffer is ready)
    rjmp .L_wait_tx         ; Wait until buffer is empty
 
    ; Move character to Data Register to start transmission
    sts  UDR0, r18          
    rjmp .L_send_loop       
 
.L_done:
    ret
The function above initialises UART, and sends a string stored in SRAM!
To use a string (e.g. a constant) that is stored in Flash, replace the line ld r18, Z+ with lpm r18, Z+.
If you call this function in a loop, add a delay after each transmission so that the hardware can finish sending the buffered characters.
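The value 103 loaded into UBRR0 in the function above follows from the baud rate formula for asynchronous normal mode (U2X0 = 0): UBRR = f_CPU / (16 * baud) - 1. A minimal C sketch of this calculation is shown below. The function name ubrr_for is hypothetical, and rounding to the nearest integer is our choice here, not something the code above prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* UBRR for asynchronous normal mode (U2X0 = 0), rounded to nearest. */
uint16_t ubrr_for(uint32_t f_cpu_hz, uint32_t baud)
{
    uint32_t divisor = 16UL * baud;

    assert(baud > 0 && f_cpu_hz >= divisor);
    return (uint16_t)((f_cpu_hz + divisor / 2) / divisor - 1);
}
```

For a 16MHz clock and 9600 bps this gives 103, the constant used in uart_send_string.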

Delay (in ms)
This function implements the classical, blocking “delay” function, using timers. It uses Timer0 with a prescaler of 64. Then Timer0 increments at 16MHz/64=250kHz → each Timer0 tick lasts 4us. So 1ms then requires 1000/4=250 ticks. Then it simply repeats, waiting for 1ms, the number of times specified by the 16-bit argument (R25:R24).

/*
 * void delay_ms(uint16_t ms)
 * r25:r24 = delay in ms (16-bit)
 */
 
; Register Addresses (ATmega328P)
.equ TCCR0A, 0x44
.equ TCCR0B, 0x45
.equ TCNT0,  0x46
.equ TIFR0,  0x35
 
; Bit Positions
.equ OCF0A,  1   ; Output Compare Flag 0 A
 
delay_ms:
    ; If the input is 0, return immediately
    sbiw r24, 0
    breq .L_passed
 
    ; --- Timer0 Setup ---
    ; Mode: Normal. Prescaler: 64
    ldi r18, (1<<1) | (1<<0) 
    sts TCCR0B, r18
 
.L_outer_loop:
    ldi r18, 0
    sts TCNT0, r18          ; Reset timer count to 0
 
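    ; Clear the Compare Match Flag by writing a 1 to it (AVR quirk)
    ; TIFR0 is defined above by its data-space address (0x35), so sts is used;
    ; the out instruction would need the I/O address 0x15 instead
    ldi r18, (1<<OCF0A)
    sts TIFR0, r18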
 
.L_wait_1ms:
    lds r18, TCNT0
    cpi r18, 250            ; 250 ticks * 4us = 1000us = 1ms
    brlo .L_wait_1ms        ; Keep polling until 250 reached
 
    sbiw r24, 1             ; Decrement ms counter (r25:r24)
    brne .L_outer_loop      ; If not zero, run another 1ms
 
    ; --- Cleanup ---
    ldi r18, 0
    sts TCCR0B, r18         ; Stop the timer to save power
 
.L_passed:
    ret 
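The timing arithmetic used by delay_ms can be written down as a short C sketch, which helps when adapting the function to another clock or prescaler. The names ticks_per_ms and fits_timer0 are hypothetical. The second function only checks whether the tick count can be reached by polling the 8-bit TCNT0 register the way the code above does.

```c
#include <assert.h>
#include <stdint.h>

/* Number of timer ticks in one millisecond (integer division). */
uint32_t ticks_per_ms(uint32_t f_cpu_hz, uint32_t prescaler)
{
    assert(prescaler > 0);
    return f_cpu_hz / prescaler / 1000UL;
}

/* Returns 1 when the tick count can be compared against 8-bit TCNT0. */
int fits_timer0(uint32_t ticks)
{
    return ticks >= 1 && ticks <= 255;
}
```

With 16MHz and a prescaler of 64, the result is 250 ticks, which fits into 8 bits. With a prescaler of 8, the result is 2000 ticks, which does not fit, so the polling loop would need a different approach.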

SUT AVR Assembler Laboratory Node Hardware Reference

Introduction

Each laboratory node is equipped with an Arduino Uno R3 development board, based on the ATmega328P MCU. It also has two extension boards:

  • external, analogue and digital communication board,
  • user interface board shown in figure 19.

There are 10 laboratory nodes. They can be used independently, but for collaboration, nodes are interconnected symmetrically, with GPIOs described in the hardware reference section below.

Hardware reference

Table 16 lists all hardware components and their details. Note that some elements are accessible, but their use is not supported via the remote lab, e.g., buttons and a buzzer.
The node is depicted in figure 19, and the visual schematic of its interface is presented in figure 20. The schematic presents only components used in scenarios and accessible via the VREL NextGen environment (controllable and observable via video stream), omitting unused components such as buttons, a buzzer, and a potentiometer.

Figure 19: AVR (Arduino Uno) SUT Node
Figure 20: SUT node's visual interface components schematic
Table 16: AVR (Arduino Uno) SUT Node Hardware Details
Component ID | Component | Hardware details (controller) | Control method | GPIOs (as mapped to the Arduino Uno) | Remarks
D1 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO13 |
D2 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO12 |
D3 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO11 |
D4 | LED (red) | direct via GPIO | binary (0→on, 1→off) | GPIO10 | shared with interconnection with another module
LED4 | 4x 7-segment display (+DP) | indirect, via two 74HC595 registers | serial load to 2 registers, daisy-chained | GPIO8 - serial input of the controller (SER_PIN) |
 | | | | GPIO7 - shift data internally (CLK_PIN), rising edge (write next bit and shift data in serial) |
 | | | | GPIO4 - store data to the internal buffer of the 74HC595, which stores only one digit (LAT_PIN) |

Handling of the buffered 4-digit, 7-segment display

To display a digit on the 4-digit, 7-segment display, two definitions are needed: the shape of the digit (or other symbol), and its position (1,2,3,4: a binary mask).

The 7-segment display is of the common-anode type (you use zero to turn a segment on), and thus the definitions of digits 0..9 are declared as follows:

; Common Anode 7-segment masks (Active LOW)
; Segments:     DP,g,f,e,d,c,b,a (Bit 7 -> Bit 0)
; Indices:       0     1     2     3     4     5     6     7     8     9
segment_masks:
    .byte      0xC0, 0xF9, 0xA4, 0xB0, 0x99, 0x92, 0x82, 0xF8, 0x80, 0x90

In a common-anode configuration, the active signal to turn on a segment is LOW (0), and to turn it off, it is HIGH (1). The state of a single digit is represented by an 8-bit mask: 7 segments to build the symbol and a DP (decimal point). For example, a digit 7 is represented by bits corresponding to segments “a”, “b”, and “c” set to 0 (to turn segments “a”, “b”, and “c” on) and the remaining bits set to 1 (to turn them off), so the corresponding binary value looks as follows: 11111000b, hence the hexadecimal value is 0xF8 (as in the code above). The MSB bit represents DP, and the LSB segment “a”. This definition affects how one loads data into the shift register: starting from MSB towards LSB, because of the way the register is built and connected to the segments - refer to the function display_digit below.

Naturally, it is possible to expand those definitions to display other symbols, e.g., hexadecimal digits such as A,b,C,d,E,F.
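When adding new symbols, it is easy to make a mistake with the inverted (active-low) logic. A minimal C sketch, assuming the bit order DP,g,f,e,d,c,b,a described above, derives the mask from the list of segments that should be lit. The enum names and the function common_anode_mask are hypothetical helpers for calculating table entries on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions: segment "a" is the LSB, DP is the MSB. */
enum {
    SEG_A = 1 << 0, SEG_B = 1 << 1, SEG_C = 1 << 2, SEG_D = 1 << 3,
    SEG_E = 1 << 4, SEG_F = 1 << 5, SEG_G = 1 << 6, SEG_DP = 1 << 7
};

/* Common anode: a lit segment is 0, so invert the "lit" bitmap. */
uint8_t common_anode_mask(unsigned lit_segments)
{
    assert(lit_segments <= 0xFF);
    return (uint8_t)(~lit_segments & 0xFF);
}
```

For example, common_anode_mask(SEG_A | SEG_B | SEG_C) gives 0xF8, the entry for the digit 7, and lighting segments a, b, c, e, f and g gives 0x88, a possible definition of the hexadecimal digit A.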

The way the display works is similar to a typical matrix dot display: instead of having to control 32 independent digital lines to control each LED composing the display independently (8 per digit, 4 digits), we use a digit selector (lines 0,1,2,3) and common symbol lines (lines DP,g,f,e,d,c,b,a).
This way, the display “flashes”: to show more than one digit, you need to iterate over the digit lines rapidly and set the appropriate symbol definition for each of them. However, the human eye is too slow to notice it, so we see all 4 digits lit at the same time, although in reality they are displayed one by one.
The schematic in Figure 20 shows the idea of how to control a single digit over the serial data pin (SER_PIN): you need to inject, bit by bit and starting from the most significant bit, the 8 bits of the symbol representing a digit, and then the 8 bits of the digit selector - the digit is selected by lines 1,2,3,4, so only the 0001b, 0010b, 0100b and 1000b combinations are used. A 0→1→0 pulse on the clock pin (CLK_PIN) writes the current bit to the left register and shifts the contents right (including passing bits from the left register to the right one). This way, after 16 cycles (8+8), the left register holds the line that selects the digit (1,2,3,4), and the right register holds the combination representing the symbol at this position.
When the binary combinations in both registers (line and symbol) are ready, a 0→1→0 pulse on LAT_PIN copies the contents of the shift registers to their output buffers, and the display instantly lights up according to the symbol definition loaded into the right register (only the current digit; the others are off at this time).
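The behaviour of the two daisy-chained registers can be modelled in a few lines of C, which may help to see why the symbol is sent first and the digit selector second. This is a simplified sketch, not a description of the real wiring: the type display_chain and all function names are hypothetical. It assumes each register shifts towards higher bit numbers, and "first" stands for the register connected directly to the serial input (the left one).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t first;         /* register fed directly by the serial input */
    uint8_t second;        /* register fed by the output of the first one */
    uint8_t out_digit;     /* latched outputs: digit selector */
    uint8_t out_segments;  /* latched outputs: segment pattern */
} display_chain;

/* One clock pulse: the serial level enters bit 0 of the first register,
 * and the bit leaving its top position enters the second register. */
void clock_pulse(display_chain *c, int ser)
{
    uint8_t carry = (uint8_t)(c->first >> 7);

    c->first = (uint8_t)((c->first << 1) | (ser ? 1 : 0));
    c->second = (uint8_t)((c->second << 1) | carry);
}

/* Send one byte, most significant bit first (as display_digit does). */
void shift_out_msb_first(display_chain *c, uint8_t value)
{
    int i;

    assert(c != 0);
    for (i = 0; i < 8; i++) {
        clock_pulse(c, (value & 0x80) != 0);
        value = (uint8_t)(value << 1);
    }
}

/* Latch pulse: copy both shift registers to the visible outputs. */
void latch_pulse(display_chain *c)
{
    c->out_digit = c->first;
    c->out_segments = c->second;
}
```

Sending 0xF8 (symbol "7") and then 0x04 (third digit), followed by a latch pulse, leaves 0x04 in the first register and 0xF8 in the second. Until the latch pulse, the outputs do not change, so the display does not show intermediate states.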

Display single digit: function definition
To handle display, a sample function that displays a digit in a selected position is presented below. Note that it does not check parameters and thus assumes that the digit position is a number between 0 and 3, and that a digit to display is 0..9. Going beyond these limits causes unpredictable behaviour and usually an MCU program crash.

; Pin definitions using direct I/O addresses for ATmega328P
.equ SER_PORT, 0x05  ; PORTB I/O address
.equ SER_PIN,  0
.equ CLK_PORT, 0x0B  ; PORTD I/O address
.equ CLK_PIN,  7
.equ LAT_PORT, 0x0B  ; PORTD I/O address
.equ LAT_PIN,  4
 
.global display_digit
 
; void display_digit(uint8_t pos, uint8_t number);
; r24 = position (0 to 3)
; r22 = number (0 to 9)
display_digit:
    push r16
    push r17
    push r18
    push zl
    push zh
 
    ; 1. Load Segment Mask (for U3) from flash
    ldi zl, lo8(segment_masks)
    ldi zh, hi8(segment_masks)
    add zl, r22               ; Add number index to Z pointer
    adc zh, r1                ; r1 is assumed to be 0 (gcc standard)
    lpm r16, Z                ; r16 now holds segment data
 
    ; 2. Load Digit Select Mask (for U2) from flash
    ldi zl, lo8(digit_masks)
    ldi zh, hi8(digit_masks)
    add zl, r24               ; Add position index to Z pointer
    adc zh, r1
    lpm r17, Z                ; r17 now holds digit select data
 
    ; 3. Shift out Segment Data (r16) -> Ends up in U3
    ldi r18, 8                ; Loop counter for 8 bits
shift_segments:
    lsl r16                   ; Shift MSB into Carry flag
    brcs set_ser_seg          ; If Carry is 1, branch to set SER high
    cbi SER_PORT, SER_PIN     ; Clear SER low
    rjmp clock_seg
set_ser_seg:
    sbi SER_PORT, SER_PIN     ; Set SER high
clock_seg:
    ; Pulse SRCLK
    sbi CLK_PORT, CLK_PIN
    cbi CLK_PORT, CLK_PIN
    dec r18
    brne shift_segments
 
    ; 4. Shift out Digit Select Data (r17) -> Ends up in U2
    ldi r18, 8                ; Loop counter for 8 bits
shift_digits:
    lsl r17                   ; Shift MSB into Carry flag
    brcs set_ser_dig
    cbi SER_PORT, SER_PIN
    rjmp clock_dig
set_ser_dig:
    sbi SER_PORT, SER_PIN
clock_dig:
    ; Pulse SRCLK
    sbi CLK_PORT, CLK_PIN
    cbi CLK_PORT, CLK_PIN
    dec r18
    brne shift_digits
 
    ; 5. Pulse Latch (RCLK) to update the output displays
    sbi LAT_PORT, LAT_PIN
    cbi LAT_PORT, LAT_PIN
 
    pop zh
    pop zl
    pop r18
    pop r17
    pop r16
    ret
 
; ---------------------------------------------------------
; Data stored in Program Memory (Flash)
; ---------------------------------------------------------
.section .progmem.data, "a", @progbits
 
; Common Anode 7-segment masks (Active LOW)
; Segments:     DP,g,f,e,d,c,b,a (Bit 7 -> Bit 0)
; Indices:       0     1     2     3     4     5     6     7     8     9
segment_masks:
    .byte      0xC0, 0xF9, 0xA4, 0xB0, 0x99, 0x92, 0x82, 0xF8, 0x80, 0x90
 
; Digit select masks (Assuming active high on QA-QD for digits 1-4)
digit_masks:
    .byte 0x01, 0x02, 0x04, 0x08
Note: registers in this schema store data for only ONE digit. Iterating over digits and displaying them allows it to represent a full, multi-digit number. To display, e.g., 1023, it is necessary to handle each digit separately: “1”, “0”, “2”, and “3”, and to repeat this process continuously. If you stop, only the last digit will be visible.
Changing the definitions of the symbols stored in segment_masks enables you to easily present characters other than numbers. Think about segment_masks as a font definition that defines how a symbol looks.

Display single digit: how to use it to display a number?
Sample code that uses the function declared above to display 1975 is presented below. Note that the MCU runs here at full speed, constantly updating the display. This is not necessary (a minimum, comfortable LED display refresh rate should be around 10Hz), but we do not present a throttled solution here for the sake of simplicity. It is common to use a timer to refresh the display periodically.

.equ SREG,     0x3F     ; Status Register
.equ SPH,      0x3E     ; Stack Pointer High
.equ SPL,      0x3D     ; Stack Pointer Low
.equ SER_PORT, 0x05     ; PORTB I/O address
.equ PINB,     0x03     ; Input Pins Port B (Toggle Shortcut)
.equ SER_PIN,  0        ; GPIO8
.equ DDRD,     0x0A     ; Data Direction Port D
.equ DDRB,     0x04     ; Data Direction Port B
.equ CLK_PORT, 0x0B     ; PORTD I/O address
.equ CLK_PIN,  7        ; GPIO7
.equ LAT_PORT, 0x0B     ; PORTD I/O address
.equ LAT_PIN,  4        ; GPIO4
 
.equ RAMEND,   0x08FF
.global display_digit
; ---------------------------------------------------------
; Code stored in Program Memory (Flash)
; ---------------------------------------------------------
.section .text
.org 0x0000
    rjmp RESET          
 
 
RESET:
 
    ; Prepare stack
    ldi r16, hi8(RAMEND)
    out SPH, r16
    ldi r16, lo8(RAMEND)
    out SPL, r16
    ; Initialise display control outputs
    sbi DDRB, SER_PIN  ; Set PB0 as output
    sbi DDRD, CLK_PIN  ; Set PD7 as output
    sbi DDRD, LAT_PIN  ; Set PD4 as output
 
    clr r1             ; display_digit relies on r1 == 0 (avr-gcc convention)
    clr r25
    clr r23
    ; --- Main Loop, displays in sequence 1->9->7->5 ---
LOOP:
    ldi r24,0
    ldi r22,1
    call display_digit ; Display 1
    ldi r24,1
    ldi r22,9
    call display_digit ; Display 9
    ldi r24,2
    ldi r22,7
    call display_digit ; Display 7
    ldi r24,3
    ldi r22,5
    call display_digit ; Display 5
    rjmp LOOP
 
; void display_digit(uint8_t pos, uint8_t number);
; r24 = position (0 to 3)
; r22 = number (0 to 9)
 
.... here comes the body of the display_digit function

In the code above, we used fixed (constant) digits to display. A common scenario, however, is when the number is stored in some register or in a memory variable.

Convert number to digits: function definition
To display a number on this kind of display, you need to convert it into an array of bytes, each representing one digit. The function below does this by repeatedly subtracting 1000, 100 and 10.

; void convert_to_digits(uint16_t value, uint8_t* array);
; Inputs:
; r25:r24 = Value to convert (up to 9999)
; r23:r22 = Pointer to SRAM array (4 bytes long)
convert_to_digits:
    ; Save registers we are about to use
    push r26
    push r27
    push r18
    push r19
    push r20
 
    ; Move the SRAM pointer from r23:r22 into the X pointer (r27:r26)
    movw r26, r22
 
    ; ---------------------------------------------------
    ; 1. Thousands Digit (Subtract 1000 = 0x03E8)
    ; ---------------------------------------------------
    clr r18             ; Clear digit counter
    ldi r19, 0x03       ; High byte of 1000
    ldi r20, 0xE8       ; Low byte of 1000
loop_1000:
    cp r24, r20         ; Compare value low byte with 1000 low byte
    cpc r25, r19        ; Compare value high byte with 1000 high byte
    brlo done_1000      ; If value < 1000, branch out
    sub r24, r20        ; Subtract 1000 low byte
    sbc r25, r19        ; Subtract 1000 high byte (with carry)
    inc r18             ; Increment thousands digit
    rjmp loop_1000
done_1000:
    st X+, r18          ; Store thousands digit in array[0] and increment X
 
    ; ---------------------------------------------------
    ; 2. Hundreds Digit (Subtract 100 = 0x0064)
    ; ---------------------------------------------------
    clr r18             ; Reset digit counter
    ldi r19, 0x00       ; High byte of 100
    ldi r20, 0x64       ; Low byte of 100
loop_100:
    cp r24, r20
    cpc r25, r19
    brlo done_100
    sub r24, r20
    sbc r25, r19
    inc r18
    rjmp loop_100
done_100:
    st X+, r18          ; Store hundreds digit in array[1] and increment X
 
    ; ---------------------------------------------------
    ; 3. Tens Digit (Subtract 10 = 0x000A)
    ; ---------------------------------------------------
    clr r18             ; Reset digit counter
    ldi r19, 0x00       ; High byte of 10
    ldi r20, 0x0A       ; Low byte of 10
loop_10:
    cp r24, r20
    cpc r25, r19
    brlo done_10
    sub r24, r20
    sbc r25, r19
    inc r18
    rjmp loop_10
done_10:
    st X+, r18          ; Store tens digit in array[2] and increment X
 
    ; ---------------------------------------------------
    ; 4. Ones Digit (The Remainder)
    ; ---------------------------------------------------
    ; Whatever is left in r24 is the ones digit (0-9)
    st X, r24           ; Store ones digit in array[3] (no need to increment X)
 
    ; Restore registers and return
    pop r20
    pop r19
    pop r18
    pop r27
    pop r26
    ret

Note, this function operates on a buffer located in the memory, which can be declared, e.g. as follows:

.section .bss  ; .bss is for uninitialized variables in SRAM
; Reserve 4 bytes in SRAM to hold the 4 converted digits
display_array:
    .space 4
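The repeated-subtraction method used by convert_to_digits can be verified on a PC with an equivalent C model. The sketch below uses a hypothetical name, convert_to_digits_model. Unlike the assembler version, it checks the input range with an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* value must be 0..9999; digits receives thousands, hundreds, tens, ones. */
void convert_to_digits_model(uint16_t value, uint8_t digits[4])
{
    static const uint16_t weights[3] = {1000, 100, 10};
    int i;

    assert(value <= 9999);
    for (i = 0; i < 3; i++) {
        uint8_t count = 0;

        while (value >= weights[i]) {    /* same test as cp/cpc + brlo */
            value = (uint16_t)(value - weights[i]);
            count++;
        }
        digits[i] = count;
    }
    digits[3] = (uint8_t)value;          /* the remainder is the ones digit */
}
```

Each inner loop runs at most 9 times for inputs up to 9999, so the conversion needs no division instruction, which the AVR core does not have.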

Communication

Devices (laboratory nodes) are interconnected in pairs, so it is possible to work in groups and implement scenarios involving more than one device:

  • node 1 with node 2,
  • node 3 with node 4,
  • node 5 with node 6,
  • node 7 with node 8,
  • node 9 with node 10.

Interconnections are symmetrical, so that device 1 can send data to device 2 and vice versa (similar to serial communication). Note that analogue inputs are also involved in the interconnection interface. See figure 21 for details.

Figure 21: SUT AVR nodes interconnection diagram

The in-series resistors protect the Arduino boards' outputs from excessive current when both pins are configured as outputs with opposite logic states.

The capacitors on the analogue lines filter the PWM signal, providing a stable voltage for the analogue-to-digital converter to measure.

Table 17: AVR (Arduino Uno) SUT Node Interconnections
Arduino Uno pin name | AVR pin name | Alternate function | Comment
D2 | PD2 | INT0 | Interrupt input
D5 | PD5 | T1 | Timer/counter input
D6 | PD6 | OC0A | PWM output to generate analogue voltage
D9 | PB1 | OC1A | Digital output / Timer output
D10 | PB2 | OC1B | Digital output / Timer output
A5 | PC5 | ADC5 | Analogue input

Such a connection makes it possible to implement a variety of scenarios:

  • Connection of OC0A to ADC5 allows you to generate a voltage for measuring on input 5 of the analogue-to-digital converter.
  • Connection of OC1A to INT0 allows you to generate a digital periodic signal that can trigger hardware interrupts.
  • Connection of OC1B to T1 allows you to generate a digital periodic signal, the pulse count of which can be counted using timer T1.
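For the first scenario (OC0A connected to ADC5), it is useful to estimate what the ADC should read for a given PWM setting. The C sketch below is only an idealised estimate under several assumptions that are not stated in this manual: Timer0 works in 8-bit fast PWM mode with a non-inverting output, the filter removes the ripple completely, and the ADC reference equals the supply voltage of the sending node. The function name expected_adc_reading is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated 10-bit ADC result for an 8-bit compare value (OCR0A).
 * Assumes duty cycle = (OCR0A + 1) / 256 and ADC = Vin * 1024 / Vref,
 * limited to the maximum code 1023. */
uint16_t expected_adc_reading(uint8_t ocr0a)
{
    uint16_t reading = (uint16_t)(((uint16_t)ocr0a + 1u) * 4u);

    assert(reading >= 4);
    return reading > 1023 ? 1023 : reading;
}
```

In a real measurement, the values will differ slightly because of filter ripple, resistor loading and differences between the supply voltages of both nodes.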
Nodes are interconnected in pairs: 1-2, 3-4, 5-6, 7-8, 9-10. Scenarios for data transmission between MCUs require booking and the use of correct nodes for sending and receiving messages.

Laboratory Scenarios

Below are hands-on lab scenarios intended for use with the VREL NextGen system (access via a browser; no need to install the toolchain or any other software).


Programming in Assembler for x64

In this section, we show some examples of programs written purely in assembler or combined with other programming languages, including C++ and C#. We assume that the reader is familiar with the coursebook and with the instructions and directives used to write assembler programs. We describe the use of the integrated development environment (Visual Studio) and methods to assemble programs using the command line only. We also show how to create static and dynamic libraries written in assembler for use in assembler programs or with other compilers.

Introduction to the x64 Assembler programming in MASM - Microsoft Visual Studio Community Edition

In the following chapter, we explain how to write, assemble, link and execute programs written in assembly language for x64 processors. We assume that the reader is familiar with the most important processor instructions and MASM directives.

Creating a project in VS with MASM source. Assembling, debugging, disassembly window, register view, memory view - data section,

[piotr] TO BE DONE

Standalone assembly

It is possible to use command-line MASM tools to assemble, link, and create libraries written in assembly language. You can use any editor to create the assembler source code and translate it into machine code. The tools required are integral elements of the Visual Studio Community installation, installed with the option “Desktop development with C++”. For the default VS installation, you can find them in the following folder (it can change due to different version numbers).

C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\14.50.35717\bin\Hostx64\x64

To use statically included Windows libraries, you need lib files. The essential library is kernel32.lib, but for other Windows functions, you will also need some additional libraries. All are available in the following folder (it can change due to different version numbers).

C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64

To assemble the source file, the ML64.exe program is used. This program has many options, which you can list by executing:

ML64.exe /?

After assembling, ML64 can call the linker automatically. An exemplary MASM execution command to assemble and link the file named source.asm can look like this:

ml64 /Fl /Zi /Zd source.asm /link /entry:main

The options used are explained below:

  • /Fl - generate listing file. MASM will output the source.lst file with the report on the assembling process.
  • /Zi - add symbolic debug info. MASM will add to the object file names of symbols defined in the program. It will allow debuggers to name user-defined symbols during debugging.
  • /Zd - add line number debug info. MASM will add to the object file source code line numbers.
  • /link - MASM will call the linker.
  • /entry:main - option for the linker, which informs about the entry point of the program.
If you prefer a name other than “main” as the entry point for your console program, you will need to specify the subsystem for the resulting code. For a console application, add the /SUBSYSTEM:CONSOLE linker option.

It will not be very surprising that the first code example will be the “Hello world!”. This program uses three system functions:

  • GetStdHandle - returns the handle of the standard output, which for our application is the console window.
  • WriteConsoleA - displays the text in the console (the ANSI version of WriteConsole).
  • ExitProcess - returns control to the operating system.

The functions are implemented in a library file kernel32.lib, which is statically linked. We use the “includelib” directive to inform the linker where to search for functions. To inform the assembler about the names of functions, we declare them with the set of “extern” directives. The details of each statement of the program are explained in comments.

option casemap:none             ; recognising small and capital letters
 
includelib kernel32.lib         ; statically linked library with system functions
 
EXTERN GetStdHandle:PROC        ; declaration of system functions for use
EXTERN WriteConsoleA:PROC
EXTERN ExitProcess:PROC
 
STD_OUTPUT_HANDLE equ -11       ; STD_OUTPUT_HANDLE constant
 
; In the data section of our program, there is a string to be displayed
.data
    message db "Hello, World!", 13, 10
    msgLen  equ $ - message     ; constant calculation with string length
 
; In the code section of our program, there are instructions for execution
.code
main PROC                       ; main function - entry point
    sub rsp, 28h                ; shadow space + align
 
; HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE)
    mov ecx, STD_OUTPUT_HANDLE
    call GetStdHandle           ; this function returns the handle of the console window
 
; WriteConsoleA(hConsole, message, msgLen, &written, NULL)
    mov rcx, rax                ; console window handle
    lea rdx, message            ; pointer to the buffer
    mov r8d, msgLen             ; length
    lea r9, written             ; pointer to a var with a real number of chars written
    mov qword ptr [rsp+20h], 0  ; 5th argument (lpReserved = NULL)
    call WriteConsoleA          ; this function displays text in the console
 
; ExitProcess(0)
    xor ecx, ecx                ; value to be returned
    call ExitProcess            ; return to operating system
main ENDP                       ; end of the main function
 
; In the uninitialised data section of our program, there is a "written" variable
.data?
    written dq ?                ; variable which holds the number of written chars
 
END                             ; end of source file
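The constant 28h in the "sub rsp, 28h" instruction above is not arbitrary. In the Windows x64 calling convention, the caller reserves 32 bytes of shadow space for the first four arguments, adds 8 bytes for every further argument passed on the stack, and keeps RSP aligned to 16 bytes at each call. The C sketch below reproduces this arithmetic. The function name win64_frame_bytes is hypothetical, and the model assumes that the procedure keeps no local variables on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to subtract from RSP at the start of a procedure that calls
 * functions taking at most max_args integer or pointer arguments. */
size_t win64_frame_bytes(size_t max_args)
{
    size_t slots = max_args < 4 ? 4 : max_args;  /* at least 4 shadow slots */
    size_t bytes = slots * 8;

    assert(max_args <= 64);  /* arbitrary sanity limit for this sketch */
    /* On entry RSP % 16 == 8, because CALL pushed the return address.
     * Pad so that RSP is a multiple of 16 before the next CALL. */
    if ((bytes + 8) % 16 != 0)
        bytes += 8;
    return bytes;
}
```

WriteConsoleA takes five arguments, so the model gives 40 bytes (28h): 32 bytes of shadow space plus one 8-byte slot for the fifth argument at [rsp+20h]. In this case, no extra padding is needed.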

Creating static libraries

To create the static library, the assembler program shouldn't have the main procedure defined. All other procedures will be made available for other programs by default. If there is a need to hide a procedure from visibility, it is possible to mark it as PRIVATE. The first step is to assemble the source file with MASM.

ml64 /c source.asm
  • /c - assemble without linking.

The second step is to create the lib file with the lib tool.

lib source.obj

This will create the source.lib file, which can be imported into the program, where we can use all available procedures.

The example for the library will be the program containing the function “print_int”, which displays the integer number provided as an argument via the rcx register.

Introduction to Linux assembly programming

NASM

Scenarios