Differences

This shows you the differences between two versions of the page.

en:multiasm:paarm:chapter_5_15 [2025/12/05 00:17] eriks.klavins
en:multiasm:paarm:chapter_5_15 [2025/12/08 12:33] (current) eriks.klavins
Line 1: Line 1:
 ====== Energy Efficient Coding ====== ====== Energy Efficient Coding ======
  
-Assembler code is assembled into a single object code. Compilers, instead, take high-level language code and convert it to machine code. During compilation, the code may be optimised in several ways. For example, there are many ways to implement statements, FOR loops or Do-While loops in the assembler. There are some good hints for optimising the assembler code as well, but these are just hints for the programmer.
-  - Take into account the instruction execution time (or cycle). Some instructions take more than one CPU cycle to execute, and there may be other instructions that achieve the desired result.
-  - Try to use the registers as much as possible without storing the temporary data in the memory.
-  - Eliminate unnecessary compare instructions by doing the appropriate conditional jump instruction based on the flags that are already set from previous arithmetic instruction. Remember that arithmetic instructions can update the status flags if the postfix ''<fc #008000>S</fc>'' is used in the instruction mnemonic.
-  - It is essential to align both your code and data to get a good speedup. For ARMv8, the data must be aligned on 16-byte boundaries. In general, if alignment is not used on a 16-byte boundary, the CPU will eventually raise an exception.
-  - And yet, there are still multiple hints that can help speed up the computation. In small code examples, the speedup will not be noticeable. The processors can execute millions of instructions per second.
+Some special instructions are meant to put the processor into sleep modes and wait for an event to occur. The processor can be woken up by an interrupt or by an event. To use these modes, the code must explicitly initialise interrupts and events, and handle them. After that, the processor may be put into sleep mode, where it remains asleep until an event or interrupt occurs. The following code example can be used only in bare-metal mode (without an OS).
+<codeblock code_label>
+<caption>IDLE loop</caption>
+<code>
+.global idle_loop
+idle_loop:
 +1:  WFI             // Wait For Interrupt, core goes to low-power
 +    B   1b          // After the interrupt, go back and sleep again
 +</code> 
 +</codeblock> 
 +<note>Note that interrupt handling and initialisation must also be implemented in the code; otherwise, the CPU may encounter an error that may force a reboot.</note>
 +The example only waits for interrupts to occur. To wait for events and interrupts, the ''<fc #800000>WFI</fc>'' instruction must be replaced with the ''<fc #800000>WFE</fc>'' instruction. Another CPU core may execute an ''<fc #800000>SEV</fc>'' instruction that signals an event to all cores.
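As a minimal bare-metal sketch of the WFE/SEV pairing described above, one core can sleep until another core sets a shared flag. The labels wait_for_flag, set_flag and ready_flag are hypothetical names chosen for illustration, and the sketch assumes AArch64 cores that share coherent memory.

```asm
.global wait_for_flag
.global set_flag

.section .text
// Runs on the waiting core: sleep until ready_flag becomes non-zero.
wait_for_flag:
    ldr     x1, =ready_flag         // address of the shared flag (hypothetical)
1:  ldar    w0, [x1]                // load-acquire the current flag value
    cbnz    w0, 2f                  // flag set: stop waiting
    wfe                             // sleep until an event or interrupt arrives
    b       1b                      // woken up: check the flag again
2:  ret

// Runs on another core: set the flag and wake the waiting cores.
set_flag:
    ldr     x1, =ready_flag
    mov     w0, #1
    stlr    w0, [x1]                // store-release the new flag value
    dsb     ish                     // make the store visible before signalling
    sev                             // send an event to all cores
    ret

.section .bss
.balign 4
ready_flag:
    .skip   4                       // one 32-bit word, initially zero
```

Because WFE can also return for unrelated events or interrupts, the flag is checked again after every wake-up instead of assuming that the expected event has arrived.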
  
-Processors today can execute many instructions in parallel using pipelining and multiple functional units. These techniques allow the reordering of instructions internally to avoid pipeline stalls (Out-of-Order execution), branch prediction to guess the branching path, and others. Without speculation, each branch in the code would stall the pipeline until the outcome is known. These situations are among the factors that reduce the processor's computational power.
+On a Raspberry Pi 5 running Linux, it is not observable whether the CPU enters these modes, because the OS generates many events between CPU cores and also handles many interrupts from communication interfaces and other Raspberry Pi components.
 +Another way to save more energy while running the OS on the Raspberry Pi is to reduce the CPU clock frequency. There is a scheme called dynamic voltage and frequency scaling (DVFS), the same technique used in laptops, that reduces power consumption and thereby increases battery life. On the internet, there is a paper named “Cooling a Raspberry Pi Device”. The paper includes one chapter explaining how to reduce the CPU clock frequency. The Linux OS exposes CPU frequency scaling through sysfs, e.g.:
 +  * ''/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor''
 +  * ''/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq''
  
-===== Speculative instruction execution =====
+It is possible to use syscalls in assembler to open these files and write specific values into them.
 +<codeblock code_label> 
 +<caption>Power saving</caption> 
 +<code> 
 +.global _start 
 +.section .text 
 +_start: 
 +    // openat(AT_FDCWD, path, O_WRONLY, 0)
 +    mov     x0, #-100               // AT_FDCWD
 +    ldr     x1, =gov_path           // const char *pathname
 +    mov     x2, #1                  // O_WRONLY
 +    mov     x3, #0                  // mode (unused)
 +    mov     x8, #56                 // sys_openat
 +    svc     #0
 +    mov     x19, x0                 // save fd
 
-===== Barriers (instruction synchronization / data memory / data synchronization / one way BARRIER) =====
+    // write(fd, "powersave\n", 10)
 +    mov     x0, x19
 +    ldr     x1, =gov_value
 +    mov     x2, #10                 // length of "powersave\n"
 +    mov     x8, #64                 // sys_write
 +    svc     #0
 
-===== Conditional instructions =====
+    // close(fd)
 +    mov     x0, x19
 +    mov     x8, #57                 // sys_close
 +    svc     #0
 +
 +    // exit(0)
 +    mov     x0, #0
 +    mov     x8, #93                 // sys_exit
 +    svc     #0
 + 
 +.section .rodata 
 +gov_path: 
 +    .asciz "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor" 
 +gov_value: 
 +    .asciz "powersave\n" 
 +</code> 
 +</codeblock> 
 +This is just one example template that can be used to put the processor into a specific power mode. Similar things can be done with CPU frequencies, or even by turning off a separate core, by changing the path stored in //gov_path//, the value stored in //gov_value//, and the length passed to the write call. The main idea is to use the OS's system call functions. The OS will do the rest.
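As a hedged sketch of how the same template could limit the CPU frequency, the program below writes to the ''scaling_max_freq'' file listed earlier. The value 1500000 (in kHz) is only an assumed example, the labels max_freq_path and max_freq_value are hypothetical, and the check of the openat result is an addition that the template above does not have. Writing to this file normally requires root privileges.

```asm
.global _start
.section .text
_start:
    // openat(AT_FDCWD, path, O_WRONLY, 0)
    mov     x0, #-100               // AT_FDCWD
    ldr     x1, =max_freq_path      // const char *pathname
    mov     x2, #1                  // O_WRONLY
    mov     x3, #0                  // mode (unused)
    mov     x8, #56                 // sys_openat
    svc     #0
    cmp     x0, #0
    b.lt    fail                    // a negative result is -errno
    mov     x19, x0                 // save fd

    // write(fd, "1500000\n", 8)
    mov     x0, x19
    ldr     x1, =max_freq_value
    mov     x2, #8                  // length of "1500000\n"
    mov     x8, #64                 // sys_write
    svc     #0

    // close(fd)
    mov     x0, x19
    mov     x8, #57                 // sys_close
    svc     #0

    mov     x0, #0                  // exit status 0: file was opened
    b       done
fail:
    mov     x0, #1                  // exit status 1: open failed
done:
    mov     x8, #93                 // sys_exit
    svc     #0

.section .rodata
max_freq_path:
    .asciz "/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"
max_freq_value:
    .ascii "1500000\n"              // assumed example value in kHz
```

The exit status only reports whether the file could be opened; the result of the write call is not checked in this sketch.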
  
-===== Power saving ===== 
en/multiasm/paarm/chapter_5_15.1764886667.txt.gz · Last modified: 2025/12/05 00:17 by eriks.klavins
CC Attribution-Share Alike 4.0 International