| en:multiasm:papc:chapter_6_9 [2026/02/20 13:08] – [Dynamic memory management considerations] pczekalski | en:multiasm:papc:chapter_6_9 [2026/02/20 13:08] (current) – [Dynamic memory management considerations] pczekalski | ||
|---|---|---|---|
| + | ====== Compatibility with HLL Compilers (C++, C#) and Operating Systems ====== | ||
| + | Integrating assembler code with applications written in high-level languages pays off in particular scenarios, such as implementing complex mathematical algorithms and real-time tasks that require efficient, compact code. Assembler is practically never used to implement a graphical user interface (GUI) anymore. Modern desktop operating systems are designed to provide a rich user experience and support languages such as C#, C++, and Python for implementing user interfaces (UIs) through libraries. While those UI functions can be called from the assembler level, there is virtually no reason to do so. A more effective approach is to write the main application in a high-level language and call assembler code as needed to perform backend operations efficiently. | ||
| + | |||
| + | In the case of multi-tier web applications, | ||
| + | |||
| + | Assembler code can be merged with high-level language code in one of two ways: | ||
| + | * static, where assembler code is compiled as a library object file and merged with the code during linking (figure {{ref> | ||
| + | * dynamic, where the assembler code library is loaded during runtime (figure {{ref> | ||
| + | |||
| + | <figure staticlinking> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | <figure dynamiclinking> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | Dynamic code loading is considered an advantage because the original application does not contain the assembler binary executable; it is kept in a separate file and loaded on demand, allowing it to be compiled and exchanged independently. On the other hand, it raises several challenges, such as versioning, compatibility, | ||
| + | ===== Programming in Assembler for Windows ===== | ||
| + | Windows OS has historically supported unmanaged code written primarily in C++. This kind of code runs directly on the CPU, but divergence in hardware platforms, such as the introduction of ARM-core-based platforms running Windows, causes incompatibility issues. Since the introduction of the .NET framework, Windows has provided developers with a safer way to execute their code, called " | ||
| + | |||
| + | <figure masmintegration1> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | There are significant differences between x86 (32-bit) and x64 (64-bit) code, mostly in the scope of integration methods. As we're at a very low level of programming, | ||
| + | |||
| + | <note tip>Code written in assembler and compiled to machine code is always unmanaged!</ | ||
| + | |||
| + | ==== Dynamic memory management considerations ==== | ||
| + | Using dynamic memory management at the assembler level is troublesome: | ||
| + | |||
| + | <note tip> | ||
| + | |||
| + | <figure dynamicmemory> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | ==== Pure Assembler Applications for Windows CMD ==== | ||
| + | It is possible to write an application for Windows solely in assembler. Although there is rarely a good reason to do so, the hints presented below, such as how to call system functions, may be helpful. | ||
| + | Calls to the Windows system functions are possible via classical '' | ||
| + | A common approach to development is to start with a stub command-line C++ application and manually convert it to assembler requirements. Visual Studio Community ([[https:// | ||
| + | |||
| + | A template of a typical pure-assembler command-line application for Windows is as follows: | ||
| + | <code asm> | ||
| + | ... | ||
| + | .code | ||
| + | hello_world_asm PROC | ||
| + | push rbp ; save the caller's frame pointer | ||
| + | mov rbp, rsp ; set up this procedure's frame pointer | ||
| + | sub rsp, 8 * (4 + 2) ; reserve 48 bytes: 32 bytes of shadow space plus 2 extra slots | ||
| + | |||
| + | .... ; here comes your code | ||
| + | |||
| + | |||
| + | mov rsp, rbp ; release the reserved stack space | ||
| + | pop rbp ; restore the caller's frame pointer | ||
| + | ret | ||
| + | hello_world_asm ENDP | ||
| + | END | ||
| + | |||
| + | </ | ||
| + | |||
| + | The name '' | ||
| + | |||
| + | Calling system functions, such as the system message box, requires understanding the arguments passed to them. As there is no assembler-specific documentation, the C++ documentation of the Windows system API is the reference to use. | ||
| + | The code below presents the necessary components of an assembler app that calls system functions (library includes are configured at the project level): | ||
| + | <code asm> | ||
| + | .data | ||
| + | STD_INPUT_HANDLE = -10 | ||
| + | STD_OUTPUT_HANDLE = -11 | ||
| + | STD_ERROR_HANDLE = -12 | ||
| + | |||
| + | handler dq 0 | ||
| + | hello_msg db "Hello world", | ||
| + | info_msg | ||
| + | ... | ||
| + | includelib | ||
| + | includelib | ||
| + | EXTERN MessageBoxA: | ||
| + | ... | ||
| + | |||
| + | | ||
| + | ; RCX => _In_opt_ HWND hWnd, | ||
| + | ; RDX => _In_opt_ LPCSTR lpText, | ||
| + | ; R8 => _In_opt_ LPCSTR lpCaption, | ||
| + | ; R9 => _In_ UINT uType); | ||
| + | mov rcx, handler ; hWnd | ||
| + | mov rdx, offset hello_msg ; lpText | ||
| + | mov r8, offset info_msg ; lpCaption | ||
| + | mov r9, 0 ; uType: 0 is MB_OK | ||
| + | and rsp, not 8 ; clear bit 3 of RSP to keep the stack 16-byte aligned | ||
| + | call MessageBoxA | ||
| + | ... | ||
| + | </ | ||
| + | The majority of standard library functions accept ASCII strings that must be terminated with a 0 byte (the numeric value 0, not the character ''0''), so the string length does not need to be passed. | ||
| + | The '' | ||
| + | |||
| + | ==== Merging of the High-Level Languages and Assembler Code ==== | ||
| + | A common scenario is to wrap assembler code in stateless functions and encapsulate it in one or more DLL files. All arguments are passed from the calling code, usually written in C++. | ||
| + | |||
| + | **Programming for applications written in unmanaged code** | ||
| + | |||
| + | In the case of unmanaged code, integration is straightforward. Assembler code is usually encapsulated in a DLL (or in multiple DLLs). | ||
| + | Below is a sample dummy assembler function that returns an integer (no parameters), | ||
| + | |||
| + | Assembler code (source for DLL): | ||
| + | <code asm AssemblerDll.asm> | ||
| + | .code | ||
| + | |||
| + | MyAsmProc proc | ||
| + | mov RAX, 2026 ; the return value is passed back in RAX | ||
| + | ret | ||
| + | MyAsmProc endp | ||
| + | end | ||
| + | </ | ||
| + | |||
| + | Relevant definition file: | ||
| + | <code cpp AssemblerDll.def> | ||
| + | LIBRARY AssemblerDll | ||
| + | EXPORTS MyAsmProc | ||
| + | </ | ||
| + | C++ application, | ||
| + | <code cpp WindowsCmdX64.cpp> | ||
| + | #include < | ||
| + | #include < | ||
| + | |||
| + | typedef int(_stdcall* MyProc)(); | ||
| + | HINSTANCE dllHandle = NULL; | ||
| + | int main() | ||
| + | { | ||
| + | dllHandle = LoadLibrary(TEXT(" | ||
| + | if (!dllHandle) | ||
| + | { | ||
| + | std::cerr << " | ||
| + | return 1; | ||
| + | } | ||
| + | MyProc myAsmProcedure = (MyProc)GetProcAddress(dllHandle, | ||
| + | if (!myAsmProcedure) | ||
| + | { | ||
| + | std::cerr << " | ||
| + | FreeLibrary(dllHandle); | ||
| + | return 2; | ||
| + | } | ||
| + | std::cout << myAsmProcedure(); | ||
| + | FreeLibrary(dllHandle); | ||
| + | return 0; | ||
| + | } | ||
| + | |||
| + | </ | ||
| + | <note tip> | ||
| + | <note tip> | ||
| + | |||
| + | **Programming for applications written in managed code** | ||
| + | |||
| + | In the case of managed code, things get more complex. The .NET framework features automated memory management that releases unused memory (e.g., objects to which no references remain) and optimises variable locations to improve performance. This mechanism is known as the .NET Garbage Collector (GC). The GC traces references and, when an object is relocated in memory, updates all references to it accordingly. This automated mechanism, however, applies only within managed code. The problem arises when developers integrate a front-end application written in managed code with assembler libraries, which are unmanaged code. Pointers and references passed to the assembler code are not traced by the GC. Using dynamically allocated variables on the .NET side and accessing them from the assembler code is a very common scenario. GC cannot " | ||
| + | * figure {{ref> | ||
| + | * figure {{ref> | ||
| + | * figure {{ref> | ||
| + | |||
| + | <figure csharp1> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | <figure csharp2> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | <figure csharp3> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
| + | |||
| + | Luckily, there is a strict set of rules to follow when integrating managed and unmanaged code, to avoid situations presented in {{ref> | ||
| + | - It is safe to call a function that does not have any arguments and returns no value or returns a simple type (by value, stored in a register). | ||
| + | - It is safe to pass simple type values as arguments (e.g. '' | ||
| + | using System; | ||
| + | using System.Runtime.InteropServices; | ||
| + | |||
| + | namespace ConsoleAsmTestTypes | ||
| + | { | ||
| + | class Program | ||
| + | { | ||
| + | [DllImport(" | ||
| + | private static extern int ProcAsm1(int a, int b); | ||
| + | static void Main(string[] args) | ||
| + | { | ||
| + | int a = 10; | ||
| + | int b = 20; | ||
| + | unsafe | ||
| + | { | ||
| + | int c = ProcAsm1(a, b); | ||
| + | } | ||
| + | } | ||
| + | } | ||
| + | }</ | ||
| + | - To ensure seamless use of complex types referenced by address (pointer) between .NET and assembler code, all variables must be declared using '' | ||
| + | - Code should also be marked as '' | ||
| + | private static unsafe extern int ProcAsm2(int* a, int pos); | ||
| + | ... | ||
| + | int[] n1Array = { 1, 2, 3, 4, 5, 6 }; | ||
| + | ... | ||
| + | unsafe | ||
| + | | ||
| + | | ||
| + | { | ||
| + | c = ProcAsm2(aAddress, | ||
| + | } | ||
| + | | ||
| + | |||
| + | |||
| + | ===== Programming in Assembler for Linux ===== | ||
| + | The principles for combining assembler code and high-level language code into a single application on Linux are similar to those on Windows, although dynamic loading is more complex; thus, we consider only static linking of the code. C++ is the most common choice for the high-level part of the application, but other options, such as Python, are possible. | ||
| + | |||
| + | The x64 calling convention used on Linux passes more integer and pointer parameters via registers (up to 6) than the one used on Windows (only up to 4). Refer to the chapter [[en: | ||
| + | |||
| + | A common scenario is to use the [[https:// | ||
| + | |||
| + | The sample project is composed of the '' | ||
| + | |||
| + | The '' | ||
| + | <code ini Makefile> | ||
| + | all: main | ||
| + | |||
| + | main: main.o asmfunc.o | ||
| + | g++ -o main main.o asmfunc.o | ||
| + | |||
| + | main.o: main.cpp | ||
| + | g++ -c -g main.cpp | ||
| + | |||
| + | asmfunc.o: asmfunc.asm | ||
| + | nasm -g -f elf64 -F dwarf asmfunc.asm -l asmfunc.lst | ||
| + | |||
| + | clean: | ||
| + | rm -f ./main || true | ||
| + | rm -f ./main.o || true | ||
| + | rm -f ./asmfunc.o || true | ||
| + | rm -f ./ | ||
| + | </ | ||
| + | |||
| + | <note important> | ||
| + | |||
| + | Assembler code exposes functions to the linker using the '' | ||
| + | |||
| + | <code assembler asmfunc.asm> | ||
| + | section .data | ||
| + | section .bss | ||
| + | section .text | ||
| + | |||
| + | global addInAsm | ||
| + | |||
| + | addInAsm: | ||
| + | nop | ||
| + | mov rax, rsi ; RSI holds the second argument | ||
| + | add rax, rdi ; RDI holds the first argument; the sum is returned in RAX | ||
| + | ret | ||
| + | </ | ||
| + | |||
| + | Finally, the calling side (C++ application) uses the '' | ||
| + | <code cpp main.cpp> | ||
| + | #include < | ||
| + | |||
| + | extern " | ||
| + | |||
| + | long long a=10; | ||
| + | long long b=7; | ||
| + | long long returnValue; | ||
| + | |||
| + | int main() { | ||
| + | std::cout << " | ||
| + | returnValue = addInAsm(a, | ||
| + | std::cout << "Sum of " << a << " and " << b << " is " << returnValue << std::endl; | ||
| + | return 0; | ||
| + | } | ||
| + | </ | ||