Table of Contents

Authors

The list of book contributors is presented below.

Tallinn University of Technology (TalTech)
Silesian University of Technology
Riga Technical University
ITT Group
ProDron
Czech Technical University
Technical Editor
External Contributors
Reviewers

 

Project Information

This content was implemented under the project: SafeAV - Harmonizations of Autonomous Vehicle Safety Validation and Verification for Higher Education.

Project number: 2024-1-EE01-KA220-HED-000245441.

Consortium

Erasmus+ Disclaimer
This project has been funded with support from the European Commission.
This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Copyright Notice
This content was created by the SafeAV Consortium 2024–2027.
The content is copyrighted and distributed under CC BY-NC Creative Commons Licence and is free for non-commercial use.


Introduction

Electronics design trends have revolutionized society. The first wave was centralized computing, led by firms such as IBM and DEC. These technologies enhanced productivity for global business operations, significantly impacting finance, HR, and administrative functions and eliminating the need for extensive paperwork. The next wave of economy-shaping technologies consisted of edge computing devices (red in the figure below) such as personal computers, cell phones, and tablets. With this capability, companies such as Apple, Amazon, Facebook, and Google could add enormous productivity to the advertising and distribution functions of global business: suddenly, one could directly reach any customer anywhere in the world. This mega-trend has fundamentally disrupted markets such as education (online learning), retail (e-commerce), entertainment (streaming), commercial real estate (virtualization), and health (telemedicine). The next wave of electronics is the dynamic integration of artificial intelligence with physical assets, and the apex of this capability is autonomy.

Autonomy research traces its lineage to mid-20th-century cybernetics and control theory, where researchers like Norbert Wiener, Ross Ashby, and early robotics pioneers explored how machines could sense, process feedback, and act purposefully. The 1960s–1980s brought key breakthroughs: Shakey the Robot at SRI demonstrated integrated perception, planning, and action; DARPA’s Autonomous Land Vehicle project pushed early computer vision and navigation; and advances in probabilistic robotics—such as Kalman filtering, Bayesian estimation, and SLAM—formalized how autonomous systems make decisions under uncertainty. During this period, autonomy was largely rule-based and dominated by deterministic control, limited sensing, and narrow computational capabilities.

Modern autonomy began accelerating in the 1990s and 2000s with increased computing power, the rise of machine learning, and large-scale government programs. The DARPA Grand Challenges (2004–2007) marked a turning point, proving that self-driving vehicles could handle complex, unstructured environments and catalyzing both academic and commercial investment. The 2010s saw deep learning revolutionize perception, enabling robust object detection, scene understanding, and end-to-end control. This expanded autonomy from traditional robotics to autonomous systems in the ground, maritime, airborne, and space contexts.

Given the massive amount of research, several books have been written on autonomy. For example, Introduction to Autonomous Robots provides a comprehensive and accessible foundation for designing autonomous systems, covering the essential building blocks such as robot mechanisms, sensing modalities, actuation, perception, localization, mapping, and planning. It is widely used in university courses because it blends theory with practical algorithms, offering clear explanations of how autonomous robots interpret their environment and make decisions. Distributed Autonomous Robotic Systems, by contrast, focuses on the challenges and architectures of multi-robot and swarm systems, exploring decentralized control, coordination, communication, and robustness in distributed environments. Together, these two books span the spectrum from single-robot autonomy to collaborative, multi-agent systems, giving readers a solid grasp of both foundational robotics and the complexities of distributed autonomy.

In contrast to existing literature, this book focuses on the innovations required for a core design to be integrated into the governing systems in society. This process is especially challenging for autonomous systems because they integrate four broad domains which have traditionally not interacted with each other:

  1. Legal and regulatory structures which implicitly have assumed human actors.
  2. Traditional mechanically focused safety protocols for cyber-physical systems.
  3. Traditional software product development flows.
  4. New artificial intelligence-based algorithms which replace the “driver” for autonomy.

The remainder of this book is organized as follows. Chapter 2 provides a high-level introduction to autonomous systems, including the underlying technologies and their interaction with regulatory, safety, and standards environments. Chapter 3 examines hardware architectures, with particular emphasis on sensors, high-performance computing platforms, and emerging challenges in hardware supply chains. Chapter 4 focuses on software architecture, including real-time execution, safety-critical software development, and the growing importance of stable and secure software supply chains. Chapter 5 explores higher-level autonomy algorithms for perception, mapping, and localization, with a focus on system safety and reliability. Chapter 6 addresses planning, control, and decision-making, examining how autonomous systems translate perception into safe and effective action. Finally, Chapter 7 examines communication between autonomous systems, humans, and infrastructure—including human–machine interfaces (HMI) and vehicle-to-everything (V2X) communication—with an emphasis on integrated system safety and operational robustness.

Autonomous Systems

Autonomous systems use sensors (e.g., cameras, radars, ultrasonic sensors) to collect information about the environment. The collected data are processed, and decisions about further action are made on that basis. What exactly is autonomy? The autonomy of a system can be defined as its ability to act according to its own goals, norms, internal states, and knowledge, without external human intervention. This means that autonomous systems are not limited to robots or unmanned vehicles; the definition includes any automatic function that can reduce workload or support the person driving a vehicle.

Autonomous systems use advanced technologies such as artificial intelligence, machine learning, neural networks, and the Internet of Things to perform tasks independently. They are a cornerstone of today's Industry 4.0 and are used in areas ranging from robotics through transport and logistics to medicine and education. Examples include an autonomous car that makes decisions on its own based on sensor data, or an Automated Guided Vehicle (AGV) designed to transport loads safely and efficiently in a warehouse without operator supervision. Another application is production systems that, based on data from industrial sensors, automatically control production processes, operate machines, and optimize output; this shortens production times, reduces costs, and increases product quality. Autonomous systems are also used in transport and logistics, where they enable faster and more efficient delivery of goods: thanks to the Internet of Things and monitoring systems, every stage of transport can be tracked, from loading to delivery, allowing better control of the process. Autonomous systems are becoming an increasingly important part of our lives, and their development and application will have a growing impact on the future.

Autonomous systems operate in fundamentally different physical environments across ground, marine, airborne, and space domains, and these environmental differences strongly influence system design, sensing, safety, and operational architecture. Ground systems operate in highly structured but unpredictable environments with dense obstacles, human interaction, and high-bandwidth connectivity, requiring real-time perception, fast reaction times, and robust human safety assurance. Marine systems operate in less structured but slower-moving three-dimensional environments with fewer obstacles, limited connectivity, and strong environmental disturbances such as waves, currents, and corrosion, placing greater emphasis on long-duration reliability, navigation robustness, and remote supervision. Airborne systems operate in three-dimensional, safety-critical environments governed by strict airspace control, requiring extremely high reliability, precise navigation, fault tolerance, and formal certification due to the severe consequences of failure. Space systems operate in the most extreme and isolated environment, characterized by radiation exposure, vacuum, extreme temperature variation, and long communication delays, making real-time human intervention impossible and requiring systems to be highly autonomous, fault-tolerant, and capable of operating independently for extended periods. As a result, autonomy architectures, safety requirements, sensing modalities, and verification approaches vary significantly across these domains, even though they share common underlying principles of perception, decision-making, and control.

Overall, autonomy is a transformational technology whose economic impact will reshape society. To be effective, however, autonomy must integrate with the critical elements of society, and the rest of this chapter discusses these in more detail.

Definitions, Classification, and Levels of Autonomy

Intuitively, the autonomy of unmanned systems refers to their ability to self-manage, make decisions, and complete tasks with minimal or no human intervention. To collaborate with other systems or humans, autonomy requires a clear system definition. This definition not only communicates function to partners and users but also sets an expectation function. Expectation functions are central to many technical (validation), governance (licensing), and legal (liability) processes. Each of the physical domains has built somewhat similar “levels” of autonomy which start setting expectation functions.

Levels of Ground Vehicle Autonomy

For ground vehicles, the American organization SAE International (Society of Automotive Engineers) adopted a classification of six levels of autonomous driving in 2014, subsequently modified in 2016. Based on a decision by the National Highway Traffic Safety Administration (NHTSA), this is the officially applicable standard in the United States, and it is also the most widely used in European studies on autonomous driving technologies.

Figure 2: Levels of autonomous driving - SAE International classification [1]

To clarify the situation, SAE International has defined six levels of automation (Levels 0–5) for autonomous vehicles, which have been adopted as an industry standard (see Figure 2).

Today, these levels have become the shorthand to communicate expectations and the object of regulatory and legal battles.

Levels of Airborne Autonomy

In general, autonomy or autonomous capability is defined in the context of decision-making or self-governance within a system. According to the Aerospace Technology Institute (ATI), autonomous systems can essentially decide independently how to achieve mission objectives, without human intervention [2]. These systems are also capable of learning and adapting to changing operating conditions. However, autonomy may depend on the design, functions, and specifics of the mission or system [3]. Autonomy can be broadly viewed as a spectrum of capabilities, from zero autonomy to full autonomy. The Pilot Authorization and Task Control (PACT) model assigns authorization levels from level 0 (full pilot authority) to level 5 (full system autonomy), a scale analogous to that used in the automotive industry for autonomous vehicles (see Figure 3).

Figure 3: Pilot authority and tasks control [2]

Levels of autonomy in drone technology are typically divided into five distinct levels, each representing a gradual increase in the drone's ability to operate independently.

Figure 4: Levels of Drone Autonomy [4]

Another general but useful model describing autonomy levels in unmanned systems is the Autonomy Levels for Unmanned Systems (ALFUS) model [5]. The European Union Aviation Safety Agency (EASA), in one of its technical reports, provided some information on autonomy levels and guidelines for human–autonomy interactions. According to EASA, the concept of autonomy, its levels, and human–autonomous system interactions are not yet established and remain actively discussed in various areas (including aviation), as there is currently no common understanding of these terms [6]. Because these concepts are still developing, they pose a major challenge for the unmanned aircraft regulatory environment.

The classification of autonomy levels in multi-drone systems is somewhat different. In multi-drone systems, several drones cooperate to perform a specific task. Designing multi-drone systems requires that individual drones have an increased level of autonomy. The classification of autonomy levels is directly related to the division into flights performed within the pilot's or observer's line of sight (VLOS) and flights performed beyond the pilot's line of sight (BVLOS), where particular attention is paid to flight safety. One way to address the autonomy issue is to classify the autonomy of drones and multi-drone systems into levels related to the hierarchy of tasks performed [7]. These levels will have standard definitions and protocols that will guide technology development and regulatory oversight. For single-drone autonomy models, two distinct levels are proposed: the vehicle control layer (Level 1) and the mission control layer (Level 2), see Figure 5. Multi-drone systems, on the other hand, have three levels: single-vehicle control (Level 1), multi-vehicle control (Level 2), and mission control (Level 3). In this hierarchical structure, Level 3 has the lowest priority and can be overridden by Levels 2 or 1.
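The hierarchical override described above can be sketched as a simple priority arbiter. This is an illustrative sketch only; the layer names and commands are invented for this example and are not drawn from any cited drone-autonomy standard:

```python
# Sketch of the hierarchy above: mission control (Level 3) has the
# lowest priority and can be overridden by multi-vehicle control
# (Level 2) or single-vehicle control (Level 1).
from typing import Optional

# Lower number = higher priority, per the hierarchy in the text.
SINGLE_VEHICLE_CONTROL = 1
MULTI_VEHICLE_CONTROL = 2
MISSION_CONTROL = 3

def arbitrate(commands: dict) -> Optional[str]:
    """Return the command from the highest-priority layer that issued one."""
    for layer in (SINGLE_VEHICLE_CONTROL, MULTI_VEHICLE_CONTROL, MISSION_CONTROL):
        if commands.get(layer) is not None:
            return commands[layer]
    return None

# Example: the mission layer wants to continue a survey, but the
# single-vehicle layer detects low battery and commands a landing.
active = arbitrate({
    MISSION_CONTROL: "continue_survey",
    MULTI_VEHICLE_CONTROL: None,
    SINGLE_VEHICLE_CONTROL: "land_now",
})
print(active)  # land_now
```

In a real multi-drone stack the arbitration would of course carry richer state than a string command, but the priority ordering is the essential point.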

Figure 5: Autonomy Levels for Multi-Drone Systems

Marine Autonomy (IMO MASS Levels) and Space Autonomy (ALFUS Framework)

For marine systems, the International Maritime Organization (IMO) defines autonomy through its Maritime Autonomous Surface Ship (MASS) framework, which describes four progressive levels of autonomy based on the degree of human involvement and onboard decision-making capability. At lower levels, ships use automation primarily to assist human crews with navigation, propulsion, and safety monitoring, while humans remain onboard and responsible for operational decisions. Intermediate levels allow remote operation, where ships may operate without onboard crew but are supervised and controlled from shore-based control centers. At the highest level, fully autonomous vessels can perceive their environment, make navigation and mission decisions independently, and execute those decisions without human intervention. This framework reflects the operational realities of maritime missions, where long durations, predictable dynamics, and remote monitoring make gradual progression toward autonomy feasible.

In space systems, autonomy is commonly described using the Autonomy Levels for Unmanned Systems (ALFUS) framework, developed by the U.S. National Institute of Standards and Technology (NIST) and applied across NASA and defense programs, which evaluates autonomy based on the system’s independence from human control, its ability to handle environmental complexity, and its capacity to accomplish mission objectives without intervention. At lower levels, spacecraft rely heavily on ground operators for command and control, executing predefined instructions with minimal onboard decision-making. As autonomy increases, spacecraft gain the ability to perform functions such as fault detection and recovery, autonomous navigation, and adaptive mission planning. At the highest levels, systems can independently perceive their environment, evaluate mission goals, and dynamically adjust their behavior to achieve objectives without real-time human guidance. This progression is particularly important in deep-space missions, where communication delays make continuous human control impractical.

Why marine and space autonomy frameworks differ from ground autonomy:

Marine and space autonomy frameworks differ fundamentally from ground autonomy because their operational constraints emphasize endurance, remote operation, and system resilience rather than continuous interaction with humans in dense, unpredictable environments. Ground vehicles must operate safely in close proximity to human drivers, pedestrians, and complex infrastructure, requiring highly responsive real-time perception and decision-making. In contrast, marine systems operate in relatively structured environments with fewer immediate hazards, allowing autonomy to focus more on navigation efficiency and remote supervision. Space systems present even greater challenges, including extreme communication latency, harsh environmental conditions, and the impossibility of real-time human intervention, requiring spacecraft to autonomously detect faults, maintain operational health, and ensure mission survival. As a result, autonomy in marine and space systems is driven more by operational independence and mission continuity than by immediate human safety interactions. The table below provides a summary of all four domains.

| Unified Level | Ground (SAE J3016) | Airborne (UAV / DoD) | Marine (IMO MASS / DNV) | Space (ALFUS) | Description |
|---|---|---|---|---|---|
| Level 0 | Level 0 – No automation | Manual flight | AL 0 – Manual ship | ALFUS 0 – Manual | Human performs all sensing, planning, and control |
| Level 1 | Level 1 – Driver assistance | Basic autopilot (e.g., altitude hold, heading hold) | MASS 1 – Decision support | ALFUS 1 – Teleoperation assist | Automation assists the human but does not replace decision-making |
| Level 2 | Level 2 – Partial automation | Automated flight execution with supervision | MASS 2 – Remotely controlled with crew onboard | ALFUS 2 – Automated execution | System performs control functions under continuous human supervision |
| Level 3 | Level 3 – Conditional automation | Supervisory autonomy | MASS 3 – Remotely controlled without crew | ALFUS 3 – Supervisory autonomy | System performs mission tasks; human intervenes when needed |
| Level 4 | Level 4 – High automation | High-autonomy UAV | MASS 4 – Fully autonomous ship | ALFUS 4–5 – High-autonomy spacecraft | System operates independently in defined environments |
| Level 5 | Level 5 – Full automation | Fully autonomous UAV | Fully autonomous ship (advanced DNV AL 4+) | ALFUS 6 – Full autonomy | System operates independently in all environments |

The classification of autonomy into structured levels is not merely a technical taxonomy; it serves as a foundational construct for legal responsibility, regulatory approval, and ethical governance. These autonomy levels define an expectation function, which specifies who (human or machine) is responsible for sensing, decision-making, and action execution under defined operational conditions. This expectation function becomes the basis for certification, validation, liability assignment, and operational authorization which we will discuss in the next section.
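As a toy illustration of an expectation function, the following sketch maps SAE J3016 levels to the party expected to perform the driving task and its fallback. It is a deliberate simplification of the standard, and the field names (`ddt`, `fallback`) are our own shorthand, not the standard's terminology:

```python
# Minimal "expectation function": who is responsible for the dynamic
# driving task (DDT) and for fallback at each SAE level. Simplified
# for illustration; consult SAE J3016 for the authoritative definitions.
SAE_EXPECTATION = {
    0: {"ddt": "human",  "fallback": "human",  "label": "No automation"},
    1: {"ddt": "human",  "fallback": "human",  "label": "Driver assistance"},
    2: {"ddt": "human",  "fallback": "human",  "label": "Partial automation"},
    3: {"ddt": "system", "fallback": "human",  "label": "Conditional automation"},
    4: {"ddt": "system", "fallback": "system", "label": "High automation (within ODD)"},
    5: {"ddt": "system", "fallback": "system", "label": "Full automation"},
}

def responsible_party(level: int, in_odd: bool = True) -> str:
    """Who is expected to be performing the driving task right now?"""
    if not in_odd and level == 4:
        # Level 4 outside its ODD must still reach a safe stop on its own.
        return "system (must reach minimal-risk condition)"
    return SAE_EXPECTATION[level]["ddt"]

print(responsible_party(2))  # human
print(responsible_party(3))  # system
```

Crude as it is, this kind of table is exactly what certification, liability assignment, and operational authorization hang on: a machine-checkable statement of who is expected to do what, under which conditions.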

Figure 1

In society, products operate within the confines of a legal governance structure. The legal governance structure is one of the great inventions of civilization; its primary role is to channel disputes away from unstructured expression, and perhaps even violence, into the domain of the courts (Figure 1). To be effective, legal governance structures must be perceived as fair and predictable. Fairness is obtained by a number of methods, such as due-process procedures, transparency and public proceedings, and neutral decision-makers (judges, juries, arbitrators). Predictability is achieved through the concept of precedent: past rulings are given heavier weight in decision-making, and diverging from precedent is an extraordinary event. Precedent gives the legal system stability. The combination of fairness and predictability shifts the resolution of disputes to a more orderly process, which promotes societal stability.

How does this work mechanically, and how does it connect to product development?

Figure 2

As shown in Figure 2, there are three major stages. First, legal frameworks are established by law-making bodies (legislators). In practice, however, legislators cannot specify every aspect and empower administrative entities (regulators) to codify the details of the law. Regulators, in turn, often lack the technical knowledge to codify all aspects of the law and rely on independent industry groups such as SAE International or the Institute of Electrical and Electronics Engineers (IEEE) for technical expertise. Second, disputes arise in the field and must be adjudicated by the legal system. The typical process is a trial, under the strict procedures established for fairness; the result of the trial is a judgement that applies the facts to the legal frameworks. The facts of the case can lead to three potential outcomes. In the first situation, the facts are covered by the legal framework, so there is no further action relative to the governance structure. In the second, the facts expose an “edge” condition in the governance structure; the court looks for previous cases which might fit (the concept of precedent) and uses them to make its judgement. If no such case exists, the court can establish precedent with its judgement, which weighs on future decisions as well. Finally, in rare situations, the facts of the case are in a field so new that there is little body of law. In these situations, the courts may make a judgement, but often there is a call for law-making bodies to establish deeper legal frameworks.

In fact, autonomous vehicles (AVs) are considered to be one of these situations. Why? In traditional automobiles, the body of law connected to product liability attaches to the car, and the liability for actions taken using the car attaches to the driver. Further, product liability is often managed at the federal level and driver licensing more locally. Surprisingly, however, as the figure below shows, there is a body of law dealing with autonomous vehicles from the distant past. In the days of horses, there were accidents, and a sophisticated liability structure emerged. In this structure, if a person directed his horse into an accident, the driver was at fault; if a bystander did something to “spook” the horse, the bystander was at fault; and there was also a concept of “no fault” for when a horse unexpectedly went rogue. A discerning reader may well understand that this body of law emerges from a deep understanding of the characteristics of a horse. In legal terms, it creates an “expectation.” What are the “expectations” for a modern autonomous vehicle? This is currently a highly debated point in the industry.

Overall, whatever value products provide to their consumers is weighed against the potential harm caused by the product, which leads to the concept of legal product liability. While laws diverge across geographies, the fundamental tenets share the key elements of expectation and harm. Expectation, as judged by “reasonable behavior given the totality of the facts,” attaches liability. For example, the clear expectation is that a train cannot stop instantly if you stand in front of it, while this is not the expectation for most autonomous driving situations. Harm is another key concept: AI recommendation systems for movies are not held to the same standards as autonomous vehicles. The governance framework for liability is mechanically developed through legislative actions and associated regulations. The framework is tested in the court system under the particular circumstances, or facts, of each case. To provide stability, the database of cases and decisions is viewed as a whole under the concept of precedent. Clarification on legal points is provided by the appellate legal system, where arguments on the application of the law are decided, which in turn sets precedent.

What is an example of this whole situation? Consider the airborne sector, shown in the figure above, where the governance framework consists of enacted law (in this case, US law) with associated cases providing legal precedent, regulations, and industry standards. Any product in the airborne sector must comply with this framework before its solution can be released to the marketplace.

References:

  1. Razdan, R., “Unsettled Technology Areas in Autonomous Vehicle Test and Validation,” EPR2019001, Jun. 12, 2019.
  2. Razdan, R., “Unsettled Topics Concerning Automated Driving Systems and the Transportation Ecosystem,” EPR2019005, Nov. 5, 2019.
  3. Ross, K., “Product Liability Law and Its Effect on Product Safety,” In Compliance Magazine, 2023. [Online]. Available: https://incompliancemag.com/product-liability-law-and-its-effect-on-product-safety/

Introduction to Validation and Verification in Autonomy

As discussed in the governance module, whatever value products provide to their consumers is weighed against the potential harm caused by the product, which leads to the concept of legal product liability. From a product development perspective, the combination of laws, regulations, and legal precedent forms the overriding governance framework around which the system specification must be constructed [3]. The process of validation ensures that a product design meets the user's needs and requirements, and verification ensures that the product is built correctly according to design specifications.

Fig. 1. V&V and Governance Framework.

The Master V&V (MaVV) process needs to demonstrate that the product has been reasonably tested given the reasonable expectation of causing harm. It does so using three important concepts [4]:

  1. Operational Design Domain (ODD): This defines the environmental conditions and operational model under which the product is designed to work.
  2. Coverage: This defines the completeness over the ODD to which the product has been validated.
  3. Field Response: When failures do occur in the field, the procedures used to correct product design shortcomings and prevent future harm.
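To make these three concepts concrete, here is a toy sketch in Python. All condition names and values are invented for illustration: the ODD is modeled as a set of allowed conditions, and coverage as the fraction of ODD combinations (bins) exercised by at least one test.

```python
# Toy model of an ODD and its coverage. Condition names/values invented.
from itertools import product

# ODD: the conditions the product is designed to handle.
ODD = {
    "weather": ["clear", "rain"],
    "lighting": ["day", "night"],
    "speed_band": ["low", "medium"],
}

def odd_bins():
    """Enumerate every combination (bin) of ODD conditions."""
    keys = list(ODD)
    return [dict(zip(keys, combo)) for combo in product(*ODD.values())]

def coverage(tested_bins):
    """Coverage: fraction of ODD bins exercised by at least one test."""
    all_bins = odd_bins()
    hit = [b for b in all_bins if b in tested_bins]
    return len(hit) / len(all_bins)

tested = [
    {"weather": "clear", "lighting": "day", "speed_band": "low"},
    {"weather": "rain", "lighting": "day", "speed_band": "low"},
]
print(f"{coverage(tested):.0%}")  # 25%: 2 of the 8 ODD bins covered
```

Field response closes the loop: when a field failure is diagnosed, its conditions are added to the tested set (and, if they fall outside the ODD, the ODD definition itself is revisited).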

As Figure 1 shows, the Verification & Validation (V&V) process is the key input into the governance structure which attaches liability, and per the governance structure, each of the elements must show “reasonable due diligence.” An example of an unreasonable ODD would be an autonomous vehicle giving up control a millisecond before an accident.

Fig. 2. Execution space.

Mechanically, MaVV is implemented with a Minor V&V (MiVV) process consisting of:

  1. Test Generation: From the allowed ODD, test scenarios are generated.
  2. Execution: Each test is “executed” on the product under development; mathematically, this is a functional transformation which produces results.
  3. Criterion for Correctness: The results of the execution are evaluated for success or failure against a crisp criterion for correctness.
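The generate → execute → check loop above can be expressed in a few lines. This is a minimal sketch only: the braking model, ODD speed bounds, and pass criterion are all invented for illustration, not taken from any standard.

```python
# Minimal MiVV loop: generate tests, execute them, judge correctness.
import random

def generate_tests(n, seed=0):
    """Test generation: pseudo-random speeds (m/s) drawn from the ODD."""
    rng = random.Random(seed)
    return [rng.uniform(5.0, 40.0) for _ in range(n)]

def execute(speed, decel=7.0):
    """Execution: stopping distance from the v^2 / (2a) kinematics."""
    return speed ** 2 / (2 * decel)

def is_correct(stopping_distance, available_distance=60.0):
    """Criterion for correctness: the vehicle stops within the gap."""
    return stopping_distance <= available_distance

results = [(v, is_correct(execute(v))) for v in generate_tests(100)]
failures = [v for v, ok in results if not ok]
print(f"{len(failures)} of {len(results)} scenarios failed")
```

Even this toy version shows why each step carries cost: the generator must sample the ODD intelligently, the execution step is where physical or virtual platforms are swapped in, and the criterion must be crisp enough to score every run automatically.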

In practice, each of these steps can carry considerable complexity and cost. Since the ODD can be a very wide state space, intelligently and efficiently generating the stimulus is critical. Typically, stimulus generation begins manually, but this quickly fails to scale. In virtual execution environments, pseudo-random directed methods are used to accelerate the process. In limited situations, symbolic or formal methods can mathematically carry large state spaces through the whole design execution phase. Symbolic methods have the advantage of completeness but face computational explosion, as many of the underlying operations are NP-complete.

The execution stage can be done physically (such as on the test track above), but this process is expensive, slow, offers limited controllability and observability, and, in safety-critical situations, is potentially dangerous. In contrast, virtual methods have the advantages of cost, speed, near-total controllability and observability, and no safety issues. Virtual methods also have the great advantage of performing the V&V task well before the physical product is constructed; this leads to the classic V chart shown in Figure 1. However, since virtual methods are a model of reality, they introduce inaccuracy into the testing domain, while physical methods are accurate by definition. Finally, one can intermix virtual and physical methods with concepts such as software-in-the-loop or hardware-in-the-loop.

The observable results of the stimulus generation are captured to determine correctness. Correctness is typically defined by either a golden model or an anti-model. The golden model, typically virtual, is an independently verified model whose results can be compared to those of the product under test. Even then, there is typically a divergence between the abstraction level of the golden model and that of the product, which must be managed. Golden-model methods are often used in computer architectures (e.g., ARM, RISC-V). The anti-model approach instead defines error states which the product must never enter; correct behavior is then the state space outside the error states. In the autonomous vehicle space, for example, an error state might be an accident or the violation of any number of other constraints.

The MaVV consists of building a database of the various explorations of the ODD state space and, from that, building an argument for completeness. The argument typically takes the form of a probabilistic analysis. After the product is in the field, field returns are diagnosed, and one must always ask: why did my original process not catch this issue? Once found, the test methodology is updated to prevent such issues going forward. The V&V process is critical in building a product which meets customer expectations, and it documents the “reasonable” due diligence needed for the purposes of product liability in the governance framework.
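The two correctness styles can be contrasted in code. In this hedged sketch (the adder, the spacing rule, and all thresholds are invented for illustration), a golden model checks a device under test by output comparison, while an anti-model flags entry into a forbidden state:

```python
# Golden model vs. anti-model, side by side.

def product_add(a, b):
    """Device under test: an 8-bit adder that wraps on overflow."""
    return (a + b) & 0xFF

def golden_add(a, b):
    """Golden model: independently trusted reference behavior."""
    return (a + b) % 256

def violates_anti_model(gap_m, speed_mps):
    """Anti-model: a state the product must never enter, e.g. a gap
    shorter than a 2-second following distance."""
    return gap_m < 2.0 * speed_mps

# Golden-model check: exhaustively compare DUT against the reference.
assert all(product_add(a, b) == golden_add(a, b)
           for a in range(256) for b in range(256))

# Anti-model check: flag an unsafe state rather than match an output.
print(violates_anti_model(gap_m=25.0, speed_mps=15.0))  # True: 25 m < 30 m
```

Note the asymmetry: the golden model demands a full trusted reference, while the anti-model only requires naming what must never happen, which is why the latter dominates in open-world domains such as driving.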

In most cases, the generic V&V process must grapple with massive ODD spaces, limited execution capacity, and a high cost of evaluation. Further, all of this must be done in a timely manner to make the product available to the marketplace. Traditionally, V&V regimes have been bifurcated into two broad categories: Physics-Based and Decision-Based. We will discuss the key characteristics of each in turn.

Physics-Based Operating Domains

For MaVV, the critical factors are the efficiency of the MiVV “engine” and the argument for the completeness of the validation. Historically, mechanical/non-digital products (such as cars or airplanes) required sophisticated V&V. These systems are examples of a broader class of products with a Physics-Based Execution (PBE) paradigm. In this paradigm, the underlying model execution (including real life) has the characteristics of continuity and monotonicity because the model operates in the world of physics. This key insight has enormous implications for V&V because it greatly constrains the potential state space to be explored. Examples of this reduction of state space include:

  1. Scenario Generation: One need only worry about the state space permitted by the laws of physics; objects which violate physics cannot exist. Every actor is explicitly constrained by the laws of physics.

  2. Monotonicity: In many interesting dimensions, there are strong properties of monotonicity. As an example, if one is considering stopping distance for braking, there is a critical speed above which there will be an accident.

Critically, all the speed bins below this critical speed are safe and do not have to be explored. In practice, in traditional PBE fields, the philosophy of safety regulation (ISO 26262 [5], AS9100 [6], etc.) builds the safety framework as a process, where

  1. failure mechanisms are identified;
  2. a test and safety argument is built to address the failure mechanism;
  3. there is a safety process by a regulator (or documentation for self-regulation) which evaluates these two and acts as a judge to approve/decline.
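The monotonicity property described above can be exploited computationally. The sketch below uses an idealized braking model (assumed friction coefficient, illustrative numbers): because stopping distance grows monotonically with speed, a binary search locates the critical speed, and every speed bin below it is certified safe without being simulated individually.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def stopping_distance(v: float, mu: float = 0.7) -> float:
    """Idealized braking distance (m) at speed v (m/s) on a surface
    with friction coefficient mu: d = v^2 / (2 * mu * g)."""
    return v * v / (2 * mu * G)

def critical_speed(available_m: float, mu: float = 0.7,
                   tol: float = 0.01) -> float:
    """Binary-search the highest safe speed. Monotonicity guarantees
    every speed below the returned value is also safe, so those bins
    never need individual exploration."""
    lo, hi = 0.0, 200.0  # m/s search bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if stopping_distance(mid, mu) <= available_m:
            lo = mid   # safe: critical speed is at or above mid
        else:
            hi = mid   # unsafe: critical speed is below mid
    return lo

v_crit = critical_speed(available_m=50.0)
# Analytic cross-check: v = sqrt(2 * mu * g * d)
assert abs(v_crit - math.sqrt(2 * 0.7 * G * 50.0)) < 0.05
```

A Decision-Based system offers no such guarantee: passing at 30 m/s would say nothing about behavior at 29 m/s, which is exactly the state-space explosion discussed later.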

Traditionally, the faults considered are primarily mechanical failures. As an example, the flow for validating the braking system of an automobile under ISO 26262 would have the following steps:

  1. Define Safety Goals and Requirements (Concept Phase): Hazard Analysis and Risk Assessment (HARA): Identify potential hazards related to the braking system (e.g., failure to stop the vehicle, unintended braking). Assess risk levels using parameters like severity, exposure, and controllability. Define Automotive Safety Integrity Levels (ASIL) for each hazard (ranging from ASIL A to ASIL D, where D is the most stringent). Define safety goals to mitigate hazards (e.g., ensure sufficient braking under all conditions).
  2. Develop Functional Safety Concept: Translate safety goals into high-level safety requirements for the braking system. Ensure redundancy, diagnostics, and fail-safe mechanisms are incorporated (e.g., dual-circuit braking or electronic monitoring).
  3. System Design and Technical Safety Concept: Break down functional safety requirements into technical requirements, design the braking system with safety mechanisms like hardware (e.g., sensors, actuators) and software (e.g., anti-lock braking algorithms). Implement failure detection and mitigation strategies (e.g., failover to mechanical or electronic control paths).
  4. Hardware and Software Development: Hardware Safety Analysis (HSA): Validate that components meet safety standards (e.g., reliable braking sensors). Software Development and Verification: Use ISO 26262-compliant processes for coding, verification, and validation. Test braking algorithms under various conditions.
  5. Integration and Testing: Perform verification of individual components and subsystems to ensure they meet technical requirements. Conduct integration testing of the complete braking system, focusing on functional tests (e.g., stopping distance), safety tests (e.g., behavior under fault conditions), and stress/environmental tests (e.g., heat, vibration).
  6. Validation (Vehicle Level): Validate the braking system against safety goals defined in the concept phase. Perform real-world driving scenarios, edge cases, and fault injection tests to confirm safe operation. Verify compliance with ASIL-specific requirements.
  7. Production, Operation, and Maintenance: Ensure production aligns with validated designs. Implement operational safety measures (e.g., periodic diagnostics, maintenance), monitor and address safety issues during the product lifecycle (e.g., software updates).
  8. Confirmation and Audit: Use independent confirmation measures (e.g., safety audits, assessment reviews) to ensure the braking system complies with ISO 26262.
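The HARA step (step 1 above) can be made concrete with a small sketch that maps severity (S1-S3), exposure (E1-E4), and controllability (C1-C3) ratings to an ASIL. The additive shortcut used here reproduces the general pattern of the ISO 26262 Part 3 determination table, but it is a simplification for illustration; the normative table should be consulted for real work.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Simplified ASIL determination for a hazard rated S1-S3, E1-E4,
    C1-C3. The sum-based mapping is an approximation of the ISO 26262
    Part 3 table, used here for illustration only."""
    assert 1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3
    total = severity + exposure + controllability
    return {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}.get(total, "QM")

# Total brake failure at highway speed: life-threatening (S3), high
# exposure (E4), uncontrollable (C3) -> the most stringent level.
assert asil(3, 4, 3) == "ASIL D"
# A minor, rarely encountered, easily controllable hazard needs no
# ASIL-rated safety process (QM = quality management only).
assert asil(1, 2, 1) == "QM"
```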

Finally, the regulations have a strong notion of safety levels with Automotive Safety Integrity Levels (ASIL). Airborne systems follow a similar trajectory (pun intended) with the concept of Design Assurance Levels (DALs). A key part of the V&V task is to meet the standards required at each ASIL level. Historically, a sophisticated set of V&V techniques has been developed to verify traditional automotive systems. These techniques included well-structured physical tests, often validated by regulators or sanctioned independent companies (e.g., TÜV SÜD [7]). Over the years, the use of virtual physics-based models has increased for design tasks such as body design [8] or tire performance [9]. The general structure of these models is to build a simulation which is predictive of the underlying physics to enable broader ODD exploration. This creates a very important flow of characterization, model generation, predictive execution, and correction. Finally, because the execution is highly constrained by physics, virtual simulators can have limited performance and often require extensive hardware support for simulation acceleration. In summary, the key underpinnings of the PBE paradigm from a V&V point of view are:

  1. Constrained and well-behaved space for scenario test generation.
  2. Expensive physics-based simulations.
  3. Regulations focused on mechanical failure.
  4. In safety situations, regulations focused on a process to demonstrate safety with a key idea of design assurance levels.

Traditional Decision-based Execution

As cyber-physical systems evolved, information technology (IT) rapidly transformed the world.

Fig. 4. Progression of System Specification (HW, SW, AI).

As shown in Figure 4, within electronics, there has been a progression of system function construction where the first stage was hardware or pseudo-hardware (FPGA, microcode). The next stage involved the invention of a processor architecture upon which software could imprint system function. Software was a design artifact written by humans in standard languages (C, Python, etc.). The revolutionary aspect of the processor abstraction allowed a shift in function without the need to shift physical assets. However, one needed legions of programmers to build the software. Today, the big breakthrough with Artificial Intelligence (AI) is the ability to build software with the combination of underlying models, data, and metrics.

In their basic form, IT systems were not safety critical, and similar levels of legal liability have not attached to IT products. However, the size and growth of IT is such that problems in large-volume consumer products can have catastrophic economic consequences [10]. Thus, the V&V function remains very important. IT systems follow the same generic processes for V&V as outlined above, but with two significant differences: the execution paradigm and the source of errors. First, unlike the PBE paradigm, the execution paradigm of IT follows a Decision-Based Execution (DBE) mode. That is, there are no natural constraints on the functional behavior of the underlying model, and no inherent properties of monotonicity. Thus, the whole massive ODD space must be explored, which makes the job of generating tests and demonstrating coverage extremely difficult. To counter this difficulty, a series of processes has been developed to build a more robust V&V structure. These include:

  1. Code Coverage: The structural specification of the virtual model is used as a constraint to help drive the test generation process. This is done with software or hardware (RTL code).
  2. Structured Testing: A process of component, subsection, and integration testing has been developed to minimize propagation of errors.
  3. Design Reviews: Structured design reviews of specs and code are considered best practice.

A good example of this process flow is the CMU Capability Maturity Model Integration (CMMI) [11], which defines a set of processes to deliver quality software. Large parts of the CMMI architecture can be used for AI when AI is replacing existing SW components. Finally, testing in the DBE domain decomposes into the following philosophical categories: "known knowns," bugs or issues that are identified and understood; "known unknowns," potential risks or issues that are anticipated but whose exact nature or cause is unclear; and "unknown unknowns," completely unanticipated issues that emerge without warning, often highlighting gaps in design, understanding, or testing. The last category is the most problematic and most significant for DBE V&V. Pseudo-random test generation has been a key technique used to expose this category [12]. In summary, the key underpinnings of the DBE paradigm from a V&V point of view are:

  1. Unconstrained and not well-behaved execution space for scenario test generation.
  2. Generally less expensive simulation execution (no physical laws to simulate).
  3. V&V focused on logical errors, not mechanical failure.
  4. Generally no defined regulatory process for safety-critical applications; most software is "best efforts."
  5. "Unknown unknowns" as a key focus of validation.
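Pseudo-random test generation can be sketched as follows (a toy ALU and illustrative constraints, not a real verification flow). The fixed seed makes every run, and therefore every failure, exactly replayable, while the randomness explores corners such as division by zero that a hand-enumerated test plan may miss.

```python
import random

def pseudo_random_tests(seed: int, n: int):
    """Seeded pseudo-random stimulus generation. The seed makes runs
    deterministic and failures replayable; the constraints (opcode set,
    small operand range biased toward corner values like 0) are
    illustrative assumptions."""
    rng = random.Random(seed)  # fixed seed -> deterministic replay
    opcodes = ("ADD", "SUB", "MUL", "DIV")
    for _ in range(n):
        yield rng.choice(opcodes), rng.randint(-8, 8), rng.randint(-8, 8)

def alu(op: str, a: int, b: int) -> int:
    """Toy device under test; note the unguarded division corner case
    (an 'unknown unknown' from the test plan's point of view)."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "MUL":
        return a * b
    return a // b  # DIV: fails on b == 0

failures = []
for op, a, b in pseudo_random_tests(seed=7, n=1000):
    try:
        alu(op, a, b)
    except ZeroDivisionError:
        failures.append((op, a, b))  # replayable from the seed alone
```

In practice the same idea scales up with constraint solvers and coverage feedback, but the core mechanism, reproducible randomness aimed at unanticipated corners, is the one shown here.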

A key implication of the DBE space is that the idea from the PBE world of building a list of faults and building a safety argument for them is antithetical to the focus of DBE validation.

Finally, the product development process is typically focused on defining an ODD and validating against it. In modern times, however, an additional concern is that of adversarial attacks (cybersecurity), in which an adversary wants to hijack the system for nefarious intent. The product owner must therefore not only validate against the ODD, but also detect when the system is operating outside the ODD. After detection, the best-case outcome is to safely redirect the system back into the ODD space. The risks associated with cybersecurity issues typically split into three levels for cyber-physical systems:

  1. OTA Security: If an adversary can manipulate Over-the-Air (OTA) software updates, they can take over a massive number of devices quickly. A worst-case example would be a compromised Tesla OTA update which turns Teslas into collision engines.
  2. Remote Control Security: If the adversary can take over a car remotely, they can cause harm to the occupants as well as third parties.
  3. Sensor Spoofing: Here, the adversary uses local physical assets to fool the sensors of the target. GPS jamming and spoofing are active examples.
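A minimal defense against sensor spoofing is a plausibility cross-check between independent sensors. The sketch below compares GPS-derived speed with wheel odometry; the threshold and the fallback policy it implies are assumptions for illustration, not a tuned production design.

```python
def gps_plausible(gps_speed_mps: float, wheel_speed_mps: float,
                  tolerance_mps: float = 2.0) -> bool:
    """Cross-check GPS-derived speed against wheel odometry. A large
    disagreement flags possible GPS spoofing/jamming (or a faulty
    sensor) and should trigger a fallback such as dead reckoning.
    The 2 m/s tolerance is an illustrative assumption."""
    return abs(gps_speed_mps - wheel_speed_mps) <= tolerance_mps

assert gps_plausible(27.5, 26.8)        # consistent readings
assert not gps_plausible(0.0, 26.8)     # GPS 'frozen' while wheels move
```

Real systems fuse many more channels (IMU, camera ego-motion, map matching), but each pairwise check follows this same pattern: detect the excursion outside the trusted envelope, then degrade safely.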

In terms of governance, reasonable due diligence is expected of the product developer in order to minimize these issues. The level of validation required is dynamic in nature and connected to the norms of the industry.

Validation Requirements across Domains

In terms of domains, the Operational Design Domain (ODD) is the driving factor, and it typically has two dimensions. The first is the operational model and the second is the physical domain (ground, airborne, marine, space). In terms of ground, passenger AVs are perhaps the most well-known face of autonomy, with robo-taxi services and self-driving consumer vehicles gradually entering urban environments. Companies like Waymo, Cruise, and Tesla have taken different approaches to ODDs. Waymo's fully driverless cars operate in sunny, geo-fenced suburbs of Phoenix with detailed mapping and remote supervision. Cruise began service in San Francisco, originally operating only at night to reduce complexity. Tesla's Full Self-Driving (FSD) Beta aims for broader generalization, but it still relies heavily on driver supervision and is limited by weather and visibility challenges.

Transit shuttles, though less publicized, have quietly become a practical application of AVs in controlled environments. These low-speed vehicles typically operate in geo-fenced areas such as university campuses, airports, or business parks. Companies like Navya, Beep, and EasyMile deploy shuttles that follow fixed routes and schedules, interacting minimally with complex traffic scenarios. Their ODDs are tightly defined: they may not operate in rain or snow, often run only during daylight, and avoid high-speed or mixed-traffic conditions. In many cases, a remote operator monitors operations or is available to intervene if needed. Delivery robots represent a third class of autonomous mobility—compact, lightweight vehicles designed for last-mile delivery. Their ODDs are perhaps the narrowest, but that’s by design. These robots, from companies like Starship, Kiwibot, and Nuro, navigate sidewalks, crosswalks, and short street segments in suburban or campus environments. They operate at pedestrian speeds (typically under 10 mph), carry small payloads, and avoid extreme weather, high traffic, or unstructured terrain. Because they don’t carry passengers, safety thresholds and regulatory oversight can differ significantly.

Weather is a particularly limiting factor across all autonomous systems. Rain, snow, fog, and glare interfere with LIDAR, radar, and camera performance—especially for smaller robots that operate close to the ground. Most AV deployments today restrict operations to fair-weather conditions. This is especially true for delivery robots and transit shuttles, which often halt operations during storms. While advanced sensor fusion and predictive modeling promise improvements, true all-weather autonomy remains a significant technical challenge. The intersection of weather and autonomy is an active research area [1].

Another ODD dimension is time of day. Nighttime operation brings unique difficulties for AVs: reduced visibility, increased pedestrian unpredictability, and in urban areas, more erratic driver behavior. Some systems (like Waymo in Chandler, AZ) now operate 24/7, but most deployments—particularly delivery robots and shuttles—remain restricted to daylight hours. Tesla's FSD does operate at night, but it still requires human oversight. Infrastructure also shapes ODDs in crucial ways. Many AV systems depend on high-definition maps, lane-level GPS, and even smart traffic signals to guide their decisions. In geo-fenced environments—where the route and surroundings are highly predictable—this infrastructure dependency is manageable. But for broader ODDs, where environments may change frequently or lack digital maps, achieving safe autonomy becomes much harder. That’s why passenger AVs today generally avoid rural areas, unpaved roads, or newly constructed zones.

Regulatory environments further shape ODDs. In the U.S., states like California, Arizona, and Florida have developed AV testing frameworks, but each differs in what it permits. For instance, California limits fully driverless vehicles to certain urban zones with strict reporting requirements. Delivery robots are often regulated at the city level—some cities allow sidewalk bots, others ban them outright. Transit shuttles often receive special permits for low-speed operation on limited routes. These regulatory boundaries translate directly into ODD constraints.
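An ODD of this kind can be expressed as a machine-checkable gate. The sketch below encodes an assumed delivery-robot ODD, loosely modeled on the constraints described above (fair weather, daylight only, geo-fenced, pedestrian speeds); the specific values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Conditions:
    weather: str          # e.g. "clear", "rain", "snow", "fog"
    daylight: bool
    inside_geofence: bool
    speed_limit_mph: int  # posted limit on the current segment

# Illustrative ODD for a sidewalk delivery robot (values assumed).
DELIVERY_ODD = dict(weather={"clear"}, daylight_only=True,
                    geofenced=True, max_speed_mph=10)

def within_odd(c: Conditions, odd: dict) -> bool:
    """Return True only if every ODD constraint is satisfied; operating
    outside the ODD must trigger a minimal-risk maneuver (stop or
    handover to a remote operator)."""
    if c.weather not in odd["weather"]:
        return False
    if odd["daylight_only"] and not c.daylight:
        return False
    if odd["geofenced"] and not c.inside_geofence:
        return False
    return c.speed_limit_mph <= odd["max_speed_mph"]

assert within_odd(Conditions("clear", True, True, 8), DELIVERY_ODD)
assert not within_odd(Conditions("rain", True, True, 8), DELIVERY_ODD)
```

The design choice worth noting is that the gate is conjunctive: a single violated dimension (weather, light, geography, speed) takes the system out of its validated envelope, which is exactly how the regulatory and V&V arguments above are framed.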

In terms of physical domains, ground-based autonomous systems, especially in automotive contexts, are the most commercially visible. Self-driving vehicles operate in human-dense environments, requiring perception systems to identify pedestrians, cyclists, vehicles, and traffic infrastructure. Validation here relies heavily on scenario-based testing, simulation, and controlled pilot deployments. Standards like ISO 26262 (functional safety), ISO/PAS 21448 (SOTIF), and UL 4600 (autonomy system safety) guide safety assurance. Regulatory frameworks are evolving state-by-state or country-by-country, with Operational Design Domain (ODD) restrictions acting as practical constraints on deployment.

Autonomous aircraft (e.g., drones, urban air mobility platforms, and optionally piloted systems) must operate in highly structured, safety-critical environments. Validation involves rigorous formal methods, fault tolerance analysis, and conformance with aviation safety standards such as DO-178C (software), DO-254 (hardware), and emerging guidance like ASTM F38 and EASA's SC-VTOL. Airspace governance is centralized and mature, often requiring type certification and airworthiness approvals. Unlike automotive systems, airborne autonomy must prove reliability in loss-of-link scenarios and demonstrate fail-operational capabilities across flight phases.

Autonomous surface and underwater marine systems face unstructured and communication-constrained environments. They must operate reliably in GPS-denied or RF-blocked conditions while detecting obstacles like buoys, vessels, or underwater terrain. Validation is more empirical, often involving extended sea trials, redundancy in navigation systems, and adaptive mission planning. IMO (International Maritime Organization) and classification societies like DNV are working on Maritime Autonomous Surface Ship (MASS) regulatory frameworks, though global standards are still nascent. The dual-use nature of marine autonomy (civil and defense) adds governance complexity. Space-based autonomous systems (e.g., planetary rovers, autonomous docking spacecraft, and space tugs) operate under extreme constraints: communication delays, radiation exposure, and no real-time human oversight. Validation occurs through rigorous testing on Earth-based analog environments, formal verification of critical software, and fail-safe design principles. Governance falls under national space agencies (e.g., NASA, ESA) and international frameworks like the Outer Space Treaty. Assurance relies on mission-specific autonomy envelopes and pre-defined decision trees rather than reactive autonomy.

Governance also differs. Aviation and space operate within centralized, internationally coordinated regulatory systems (ICAO, FAA, EASA, NASA), while ground autonomy remains highly fragmented across jurisdictions. Maritime governance is progressing but lacks harmonization. Space governance, although anchored in treaties, increasingly contends with commercial activity and national interests, demanding updated risk management protocols.

Emerging efforts like the SAE G-34/SC-21 standard for AI in aviation, NASA's exploration of adaptive autonomy, and ISO’s work on AI functional safety indicate a trend toward domain-agnostic principles for validating intelligent behavior. There is growing recognition that autonomous systems, regardless of environment, need rigorous testing of edge cases, clarity of system intent, and real-time assurance mechanisms.

References:

[1] Vargas, J.; Alsweiss, S.; Toker, O.; Razdan, R.; Santos, J. An Overview of Autonomous Vehicles Sensors and Their Vulnerability to Weather Conditions. Sensors 2021, 21, 5397. https://doi.org/10.3390/s21165397

Summary

This chapter has provided an overview of autonomous systems (ground, airborne, marine, space); the initial framing of expectation functions for autonomy; the governance structures within which autonomy must operate; an overview of the validation and verification mechanisms used to support these governance structures; and finally an overview of autonomy in each of the physical domains.

In the subsequent chapters, we will delve deeper into these topics with a framing informed by autonomy abstractions as shown in the figure below. At the “bottom” of these abstractions are the physical objects such as the mechanical devices and the associated electronics hardware. Layered above the electronics hardware layer are various software layers which start with middleware/infrastructure, algorithmic layers, and finally the connection to humans.

These topics will be addressed at the conceptual level and also examined in specific fashion for the four physical domains (example figure below).

Productization Lessons and Assessments:

Key lessons for productization include:

  1. Engineers must understand their products operate inside a governance structure consisting of laws, regulations, and standards.
  2. In the case of autonomy, there are many historical standards, but new standards are also under development.
  3. A very key aspect of product design is the expectation function for the product. This expectation function is key to communication from a marketing perspective and also from a legal liability perspective.
| Domain       | Primary Standards Body | Key Autonomy Standard         |
|--------------|------------------------|-------------------------------|
| Ground       | SAE                    | SAE J3016                     |
| Ground       | ISO                    | ISO 26262, ISO 21448          |
| Ground       | UNECE                  | UN R157                       |
| Airborne     | RTCA                   | DO-178C, DO-365               |
| Airborne     | FAA/EASA               | UAV autonomy certification    |
| Marine       | IMO                    | MASS autonomy levels          |
| Marine       | DNV                    | Autonomous ship standards     |
| Space        | NASA                   | ALFUS autonomy framework      |
| Space        | CCSDS                  | Spacecraft autonomy protocols |
| Cross-domain | IEEE                   | IEEE 7000 series              |
| Cross-domain | IEC                    | IEC 61508                     |
| Cross-domain | NIST                   | AI Risk Management Framework  |

Industries and Companies:

  1. Regulators & Government Agencies: Define laws, certification pathways, and operational constraints for autonomous systems across domains (ground, air, marine, space). They translate legislation into enforceable rules and approvals. Example players: NHTSA, FAA, EASA, International Maritime Organization, NASA, ESA.
  2. Standards Organizations / Industry Consortia: Develop technical standards, safety frameworks, and autonomy classification systems that regulators and industry rely on (e.g., SAE levels, ISO safety standards). Example players: SAE International, ISO, IEEE, RTCA, ASTM.
  3. Legal & Advisory Firms: Interpret liability, compliance, and regulatory frameworks; support litigation, risk assessment, and policy strategy for autonomy deployments. Example players: Baker McKenzie, DLA Piper, Latham & Watkins.
  4. Certification & Testing Authorities: Provide independent validation, certification audits, and compliance verification against safety standards (ASIL, DAL, etc.); critical for market entry. Example players: TÜV SÜD, UL Solutions, DNV.
  5. Simulation & Digital Twin Software Providers: Provide tools for scenario-based validation, digital twins, and V&V workflows across autonomy stacks (SIL/HIL, scenario generation, formal testing). Example players: NVIDIA (DRIVE Sim), MathWorks, Ansys, Siemens.
  6. Test Track & Physical Testing Infrastructure Providers: Operate controlled environments for real-world validation (proving grounds, UAV corridors, maritime test ranges); bridge sim-to-real validation. Example players: American Center for Mobility, MCity, FAA UAV Test Sites.

Hardware and Sensing Technologies

The underlying active physical components for all electronic systems are semiconductors. Semiconductors span several major categories based on function, material system, and integration level. At the most basic level are discrete devices such as diodes, MOSFETs, IGBTs, and rectifiers, which control current and voltage and are widely used in power conversion and motor drives. Analog and mixed-signal semiconductors handle sensing, amplification, signal conditioning, and power management (e.g., ADCs, DACs, voltage regulators, sensor interfaces). Memory semiconductors—such as DRAM, SRAM, NAND flash, and emerging non-volatile memories like MRAM—store data and program code. Power semiconductors use materials such as silicon, silicon carbide (SiC), and gallium nitride (GaN) to efficiently switch high voltages and currents in electric vehicles, aircraft power systems, and renewable energy converters. Finally, specialized devices such as RF front-end chips, image sensors (CMOS), FPGAs, and AI accelerators support communication, perception, and high-performance computing tasks. Together, these categories form the layered semiconductor ecosystem that underpins modern automotive, airborne, marine, and space electronic architectures. An important category is digital logic devices, which include microcontrollers (MCUs), microprocessors (MPUs), and system-on-chip (SoC) devices that execute programming of some form (FPGA, software, AI). We shall discuss this in greater detail in the next chapter on software.

In this chapter, we shall review the historical background of the adoption of semiconductors in various mobility domains. As part of this background, we shall outline some key "productization" challenges such as safety, governance, and supply chain management. With this background, we will introduce the jump in complexity introduced by autonomy and revisit the key "productization" challenges.

Historical Background

Historically, cyber-physical systems were mechanically based, but with the advent of modern electronics, critical functions moved rapidly to electronic subsystems. In automotive electronics, for example, tightening emissions standards in the U.S., Europe, and Japan in the 1970s and early 1980s pushed automakers to adopt microprocessor-based engine control units (ECUs). What began as simple ignition timing modules evolved into closed-loop engine management systems handling fuel injection and knock control—the "Power Train" block shown in the graphic. These early semiconductor deployments were ruggedized analog/mixed-signal designs, optimized for reliability in high-temperature environments rather than computational complexity.

Through the late 1980s and 1990s, electronics expanded from powertrain into chassis and safety systems. Anti-lock braking systems (ABS), electronic stability control, traction control, and eventually electric power steering (EPS) required real-time sensing and actuation. This corresponds to the "Chassis" and "Safety and Control" domains in the image (ABS, airbag controllers, TPMS, collision warning). Here, semiconductors enabled distributed sensing (wheel speed sensors, accelerometers, pressure sensors) and deterministic embedded processing. The architecture remained domain-centric: each function had its own ECU, with limited cross-domain integration.

The next wave, roughly 1995–2010, was driven less by regulation and more by consumer expectation. Vehicles became platforms for infotainment and comfort electronics, shown in the graphic's "Infotainment" and "Comfort and Control" sections (dashboard displays, navigation, climate control, seat modules, body electronics). This phase marked the introduction of higher-performance digital SoCs, memory subsystems, and human-machine interface processors. Importantly, this is when in-vehicle networking standards such as CAN, LIN, and later FlexRay (listed under "Networking") became essential. The car shifted from isolated ECUs to a distributed electronic architecture connected by data buses—semiconductors were no longer just controllers; they were nodes in a communication network.

Figure 1: Automobile electronics

By the 2010s, semiconductor content per vehicle had grown exponentially, especially with hybrid and electric vehicles. Power electronics (IGBTs, MOSFETs, later SiC devices), battery management systems, and high-voltage control loops dramatically increased the role of advanced semiconductor materials and mixed-signal integration. Simultaneously, advanced driver assistance systems (ADAS)—collision warning, parking assist, night vision—required vision processors, radar front-ends, and sensor fusion chips, extending the "Safety and Control" block into high-performance computing territory.

Airborne Sector

If the automotive graphic represents the distributed, domain-based maturation of electronics in cars, the airborne sector followed a similar—but more safety-critical and certification-driven—trajectory. In the early jet age (1950s–1970s), aircraft electronics—then called avionics—were largely analog and federated. Radar, navigation, flight instruments, engine monitoring, and autopilot systems were separate boxes with limited interconnection. Semiconductors initially replaced vacuum tubes for reliability and weight reduction, but computational capability was modest. Much like early automotive engine controllers, electronics were introduced to solve specific operational needs—navigation accuracy, radio communication, and flight stabilization—rather than to create an integrated digital platform.

The major inflection point came in the 1980s and 1990s with the rise of digital flight control and "fly-by-wire" architectures, pioneered in civil aviation by aircraft such as the Airbus A320 and expanded in military platforms like the F-16 Fighting Falcon. Here, semiconductors moved from advisory roles to safety-critical control loops. Digital signal processors and radiation-tolerant microcontrollers executed deterministic real-time algorithms for stability augmentation, envelope protection, and engine control (FADEC).

During the 1990s–2000s, avionics entered a “glass cockpit” era. Aircraft such as the Boeing 777 replaced analog gauges with integrated digital displays driven by high-reliability processors and graphics subsystems. Data buses such as ARINC 429 and later AFDX (ARINC 664) enabled deterministic networking between flight computers, sensors, and displays—analogous to CAN and FlexRay in the automotive diagram. However, unlike automotive networks, airborne data buses were built around strict partitioning, redundancy, and fault containment regions. Triple-modular redundancy and dissimilar processors became common for flight-critical functions. In propulsion and power systems, semiconductors expanded from monitoring to active control. Full Authority Digital Engine Control (FADEC) units used mixed-signal ASICs and microprocessors to optimize fuel flow, reduce emissions, and improve reliability. With the emergence of “more-electric aircraft” concepts—exemplified by the Boeing 787—power electronics content increased substantially. High-voltage converters, motor drives, and solid-state power controllers replaced hydraulic subsystems, mirroring (though earlier in safety rigor) the electrification wave seen in automotive HEV/EV platforms.

Marine Sector

The marine industry’s use of electronics evolved from isolated navigation aids to highly integrated digital ship systems, following a trajectory structurally similar to automotive but at much larger power scales and with longer asset lifecycles. In the 1950s through the 1970s, marine electronics were primarily analog and functionally segregated: radar, sonar, gyrocompasses, VHF radios, and basic autopilots operated as standalone systems. Early semiconductor adoption focused on improving reliability and reducing size, particularly in radar and communication equipment. These systems were advisory in nature; propulsion and steering remained largely mechanical or hydraulic. The first major digital transition occurred in the 1980s and 1990s with the arrival of microprocessor-based engine control, satellite navigation (GPS), and electronic charting systems. Ships began incorporating digital propulsion governors, fuel optimization systems, and centralized alarm monitoring. This period resembles the automotive shift from carburetors to engine control units and ABS systems. Importantly, networking standards such as NMEA 0183 and later NMEA 2000 allowed sensors and navigation systems to exchange data, marking the move from isolated instrumentation to distributed marine electronics architectures.

By the 2000s, large commercial and naval vessels adopted Integrated Bridge Systems (IBS) and Integrated Platform Management Systems (IPMS), consolidating radar, charting, sonar, propulsion status, and safety alerts into unified digital consoles. Power electronics content increased significantly with electric propulsion drives, thruster control, hybrid marine power systems, and dynamic positioning systems. This phase mirrors the automotive expansion into electrification and body-domain integration. In recent years, semiconductor density has grown further with sensor fusion for collision avoidance, remote fleet monitoring, predictive maintenance, and early-stage autonomous surface vessels. While regulatory frameworks remain conservative, marine architecture now consists of interconnected propulsion, navigation, safety, power distribution, and autonomy subsystems — conceptually analogous to the domain blocks in the automotive graphic.

Space Sector

The space sector followed a parallel but more reliability-driven evolution, shaped by radiation tolerance, extreme environments, and mission assurance requirements. In the early space age, spacecraft electronics were built from discrete logic and radiation-hardened components with very limited computational capacity. Systems were strictly federated: guidance, telemetry, power conditioning, communications, and thermal control were separate subsystems with built-in redundancy. Early digital computers such as those used in the Apollo Guidance Computer demonstrated that semiconductors could enable autonomous navigation, but computational margins were minimal and fault tolerance was paramount. During the 1990s and early 2000s, radiation-hardened microprocessors and standardized spacecraft data buses such as MIL-STD-1553 and SpaceWire enabled more modular digital architecture. Satellites adopted structured subsystems for attitude determination and control, onboard data handling, payload processing, and power regulation. Missions like the Hubble Space Telescope and deep-space platforms such as the Mars Reconnaissance Orbiter incorporated increasingly sophisticated onboard processing for navigation, instrument control, and fault management. This stage resembles the distributed ECU era in automotive, where each domain was digitally controlled but interconnected via deterministic buses. In the modern era, semiconductor capability in space systems has expanded dramatically. High-throughput communications satellites, FPGA-based reconfigurable payloads, advanced solid-state power controllers, electric propulsion systems, and autonomous fault detection algorithms define current architectures. Commercial constellations developed by companies such as SpaceX have introduced vertically integrated avionics stacks and more software-defined spacecraft platforms. 
Unlike automotive, however, semiconductor design in space prioritizes radiation hardening, redundancy, and long-duration reliability over cost optimization. The overall trajectory mirrors the automotive diagram’s layered growth: from instrumentation digitization to closed-loop control, to networked subsystems, and now toward increasingly autonomous, software-defined space platforms.

Across marine and space domains — as in automotive — semiconductor adoption progressed from monitoring to control, from isolated subsystems to networked architecture, and from mechanical dominance to electrically and computationally mediated platforms. The architectural blocks differ in naming (propulsion, navigation, attitude control, power conditioning), but structurally they represent the same historical layering visible in the automotive figure.

Governance and Safety

As semiconductor content in vehicles increased, automotive safety protocols evolved from informal engineering practices to highly structured, lifecycle-based governance frameworks that now extend down to silicon IP and AI behavior. In the 1980s and 1990s, when electronic systems such as ABS and airbag controllers first became widespread, safety assurance was largely handled through company-specific processes. OEMs and Tier-1 suppliers relied on internal FMEA methods, redundancy design practices, and in some cases adaptations of aerospace guidance like DO-178 concepts. There was no unified automotive electronic safety standard, even as vehicles transitioned from isolated ECUs to increasingly networked systems.

The first major formal framework influencing automotive electronics was IEC 61508, published in 1998. IEC 61508 introduced Safety Integrity Levels (SILs), lifecycle safety management, probabilistic hardware fault metrics, and the concept of a structured safety case. However, it was designed as a generic standard for industrial programmable electronic systems. As vehicle architectures became more distributed and semiconductor complexity grew—moving from simple microcontrollers to multi-domain ECUs connected via CAN—automotive stakeholders recognized the need for a sector-specific adaptation.

That led to the publication of ISO 26262 in 2011. ISO 26262 was a transformative step, introducing Automotive Safety Integrity Levels (ASIL A–D), formal Hazard Analysis and Risk Assessment (HARA), hardware architectural metrics such as Single Point Fault Metric (SPFM) and Latent Fault Metric (LFM), and strict requirements traceability across the development lifecycle. Importantly, ISO 26262 directly influenced semiconductor design. Silicon vendors began offering ASIL-ready microcontrollers with lockstep CPU cores, ECC-protected memory, watchdog timers, and documented FMEDA data to support system integrators. Safety moved from being a vehicle-level validation exercise to being embedded in chip architecture and development processes.
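
As a rough numerical sketch of these hardware architectural metrics (an illustration only, using hypothetical FIT rates; a real FMEDA aggregates per-failure-mode rates together with diagnostic coverage, which this abstracts away):

```python
def spfm(total_fit, residual_fit):
    """Single Point Fault Metric: share of safety-related faults that
    are neither single-point nor residual (i.e., covered by a safety
    mechanism or inherently safe)."""
    return 1.0 - residual_fit / total_fit

def lfm(total_fit, residual_fit, latent_fit):
    """Latent Fault Metric: share of the remaining faults that are not
    latent multiple-point faults."""
    return 1.0 - latent_fit / (total_fit - residual_fit)

# Hypothetical FIT rates (failures per 1e9 device-hours)
total = 1000.0    # all safety-related faults
residual = 8.0    # single-point + residual faults
latent = 40.0     # latent multiple-point faults

print(f"SPFM = {spfm(total, residual):.2%}")          # 99.20% (ASIL D target: >= 99%)
print(f"LFM  = {lfm(total, residual, latent):.2%}")   # 95.97% (ASIL D target: >= 90%)
```

The ASIL D thresholds noted in the comments come from ISO 26262-5; lower ASILs permit lower coverage.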

The historical progression of safety protocols in airborne systems reflects the increasing reliance on semiconductors in avionics, flight control, and mission-critical software. Unlike automotive, aviation adopted structured safety governance very early, because electronics entered directly into safety-critical control loops such as autopilot and fly-by-wire. Also, increasing integration of custom ASICs and programmable logic devices in avionics led to the publication of DO-254 in 2000. DO-254 formalized design assurance for airborne electronic hardware, including FPGAs and complex microcircuits. It required documented development lifecycles, verification rigor proportional to hardware design assurance levels, and traceability from requirements to implementation.

For marine systems, as digital navigation and propulsion control systems expanded in the 1980s and 1990s, regulatory attention shifted toward reliability and redundancy of electronic systems. Classification societies such as DNV, Lloyd's Register, and American Bureau of Shipping developed rules for electrical and control systems onboard ships. These rules require redundancy in steering and propulsion control, fault tolerance in dynamic positioning systems, and environmental qualification of electronics for vibration, humidity, and salt exposure. The introduction of the Global Maritime Distress and Safety System (GMDSS) in the 1990s marked a major digital milestone. Satellite communications, automated distress signaling, and integrated bridge systems increased semiconductor density. As ships adopted Integrated Bridge Systems (IBS) and Integrated Platform Management Systems (IPMS), classification societies began issuing more formal guidance on software quality, failure mode analysis, and cyber resilience. Still, marine governance remained largely prescriptive and performance-based, rather than process-assurance-based.

Finally, space safety and electronics assurance evolved under extreme reliability constraints from the beginning, due to the impossibility of repair and the high cost of mission failure. Early space programs operated under agency-specific reliability and redundancy doctrines rather than formalized software standards. NASA and defense space agencies emphasized radiation hardening, hardware redundancy, and conservative design margins. Spacecraft have used fault detection, isolation, and recovery (FDIR) techniques from the outset.

Overall, safety standards in every domain have tracked the growing adoption of electronic systems.

Conventional Validation and Verification

As discussed in chapter 2, all of these systems live under a governance structure where validation and verification technology links the technical world to the governance structure. Critical in enabling these processes is the domain of Electronic Design Automation (EDA). EDA refers to the software tools and workflows used to design, verify, and prepare semiconductor devices and electronic systems for manufacturing. At the chip level, the flow typically begins with system architecture and specification, followed by separate but converging analog and digital design streams. In digital design, engineers describe functionality using hardware description languages (HDLs) such as Verilog or VHDL, simulate for functional correctness, synthesize to logic gates, and perform place-and-route to create a physical layout. This is followed by static timing analysis, power analysis, signal integrity checks, and increasingly, formal verification and functional safety validation (e.g., ISO 26262 contexts). In analog/mixed-signal design, the flow is more device- and layout-centric: schematic capture, SPICE-level simulation (corner, Monte Carlo, noise, mismatch), layout with careful parasitic extraction, and iterative verification (LVS/DRC). At advanced nodes, the boundary between analog and digital blurs in mixed-signal SoCs, requiring tight co-simulation and cross-domain verification.
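
As a toy illustration of the Monte Carlo analysis mentioned above for analog flows, the sketch below estimates the parametric yield of a resistive voltage divider under Gaussian resistor mismatch. All component values, the mismatch sigma, and the spec window are invented for illustration; real flows run SPICE with foundry-supplied mismatch models.

```python
import random

def divider_vout(vin, r1, r2):
    """Output voltage of a simple resistive divider."""
    return vin * r2 / (r1 + r2)

def monte_carlo_yield(n=10000, vin=3.3, r_nom=10e3, sigma=0.01,
                      spec=(1.60, 1.70), seed=42):
    """Estimate parametric yield of a nominal 2:1 divider when each
    resistor varies with Gaussian mismatch (sigma as a fraction of
    nominal), counting samples whose output falls inside the spec."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n):
        r1 = r_nom * (1 + rng.gauss(0, sigma))
        r2 = r_nom * (1 + rng.gauss(0, sigma))
        if spec[0] <= divider_vout(vin, r1, r2) <= spec[1]:
            passed += 1
    return passed / n

print(f"Estimated yield: {monte_carlo_yield():.1%}")
```

The same skeleton extends to corner analysis by sweeping discrete process/voltage/temperature points instead of random samples.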

Once the silicon design is complete, the flow extends to package design, which has become increasingly critical in advanced-node and heterogeneous integration contexts (e.g., chiplets, 2.5D/3D integration). Package EDA tools model signal integrity, power integrity, thermal behavior, and mechanical stress across substrates, interposers, and bumps. The package is no longer a passive carrier; it is an electrical extension of the die, affecting timing closure, power delivery, and high-speed interfaces (e.g., UCIe, HBM). Finally, at the PCB level, board design tools integrate schematic capture, component placement, routing, and multi-physics analysis (signal integrity, EMI/EMC, thermal). High-speed digital systems require co-design between chip I/O, package escape routing, and PCB stackup to maintain impedance control and timing margins. Modern EDA workflows increasingly emphasize cross-domain co-design—from transistor to board—because performance, reliability, and safety are emergent properties of the entire electronic system, not just the silicon alone.

The Electronic Design Automation (EDA) industry is highly concentrated, with dominant global vendors controlling the majority of advanced semiconductor design workflows. Synopsys, Cadence Design Systems, and Siemens EDA (formerly Mentor Graphics) collectively provide end-to-end toolchains spanning digital implementation, analog/mixed-signal design, verification, IP integration, packaging, PCB design, and multi-physics analysis. Synopsys is particularly strong in digital synthesis, verification, and IP; Cadence has deep capabilities in custom/analog design and system analysis; and Siemens EDA is well known for PCB design, verification, and manufacturing integration. Beyond the “big three,” companies such as Ansys play a critical role in sign-off physics (signal integrity, power integrity, thermal, electromagnetics), while emerging players focus on AI-assisted design automation and specialized domains like photonics or chiplet integration. The high technical complexity, deep foundry integration (e.g., with TSMC, Samsung, Intel), and massive R&D investment required at advanced nodes create significant barriers to entry, reinforcing the industry’s oligopolistic structure.

Physical testing of electronics spans wafer probe, packaged device qualification, board-level validation, and full system stress testing, and is supported by a concentrated set of global vendors. In semiconductor production test, automated test equipment (ATE) leaders such as Teradyne and Advantest dominate high-volume logic, memory, and SoC testing, enabling parametric characterization, functional verification, and speed binning at wafer and final test. For reliability and environmental stress—HTOL, temperature cycling, vibration, and humidity—chamber providers like ESPEC and Thermotron are widely used in automotive and aerospace qualification flows. Electrical measurement and compliance validation at the device and board level rely heavily on instrumentation from Keysight Technologies and Rohde & Schwarz, particularly for high-speed interfaces and RF systems. Inspection and failure analysis—critical for advanced packaging and heterogeneous integration—often leverage X-ray and acoustic microscopy systems from Nordson, as well as materials analysis platforms from Thermo Fisher Scientific. Together, these vendors underpin the physical validation layer that complements design verification, ensuring performance, reliability, and safety before deployment into mission-critical applications.

Governance: Spectrum Sharing, EMI, and Health

Figure 1

Another key aspect of governance is the management of shared resources. In the mechanical world, this means transportation laws and regulations governing traffic and the traffic infrastructure. In electronics, it means management of the shared frequency spectrum and of health and safety issues. In the US, the primary legal basis for shared spectrum use is the Communications Act of 1934, which created the regulator, the Federal Communications Commission (FCC). The FCC manages the radio spectrum (figure 1) through a range of regulatory and technical actions to ensure its efficient and interference-free use. It allocates specific frequency bands for various services—such as broadcasting, cellular, satellite, public safety, and amateur radio—based on national needs and international agreements. The FCC issues licenses to commercial and non-commercial users, setting terms for power limits, coverage areas, and operating conditions. It also conducts spectrum auctions to assign frequencies for commercial use, such as 5G, while reserving portions for public services and unlicensed uses like Wi-Fi.

In addition, the FCC enforces rules to prevent harmful interference, coordinates spectrum sharing and repurposing efforts, and leads initiatives like dynamic spectrum access and band reallocation to adapt to evolving technological demands. To enforce these standards, the FCC requires many devices to undergo testing and certification before they can be marketed or sold in the United States. This process is carried out by FCC-recognized testing laboratories, known as accredited Conformity Assessment Bodies (CABs), which evaluate products against applicable Part 15 or Part 18 regulations, among others. Certified devices must meet limits on emissions, immunity, and specific absorption rate (SAR) when applicable. Once a product passes testing, the lab submits a report to a Telecommunications Certification Body (TCB), which issues the FCC ID and authorizes the product for sale. These labs play a critical role in ensuring compliance, supporting innovation while maintaining spectrum integrity and public safety.

FCC Part 15 and Part 18 differ primarily in the type and purpose of radio frequency (RF) emissions they regulate. Part 15 governs devices that intentionally or unintentionally emit RF energy for communication purposes, such as Wi-Fi routers, Bluetooth devices, and computers. These devices must not cause harmful interference and must accept interference from licensed users. In contrast, Part 18 regulates Industrial, Scientific, and Medical (ISM) equipment that emits RF energy not for communication, but for performing physical functions like heating, welding, or medical treatments—examples include microwave ovens and RF diathermy machines. While both parts limit electromagnetic interference, Part 15 devices operate under stricter emissions limits due to their proximity to communication bands, whereas Part 18 devices are allowed higher emissions in designated ISM frequency bands. Additionally, health and safety regulations for Part 18 equipment are typically overseen by other agencies such as the FDA or OSHA, while the FCC focuses on interference mitigation.

Figure 2

A key instrument for electromagnetic testing is an anechoic chamber (figure 2). An anechoic chamber is a specialized, sound- and radio wave-absorbing enclosure designed to create an environment free from reflections and external interference. Its walls, ceiling, and floor are typically lined with wedge-shaped foam or ferrite tiles that absorb electromagnetic or acoustic waves, depending on the application. For radio frequency (RF) testing, the chamber is constructed with conductive materials (like steel or copper) to form a Faraday cage, isolating it from external RF signals. In acoustic chambers, sound-absorbing foam eliminates echoes and simulates free-field conditions. Anechoic chambers are critical in industries such as telecommunications, defense, aerospace, and consumer electronics, where they are used to test antenna performance, electromagnetic compatibility (EMC), emissions compliance, radar systems, or audio equipment in highly controlled, repeatable conditions. The chamber ensures that test measurements reflect only the characteristics of the device under test (DUT), without environmental interference.

All hardware in all the domains of interest (ground, airborne, marine, space) must comply with FCC standards and, in cases involving human contact, with FDA standards for health and safety.

Finally, testing labs and services organizations play a critical role in certifying electronics against national and international standards, particularly for safety, electromagnetic compatibility (EMC), environmental robustness, and reliability. Global conformity assessment firms such as UL Solutions, TÜV SÜD, Intertek, and Bureau Veritas provide third-party testing and certification to standards such as IEC 61000 (EMC), IEC 62368 (product safety), ISO 26262 (automotive functional safety), DO-160 (aerospace environmental conditions), and MIL-STD-810 (defense environmental testing). These organizations operate accredited laboratories (often ISO/IEC 17025 certified) that conduct emissions and immunity testing, thermal cycling, vibration, ingress protection (IP), and safety evaluations required for CE marking, FCC authorization, automotive AEC qualification, and other regulatory approvals. In highly regulated sectors—automotive, aerospace, medical, and industrial—independent lab validation provides not only compliance evidence but also liability mitigation and market access assurance, making standards-driven testing an essential bridge between engineering validation and commercial deployment.

Electronics Supply Chain

In product development, the initial focus is on functionality and differentiated value. As discussed in the governance sections, the next stage is ensuring the product conforms to the appropriate regulatory frameworks for safety and shared usage. The final, and perhaps most important, stage is consistently delivering and supporting the product in the marketplace. Consistent delivery requires managing the supply chain that drives the forward flow of the product. In addition, as customers interact with the product, there is a reverse flow involving repairability, diagnostics, and, in most situations, safe disposal.

For most products, the mechanical component supply chain, with its maintenance and calibration practices, has a long and well-established history. As discussed, recent history has seen a large infusion of semiconductors. Supply Chain Management (SCM) refers to the strategic coordination of procurement, production, logistics, and distribution processes to ensure timely and cost-effective delivery of materials and systems [61]. The SCOR model, developed by the Supply Chain Council (SCC), is a widely used framework for designing and evaluating supply chains [62].

Each phase integrates digital tools and real-time analytics to ensure supply resilience and performance traceability.

Lean Supply Chain Management

Lean SCM focuses on minimizing waste (time, material, cost) across the chain while maximizing value for the customer [63]. In autonomous system production, Lean methods include:

Lean thinking improves agility in responding to rapid technological changes and component obsolescence.

Agile and Digital Supply Chains

Recent developments have introduced Agile Supply Chain concepts, emphasizing adaptability, visibility, and rapid reconfiguration [64]. Digital Supply Chain (DSC) technologies such as:

Risk Management and Resilience Building

Supply chain risk management (SCRM) in autonomous systems involves proactive identification and mitigation of disruptions:

AI-based SCRM tools (e.g., Resilinc, Everstream) now monitor supplier health and logistics delays in real time.

Challenges in Supply Chain Management

Challenge | Description | Impact
Component Scarcity | Limited supplies for high-performance chips or sensors. | Production delays, increased cost.
Globalization Risks | Dependence on international logistics and trade. | Exposure to geopolitical instability.
Quality Variability | Inconsistent supplier quality control. | Rework and testing overhead.
Cybersecurity Threats | Counterfeit or tampered components. | System failure or security breaches.
Data Supply Issues | Dependence on labelled datasets or simulation platforms. | Delayed AI development or bias introduction.

Environmental and Ethical Constraints

Supply chains for autonomy-related technologies often rely on materials such as lithium, cobalt, and rare earth metals used in sensors and batteries. Ethical sourcing, sustainability, and carbon accountability are now critical supply chain dimensions [53].

Example: Regulations aimed at preventing the sourcing of minerals from conflict-affected regions—particularly in parts of Central Africa—focus on “conflict minerals” such as tin, tungsten, tantalum, and gold (3TG). In the United States, Section 1502 of the Dodd-Frank Wall Street Reform and Consumer Protection Act requires publicly traded companies to conduct due diligence and disclose whether these minerals originated from the Democratic Republic of the Congo or adjoining countries, while the European Union enforces similar supply-chain due diligence under the EU Conflict Minerals Regulation. These frameworks compel companies to trace supply chains, implement risk mitigation processes aligned with OECD guidance, and publicly report sourcing practices to reduce the financing of armed groups.

The Rise of Supply Chain Cybersecurity

As hardware and software become interconnected, supply chain cybersecurity has emerged as a critical risk domain. Compromised firmware or cloned microcontrollers can introduce vulnerabilities deep within a system’s hardware root of trust [54]. Security frameworks such as NIST SP 800-161, ISO/IEC 27036, and Cybersecurity Maturity Model Certification (CMMC) are being applied to mitigate these threats.

Evolution of Supply Chains

Ground Systems:

In terms of ground systems, the automotive industry has evolved over time into a highly optimized supplier structure with Original Equipment Manufacturers (OEMs) and a tiered series of suppliers (Table 1).

Level | Supplier
OEM | BMW, Ford, GM, Mercedes-Benz, Toyota, etc.
Infrastructure | Government (federal, state, local), cellular (safety), map applications, etc.
Tier 1 (Systems) | Continental, Delphi, Bosch, Denso, etc.
Tier 2 (Parts) | Texas Instruments, NXP, TDK, Yazaki, Bridgestone, etc.
Tier 3 (Materials) | 3M, DuPont, BASF, Shin-Etsu, etc.

Table 1. Automotive supply chain tiers with representative suppliers.

Further, much like the US Department of Defense, automotive companies traditionally require chips with automotive-grade certification. Automotive-grade components must meet stringent compliance requirements: passive components need AEC-Q200, ASIL B/ISO 26262, and IATF 16949 qualification, while active components, including automotive chips, must comply with the AEC-Q100, ASIL B/ISO 26262, and IATF 16949 standards.

Airborne (Aerospace)

In aerospace, the supply chain evolved around regulatory certification authority and system safety long before cost optimization became dominant. As aircraft systems transitioned from analog to fly-by-wire and software-intensive architectures, standards such as DO-178 (software), DO-254 (hardware), and ARP4754 (system development) forced a structural shift: Tier-1 suppliers became deeply embedded in certification artifacts, not just hardware delivery. Companies such as Honeywell and Raytheon Technologies (Collins Aerospace) do not merely supply components; they co-own verification evidence, safety analyses, and traceability matrices required by the FAA/EASA. This creates a tightly coupled, long-cycle ecosystem where primes like Boeing act as system-of-systems integrators, and switching suppliers is extremely costly due to recertification burdens. The airborne model therefore evolved into a high-barrier, risk-sharing, assurance-centric hierarchy.

Marine

Marine supply chains historically centered on shipyards and mechanical systems, with less formalized tier structures than aerospace. Oversight came from classification societies (e.g., DNV, ABS) rather than centralized regulators, and vessels were often semi-custom builds. However, as digital navigation, dynamic positioning, and now autonomy have increased system complexity, Tier-1 marine technology firms such as Kongsberg Gruppen and Wärtsilä have moved closer to aerospace-style system integration roles. Unlike automotive’s scale-driven tiers, marine tiers evolved around project integration and compliance with flag-state and class requirements. The current autonomy push is accelerating a transition toward software-centric supply chains, but production volume remains low and customization remains high, keeping marine structurally more fragmented than aerospace.

Space

The space industry began as a vertically integrated, government-driven ecosystem dominated by primes such as Lockheed Martin and Boeing under cost-plus contracts with agencies like NASA and the DoD. Reliability and mission assurance, not cost efficiency, defined supplier relationships, and specialized radiation-hardened component vendors formed niche Tier-2/3 layers. In the last decade, however, companies like SpaceX have reintroduced vertical integration to compress development cycles and control risk across propulsion, avionics, and launch operations. The result is a bifurcated supply chain: one high-assurance national security chain with traditional tier structures, and one commercially agile “NewSpace” chain that blends COTS components with vertically integrated primes. Certification and mission risk, rather than volume economics, remain the dominant structural forces.

Semiconductor Economics:

The cost of building a semiconductor device is dominated by three interacting factors: design (NRE), wafer fabrication, and volume, all of which are tightly linked to lithography node. At advanced nodes (e.g., 5 nm, 3 nm), non-recurring engineering (NRE) costs can exceed hundreds of millions of dollars due to mask sets, EDA complexity, verification effort, and IP integration, while wafer costs rise sharply because of EUV lithography, tighter process control, and lower initial yields. As a result, cutting-edge nodes only make economic sense at very high production volumes, where fixed design and mask costs can be amortized over millions of units; otherwise, the cost per die becomes prohibitive. Conversely, mature nodes (e.g., 28 nm, 40 nm, 65 nm) have far lower mask and wafer costs, stable yields, and shorter development cycles, making them economically attractive for automotive, industrial, and mixed-signal applications where performance density is less critical and production volumes may be moderate rather than massive.
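
The amortization argument can be made concrete with a back-of-the-envelope sketch. All dollar figures, die sizes, and yields below are invented round numbers for illustration, not actual foundry pricing:

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Rough gross die count from wafer area (ignores edge-loss
    geometry and scribe lanes)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area / die_area_mm2)

def cost_per_unit(nre, volume, wafer_cost, die_area_mm2, yield_frac,
                  wafer_diameter_mm=300):
    """Amortized cost per good die: NRE spread over lifetime volume
    plus the yielded silicon cost of each die."""
    good_dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * yield_frac
    return nre / volume + wafer_cost / good_dies

# Hypothetical advanced-node vs. mature-node programs, same die size
advanced = cost_per_unit(nre=300e6, volume=50e6, wafer_cost=17000,
                         die_area_mm2=100, yield_frac=0.7)
mature   = cost_per_unit(nre=10e6,  volume=2e6,  wafer_cost=3000,
                         die_area_mm2=100, yield_frac=0.95)
print(f"advanced-node cost/die ~ ${advanced:.2f}")
print(f"mature-node cost/die   ~ ${mature:.2f}")
```

Rerunning the advanced-node case with a low volume (say 1e6 units) makes the NRE term dominate, which is the economic point of the paragraph above.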

Production volumes differ markedly between advanced and mature semiconductor nodes because of economics and application mix. Advanced nodes (e.g., 5 nm, 3 nm) are typically justified only for extremely high-volume markets such as flagship smartphones, data-center CPUs/GPUs, and AI accelerators, where tens of millions—or even hundreds of millions—of units can amortize enormous design and mask costs. In contrast, mature nodes (e.g., 28 nm, 40 nm, 65 nm and above) support a much broader diversity of products—automotive MCUs, power management ICs, analog, RF, and industrial controllers—often produced in moderate but long-lived volumes over many years. While individual mature-node programs may ship fewer units annually than leading-edge mobile processors, the aggregate volume across applications is extremely large and more stable over time, which explains why mature-node capacity remains strategically important despite the industry’s focus on leading-edge scaling.

Today, automotive volumes are sufficient to drive unique semiconductor designs on mature nodes, but in general the cyber-physical domains must rely on standard parts.

Autonomy and Hardware

Embedded protocols (including domain-specific ones), sensors, actuators, long- and short-distance communication components, navigation, and positioning.

From a hardware perspective, the big jump in functionality is the introduction of sensors, the computation to interpret the world, and then actuation to provide autonomy.

Ground.

The graphic illustrates the multi-layered sensor stack typically required for autonomous vehicles, combining complementary sensing modalities to achieve redundancy, range coverage, and environmental robustness. At the longest ranges, long-range radar and forward-facing cameras provide early detection of vehicles, obstacles, and road geometry. Long-range radar operates reliably in rain, fog, and low-light conditions, measuring object distance and relative velocity using Doppler shifts. Cameras, on the other hand, provide high-resolution semantic information—lane markings, traffic signs, traffic lights, and object classification (car vs. pedestrian vs. cyclist). While cameras excel at classification, they are more sensitive to lighting and weather, which is why radar redundancy is essential for safety-critical functions such as adaptive cruise control and highway autopilot.
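
The Doppler measurement mentioned above is simple to sketch: for a monostatic radar the two-way shift is f_d = 2·v·f_c/c. The 77 GHz carrier below is a common automotive radar band; the target speed is illustrative:

```python
C = 3.0e8  # speed of light, m/s

def doppler_shift(rel_velocity_mps, carrier_hz=77e9):
    """Two-way Doppler shift for a target closing at rel_velocity."""
    return 2.0 * rel_velocity_mps * carrier_hz / C

def relative_velocity(doppler_hz, carrier_hz=77e9):
    """Invert a measured shift back to the closing speed."""
    return doppler_hz * C / (2.0 * carrier_hz)

# A vehicle closing at 30 m/s (~108 km/h) seen by a 77 GHz radar
shift = doppler_shift(30.0)
print(f"Doppler shift: {shift / 1e3:.1f} kHz")  # 2*30*77e9/3e8 = 15.4 kHz
```

The kHz-scale shift on a 77 GHz carrier is why radar measures relative velocity directly and robustly, independent of lighting.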

In the mid- to short-range envelope around the vehicle, short-range radar and LiDAR (Light Detection and Ranging) enhance situational awareness. Short-range radar monitors adjacent lanes, blind spots, and cross-traffic. LiDAR provides high-precision 3D point clouds, enabling accurate mapping of object contours, free space, and road boundaries. LiDAR is particularly valuable for precise localization and obstacle detection in urban environments. Together, these sensors support functions like lane changes, merging, intersection handling, and obstacle avoidance. Very close to the vehicle, ultrasonic sensors and near-field cameras provide low-speed maneuvering awareness. Ultrasonic sensors detect curbs, parking barriers, and nearby objects within a few meters, enabling parking assist and tight maneuvering. Surround-view camera systems support 360-degree perception for low-speed autonomy and automated parking. Overlaying all these sensing layers is vehicle-to-everything (V2X) or wireless communication, which extends perception beyond line-of-sight by exchanging information with infrastructure and other vehicles. Collectively, the autonomy stack relies on sensor fusion—combining radar robustness, camera semantics, LiDAR precision, and ultrasonic proximity—to create a reliable environmental model suitable for safety-critical decision-making.

In terms of computation, autonomous ground vehicles require high-throughput, low-latency edge computation to process multi-modal sensor streams (camera, radar, LiDAR, ultrasonic) in real time. The compute stack typically integrates heterogeneous architectures—CPUs for control logic, GPUs/NPUs for deep neural network inference, and dedicated safety microcontrollers running ISO 26262–compliant software. These platforms must handle perception (object detection, segmentation), localization (SLAM, sensor fusion), prediction (trajectory forecasting), and planning (path optimization) within tens of milliseconds, all under automotive thermal and power constraints. Redundant compute paths and lockstep processors are often used to meet functional safety goals, with over-the-air update capability enabling continuous improvement.
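
A simple way to see the "tens of milliseconds" constraint is as a per-frame latency budget summed across pipeline stages. The stage names and latencies below are invented for a hypothetical 10 Hz planning loop:

```python
def check_frame_budget(stage_latencies_ms, budget_ms=100.0):
    """Sum per-stage latencies and check the pipeline fits within
    one frame of the planning loop."""
    total = sum(stage_latencies_ms.values())
    return total, total <= budget_ms

# Hypothetical per-stage latencies (ms) for a 10 Hz loop
stages = {
    "sensor ingest": 8.0,
    "perception (DNN inference)": 35.0,
    "localization / fusion": 12.0,
    "prediction": 15.0,
    "planning": 20.0,
}
total, ok = check_frame_budget(stages)
print(f"total = {total:.0f} ms, within budget: {ok}")  # 90 ms, True
```

In a real system each stage would carry a worst-case (not average) bound, since functional safety arguments hinge on worst-case execution time.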

Airborne.

Airborne autonomous systems rely on a fusion of inertial, air-data, navigation, and external perception sensors to operate in a 3D, high-speed, safety-critical environment. Core sensors include Inertial Measurement Units (IMUs) and air-data systems (pitot tubes, angle-of-attack vanes) for attitude and aerodynamic state estimation; multi-constellation GNSS for global positioning; and radar altimeters for precise height above ground during landing. For obstacle and traffic detection, aircraft increasingly use weather radar, ADS-B receivers (traffic awareness), electro-optical/infrared (EO/IR) cameras, and sometimes LiDAR for detect-and-avoid (particularly in UAVs). Unlike ground systems, airborne autonomy must handle sparse landmarks, high closing speeds, and large vertical envelopes. Sensor reliability and redundancy are critical, with cross-checking between inertial and external navigation sources to meet aviation safety requirements.

Airborne autonomous computation prioritizes determinism, certification traceability, and fault tolerance over raw AI throughput. Flight-critical systems must comply with DO-178C (software) and DO-254 (hardware), which emphasize verified, bounded execution and rigorous testing. Compute platforms are often partitioned using time- and space-separation (e.g., ARINC 653 architectures), ensuring that autonomy functions cannot interfere with flight controls. Compared to automotive, airborne compute may use less cutting-edge silicon but emphasizes redundancy (triple modular redundancy, cross-monitoring processors) and deterministic real-time operating systems. Power and weight constraints are critical, and thermal management must accommodate altitude-related cooling limits.
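
As a toy sketch of the voting idea behind triple modular redundancy, the function below returns the value on which at least two of three numeric channels agree, so a single faulty channel is outvoted. Real avionics voters operate on synchronized lockstep hardware outputs and must handle timing skew, which this ignores:

```python
def tmr_vote(a, b, c, tol=1e-6):
    """Majority vote over three redundant channel outputs; returns
    the agreed value, or raises if no two channels agree."""
    if abs(a - b) <= tol or abs(a - c) <= tol:
        return a
    if abs(b - c) <= tol:
        return b
    raise RuntimeError("no majority: all three channels disagree")

# One faulty channel (3.7) is outvoted by the two healthy ones
print(tmr_vote(1.000, 1.000, 3.7))  # -> 1.0
```

The "all three disagree" branch is exactly the case a fault-management layer must escalate, since voting alone can no longer mask the failure.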

Marine.

Marine autonomous vessels operate in a reflective, cluttered, and dynamically changing surface environment. Primary sensors include marine radar (long-range detection in fog and rain), GNSS for global positioning, and high-grade IMUs for heading and motion stabilization. Automatic Identification System (AIS) receivers provide cooperative vessel tracking, optical cameras assist in visual interpretation, and COLREG maritime collision-avoidance rules must be followed. For near-field awareness, vessels employ optical cameras, thermal cameras (night and low visibility), sonar (for subsurface obstacle detection), and depth sounders to prevent grounding. Compared to ground systems, marine sensing must manage wave motion, multipath reflections from water, salt corrosion, and very long detection ranges with sparse infrastructure. Subsurface autonomy (e.g., AUVs) further depends on acoustic positioning and Doppler velocity logs because GNSS is unavailable underwater.
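
The role of the Doppler velocity log underwater can be sketched as simple dead reckoning: integrating DVL speed along the compass heading while GNSS is unavailable. This is a deliberately simplified 2D sketch; real systems fuse full 3D DVL velocity with an inertial navigation solution:

```python
import math

def dead_reckon(start_xy, heading_deg, dvl_speeds_mps, dt_s):
    """Integrate Doppler-velocity-log speed along the vessel heading
    (0 deg = north, 90 deg = east) to update position."""
    x, y = start_xy
    hdg = math.radians(heading_deg)
    for v in dvl_speeds_mps:
        x += v * math.sin(hdg) * dt_s  # east displacement
        y += v * math.cos(hdg) * dt_s  # north displacement
    return x, y

# 2 m/s due north for ten one-second steps
print(dead_reckon((0.0, 0.0), 0.0, [2.0] * 10, 1.0))  # -> (0.0, 20.0)
```

Because small speed and heading errors integrate into unbounded position drift, AUVs periodically re-fix against acoustic beacons or surface for a GNSS update.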

Marine autonomy computation operates in a lower-speed but highly variable environment, often combining onboard compute with shore-based or cloud-assisted systems. Vessels may employ robust industrial-grade processors running perception and navigation stacks for radar, AIS (Automatic Identification System), sonar, and camera inputs. Because marine systems often operate for extended durations at sea, energy efficiency and environmental hardening (salt, humidity, vibration) are important. Autonomy compute must integrate route optimization, collision avoidance (COLREGs compliance), and remote monitoring, sometimes with partial human oversight. Unlike aerospace, certification is less centralized, allowing somewhat more flexibility in compute architectures.

Space.

Space autonomy operates in an extreme, infrastructure-free environment where navigation and state awareness rely heavily on inertial, optical, and celestial sensing. Satellites use star trackers for ultra-precise attitude determination, sun sensors for coarse orientation, and gyroscopes for angular rate measurement. GNSS receivers may be used in low Earth orbit, but deep-space missions rely on onboard optical navigation (planet/star tracking), LiDAR altimeters (for planetary landing), and radar for surface mapping. Proximity operations (e.g., docking, formation flying) use vision-based navigation and relative LiDAR or radar sensors. Unlike ground, airborne, or marine systems, space sensors must withstand radiation, vacuum, and extreme temperature cycles, and they often operate with minimal real-time human supervision due to communication latency. Sensor fusion in space emphasizes fault detection, graceful degradation, and long-duration reliability over raw environmental density.

Space autonomy computation is constrained by radiation tolerance, power availability, and communication latency. Traditional space systems use radiation-hardened processors with lower clock speeds but extremely high reliability and error-correction capabilities. Increasingly, commercial and “NewSpace” missions incorporate higher-performance COTS processors with shielding and fault detection to enable onboard AI for navigation, fault management, and autonomous operations (e.g., satellite constellation management or planetary landing). Because communication delays can be minutes or longer, deep-space systems must support autonomous decision-making with minimal ground intervention. Fault tolerance, graceful degradation, and long mission lifetimes (often 10–20+ years) dominate architectural design choices.

Validating Sensors

Autonomous vehicles place extraordinary demands on their sensing stack. Cameras, LiDARs, radars, and inertial/GNSS units do more than capture the environment—they define the limits of what the vehicle can possibly know. A planner cannot avoid a hazard it never perceived, and a controller cannot compensate for latency or drift it is never told about. Sensor validation therefore plays a foundational role in safety assurance: it characterizes what the sensors can and cannot see, how those signals are transformed into machine-interpretable entities, and how residual imperfections propagate into system-level risk within the intended operational design domain (ODD).

In practice, validation bridges three layers that must remain connected in the evidence trail. The first is the hardware layer, which concerns intrinsic performance such as resolution, range, sensitivity, and dynamic range; extrinsic geometry that pins each sensor into the vehicle frame; and temporal behavior including latency, jitter, timestamp accuracy, and clock drift. The second is the signal-to-perception layer, where raw measurements are filtered, synchronized, fused, and converted into maps, detections, tracks, and semantic labels. The third is the operational layer, which tests whether the sensing system—used by the autonomy stack as deployed—behaves acceptably across the ODD, including rare lighting, weather, and traffic geometries. A credible program links evidence across these layers to a structured safety case aligned with functional safety (ISO 26262), SOTIF (ISO 21448), and system-level assurance frameworks, making explicit claims about adequacy and known limitations.

The overarching aim is not merely to pass tests but to bound uncertainty and preserve traceability. For each modality, the team seeks a quantified understanding of performance envelopes: how detection probability and error distributions shift with distance, angle, reflectivity, ego speed, occlusion, precipitation, sun angle, and electromagnetic or thermal stress. These envelopes are only useful when translated into perception key performance indicators and, ultimately, into safety metrics such as minimum distance to collision, time-to-collision thresholds, mission success rates, and comfort indices. Equally important is traceability from a system-level outcome back to sensing conditions and processing choices—so a late failure can be diagnosed as calibration drift, timestamp skew, brittle ground filtering, overconfident tracking, or a planner assumption about obstacle contours. Validation artifacts—calibration reports, timing analyses, parameter-sweep results, and dataset manifests—must therefore be organized so that claims in the safety case are backed by reproducible evidence.

The Validation Bench: From Calibration to KPIs

The bench begins with geometry and time. Intrinsic calibration (for cameras: focal length, principal point, distortion; for LiDAR: channel angles and firing timing) ensures raw measurements are geometrically meaningful, while extrinsic calibration fixes rigid-body transforms among sensors and relative to the vehicle frame. Temporal validation establishes timestamp accuracy, cross-sensor alignment, and end-to-end latency budgets. Small timing mismatches that seem benign in isolation can yield multi-meter spatial discrepancies during fusion, particularly when tracking fast-moving actors or when the ego vehicle is turning. Modern stacks depend on this foundation: a LiDAR–camera fusion pipeline that projects point clouds into image coordinates requires both precise extrinsics and sub-frame-level temporal alignment to avoid ghosted edges and misaligned semantic labels. Calibration is not a one-off event; temperature cycles, vibration, and maintenance can shift extrinsics, and firmware updates can alter timing. Treat calibration and timing as monitorable health signals with periodic self-checks—board patterns for cameras, loop-closure or NDT metrics for LiDAR localization, and GNSS/IMU consistency tests—to catch drift before it erodes safety margins.
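
To make the coupling between geometry and time concrete, the sketch below projects a LiDAR point into camera pixel coordinates and refuses to fuse samples whose timestamps exceed a skew budget. Everything numeric here is an assumption for illustration (the pinhole intrinsics, the LiDAR-to-camera translation, the 5 ms budget), and the rotation is taken as identity for brevity.

```python
def project_lidar_point(p, lidar_ts, image_ts, max_skew_s=0.005):
    """Project a 3D LiDAR point (x, y, z in metres, sensor frame) into
    pixel coordinates, rejecting pairs whose timestamps differ by more
    than the allowed skew budget. Illustrative calibration values only."""
    # Assumed pinhole intrinsics (fx, fy, cx, cy) and extrinsic translation.
    fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0
    tx, ty, tz = 0.1, -0.05, 0.0          # LiDAR -> camera, identity rotation
    if abs(lidar_ts - image_ts) > max_skew_s:
        return None                       # temporal misalignment: do not fuse
    x, y, z = p[0] + tx, p[1] + ty, p[2] + tz
    if z <= 0:
        return None                       # point is behind the camera
    return (fx * x / z + cx, fy * y / z + cy)
```

Note how a timestamp skew larger than the budget vetoes fusion entirely; silently projecting a stale point is exactly the failure mode that produces ghosted edges and mislabeled obstacles.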

Validation must extend beyond the sensor to the pre-processing and fusion pipeline. Choices about ground removal, motion compensation, glare handling, region-of-interest cropping, or track-confirmation logic can change effective perception range and false-negative rates more than a nominal hardware swap. Controlled parameter sensitivity studies are therefore essential. Vary a single pre-processing parameter over a realistic range and measure how first-detection distance, false-alarm rate, and track stability evolve. These studies are inexpensive in simulation and surgical on a test track, and they surface brittleness early, before it appears as uncomfortable braking or missed obstacles in traffic. Notably, changes to LiDAR ground-filter thresholds can shorten the maximum distance at which a stopped vehicle is detected by tens of meters, shaving seconds off reaction time and elevating risk—an effect that should be measured and tied explicitly to safety margins.
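
A parameter sensitivity study of this kind can be organized as a simple sweep. In the sketch below, run_scenario is a stand-in for a simulator call, and its linear model of how a ground-filter height threshold erodes first-detection distance is entirely synthetic; the point is the workflow of sweeping one parameter and tying the result back to a stopping-distance margin.

```python
def run_scenario(ground_filter_height_m: float) -> float:
    """Stand-in for a simulator run; returns first-detection distance (m)
    of a stopped vehicle. Toy model: raising the filter threshold clips
    low-lying returns and shortens the usable detection range."""
    return max(0.0, 120.0 - 200.0 * ground_filter_height_m)

# Sweep one pre-processing parameter over a realistic range.
results = {}
for threshold in [0.10, 0.20, 0.30, 0.40, 0.50]:
    results[threshold] = run_scenario(threshold)

# Tie the KPI to a safety margin: stopping distance at 20 m/s with
# 8 m/s^2 braking plus a 1 s reaction time (illustrative figures).
v, a_brake, t_react = 20.0, 8.0, 1.0
required = v * t_react + v**2 / (2 * a_brake)   # 45 m
safe = {th: d for th, d in results.items() if d >= required}
```

The sweep surfaces exactly the effect the paragraph describes: past some threshold the detection range drops below the stopping-distance requirement, turning a "harmless" filter tweak into a measurable safety regression.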

Perception KPIs must be defined with downstream decisions in mind. Aggregate AUCs are less informative than scoped statements such as “stopped-vehicle detection range at ninety-percent recall under dry daylight urban conditions.” Localization health is better expressed as a time-series metric correlated with map density and scene content than as a single RMS figure. The aim is to generate metrics a planner designer can reason about when setting buffers and behaviors. These perception-level KPIs should be linked to system-level safety measures—minimum distance to collision, collision occurrence, braking aggressiveness, steering smoothness—so that changes in sensing or pre-processing can be convincingly shown to increase or decrease risk.
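
A scoped KPI such as "detection range at ninety-percent recall" can be computed directly from binned ground-truth outcomes. The sketch below is a minimal illustration; the bin size, helper name, and data layout are assumptions, not an established API.

```python
from collections import defaultdict

def detection_range_at_recall(samples, target_recall=0.9, bin_m=10.0):
    """Given (distance_m, detected) ground-truth samples, return the
    farthest bin edge up to which every consecutive distance bin meets
    the target recall -- a scoped KPI a planner designer can use when
    setting buffers, unlike an aggregate score."""
    hits, totals = defaultdict(int), defaultdict(int)
    for dist, detected in samples:
        b = int(dist // bin_m)
        totals[b] += 1
        hits[b] += int(detected)
    range_m = 0.0
    for b in sorted(totals):
        if hits[b] / totals[b] < target_recall:
            break                       # first bin below target ends the range
        range_m = (b + 1) * bin_m
    return range_m
```

Computing this per environmental stratum (dry daylight, rain, night) yields the kind of scoped statement the paragraph advocates, rather than one number averaged over conditions that stress the sensor very differently.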

One interesting consequence of sensor calibration requirements is that calibration capability must be built into the maintenance procedures for the product, so that field service can restore calibrated performance over the vehicle's life.

Scenario-Based and Simulation-Backed Validation

Miles driven is a weak proxy for sensing assurance. What matters is which situations were exercised and how well they cover the risk landscape. Scenario-based validation replaces ad-hoc mileage with structured, parameterized scenes that target sensing stressors: low-contrast pedestrians, vehicles partially occluded at offset angles, near-horizon sun glare, complex specular backgrounds, or rain-induced attenuation. Scenario description languages allow these scenes to be specified as distributions over positions, velocities, behaviors, and environmental conditions, yielding reproducible and tunable tests rather than anecdotal encounters. Formal methods augment this process through falsification—automated searches that home in on configurations most likely to violate monitorable safety properties, such as maintaining a minimum separation or confirming lane clearance for a fixed dwell time. This formalism pays two dividends: it turns vague requirements into properties that can be checked in simulation and on track, and it exposes precise boundary conditions where sensing becomes fragile, which are exactly the limitations a safety case must cite and operations must mitigate with ODD constraints.
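
Falsification can be illustrated with a plain random search over scenario parameters against a monitorable property. Here min_separation is a toy closed-form stand-in for a scenario rollout; a real program would call a simulator and typically drive the search with a guided optimizer rather than uniform sampling. All parameter ranges and the 2 m property threshold are invented for this sketch.

```python
import random

def min_separation(pedestrian_speed, ego_speed, trigger_dist):
    """Stand-in for a scenario rollout: smallest ego-to-pedestrian
    distance (m) observed in the run. Toy closed-form model only."""
    reaction_gap = trigger_dist - 1.2 * ego_speed    # detection + reaction
    return max(0.0, reaction_gap - 2.0 * pedestrian_speed)

def falsify(n_trials=1000, threshold_m=2.0, seed=7):
    """Random-search falsification: sample scenario parameters and keep
    the configuration that most severely violates the monitorable
    property 'minimum separation >= threshold'."""
    rng = random.Random(seed)
    worst = None
    for _ in range(n_trials):
        params = (rng.uniform(0.5, 2.5),    # pedestrian speed (m/s)
                  rng.uniform(5.0, 15.0),   # ego speed (m/s)
                  rng.uniform(15.0, 40.0))  # first-detection distance (m)
        sep = min_separation(*params)
        if sep < threshold_m and (worst is None or sep < worst[0]):
            worst = (sep, params)
    return worst
```

The returned worst-case configuration is precisely the "boundary condition where sensing becomes fragile" that a safety case must cite; in practice each violation is then replayed at higher fidelity and, if confirmed, mapped to an ODD restriction or mitigation.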

High-fidelity software-in-the-loop closes the gap between abstract scenarios and the deployed stack. Virtual cameras, LiDARs, and radars can drive the real perception software through middleware bridges, enabling controlled reproduction of rare cases, precise occlusions, and safe evaluation of updates. But virtual sensors are models, not mirrors; rendering pipelines may fail to capture radar multipath, rolling-shutter distortions, wet-road reflectance, or the exact beam divergence of a specific LiDAR. The simulator should therefore be treated as an instrument that requires its own validation. A practical approach is to maintain paired scenarios: for a subset of tests, collect real-world runs with raw logs and environmental measurements, then reconstruct them in simulation as faithfully as possible. Compare detection timelines, track stability, and minimum-distance outcomes, and quantify the divergence with time-series metrics such as dynamic time warping on distance profiles, discrepancies in first-detection timestamps, and divergence in track IDs. The goal is not to erase the sim-to-real gap—an unrealistic aim—but to bound it and understand where simulation is conservative versus optimistic.
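
The dynamic-time-warping comparison mentioned above is straightforward to compute on two distance profiles. The sketch below uses the textbook O(nm) DTW recurrence; the sample "real" and "sim" profiles are invented to show the shape of the comparison.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D time series, e.g. the
    ego-to-obstacle distance profiles of a real run and its simulated
    reconstruction. Smaller is closer; 0 means identical up to warping."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

real = [50, 45, 40, 35, 30]   # measured distance profile (m), invented
sim  = [50, 46, 40, 34, 30]   # simulated reconstruction, invented
gap = dtw_distance(real, sim)
```

Tracking this gap per paired scenario, alongside first-detection timestamp discrepancies and track-ID divergence, gives the bounded, quantified picture of the sim-to-real gap that the paragraph calls for.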

Because budgets are finite, an efficient program adopts a two-layer workflow. The first layer uses faster-than-real-time, lower-fidelity components to explore large scenario spaces, prune uninformative regions, and rank conditions by estimated safety impact. The second layer replays the most informative cases in a photorealistic environment that streams virtual sensor data into the actual autonomy stack and closes the control loop back to the simulator. Both layers log identical KPIs and time-aligned traces so results are comparable and transferable to track trials. This combination of breadth and fidelity uncovers corner cases quickly, quantifies their safety implications, and yields ready-to-execute test-track procedures for final confirmation.

Robustness, Security, and Packaging Evidence into a Safety Case

Modern validation must encompass accidental faults and malicious interference. Sensors can be disrupted by spoofing, saturation, or crafted patterns; radars can suffer interference; GPS can be jammed or spoofed; IMUs drift. Treat these as structured negative test suites, not afterthoughts. Vary spoofing density, duration, and geometry; inject glare or saturation within safe experimental protocols; simulate or hardware-in-the-loop radar interference; and record how perception KPIs and system-level safety metrics respond. The objective is twofold: quantify degradation—how much earlier does detection fail, how often do tracks drop—and evaluate defenses such as cross-modality consistency checks, health-monitor voting, and fallbacks that reduce speed and increase headway when sensing confidence falls below thresholds. This work connects directly to SOTIF by exposing performance-limited hazards amplified by adversarial conditions, and to functional safety by demonstrating safe states under faults.
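
A degradation policy of the kind described (reduce speed and increase headway as sensing confidence falls) can be expressed as a simple table over modality health flags. The derating factors below are illustrative only; real policies are derived from hazard analysis, not guessed.

```python
def speed_cap(camera_ok: bool, lidar_ok: bool, radar_ok: bool,
              nominal_mps: float = 15.0) -> float:
    """Toy fallback policy: cross-check three sensing modalities and
    derate the permitted speed as the number of healthy channels drops.
    All thresholds and factors are assumptions for illustration."""
    healthy = sum([camera_ok, lidar_ok, radar_ok])
    if healthy == 3:
        return nominal_mps          # full operational envelope
    if healthy == 2:
        return nominal_mps * 0.6    # degraded: slow down, widen headway
    if healthy == 1:
        return nominal_mps * 0.3    # minimal-risk maneuver speed
    return 0.0                      # no trusted sensing: come to a safe stop
```

A negative test suite then exercises exactly this function: inject GPS spoofing or camera saturation, confirm the corresponding health flag drops, and verify the vehicle's commanded speed actually falls to the derated cap.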

Validation produces data, but assurance requires an argument. Findings should be organized so that each top-level claim—such as adequacy of the sensing stack for the defined ODD—is supported by clearly scoped subclaims and evidence: calibrated geometry and timing within monitored bounds; modality-specific detection and tracking KPIs across representative environmental strata; quantified sim-to-real differences for critical scenes; scenario-coverage metrics that show where confidence is high and where operational mitigations apply; and results from robustness and security tests. Where limitations remain—as they always do—they should be stated plainly and tied to mitigations, whether that means reduced operational speed in heavy rain beyond a specified attenuation level, restricted ODD where snow eliminates lane semantics, or explicit maintenance intervals for recalibration.

A final pragmatic recommendation is to treat validation data as a first-class product. Raw logs, configuration snapshots, and processing parameters should be versioned, queryable, and replayable. Reproducibility transforms validation from a hurdle into an engineering asset: when a perception regression appears after a minor software update, the same scenarios can be replayed to pinpoint the change; when a new sensor model is proposed, detection envelopes and safety margins can be compared quickly and credibly. In this way, the validation of perception sensors becomes a disciplined, scenario-driven program that ties physical sensing performance to perception behavior and ultimately to system-level safety outcomes, while continuously informing design choices that make the next round of validation faster and more effective.
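
Treating validation data as a first-class product can start with something as simple as a content-hashed manifest. The field names, file names, and parameter keys below are hypothetical; the idea is that every artifact and processing parameter of a run is captured in a stable, diffable record that makes later replay verifiable bit-for-bit.

```python
import hashlib
import json

def make_manifest(run_id, files, params):
    """Sketch of a versioned validation manifest: content-hash each
    artifact (here, in-memory bytes standing in for log files) and
    record the exact processing parameters used for the run."""
    return {
        "run_id": run_id,
        "artifacts": {name: hashlib.sha256(data).hexdigest()
                      for name, data in files.items()},
        "processing_params": params,
    }

manifest = make_manifest(
    "track-2025-06-12-run03",                       # hypothetical run ID
    {"lidar.bin": b"\x00\x01\x02",                  # stand-ins for raw logs
     "camera.mp4": b"fake-bytes"},
    {"ground_filter_height_m": 0.25,
     "sw_version": "perception-1.4.2"},             # hypothetical version tag
)
serialized = json.dumps(manifest, sort_keys=True)   # stable, diffable record
```

When a regression appears after a software update, two such manifests diff in seconds: either an artifact hash changed (the data differs) or a parameter changed (the processing differs), which is the traceability the safety case depends on.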

Autonomy Challenges

Governance and Safety Challenges:

EMI:

What are the implications for automakers? In modern vehicles, electronics are no longer confined to infotainment or engine control—sensors, communication modules, and controllers are now central to vehicle safety and performance. These systems emit and receive electromagnetic energy, which can result in electromagnetic interference (EMI) if not properly managed. EMI can compromise safety-critical applications like radar-based adaptive cruise control or camera-based lane keeping. Sensor technologies introduce unique EMI challenges. Radar and lidar sensors, which are critical for driver assistance and autonomous systems, must not only avoid interference with each other but must also operate within spectrum allocations defined by the FCC and global bodies like the ITU. Similarly, cameras and ultrasonic sensors are susceptible to noise from nearby power electronics, especially in electric vehicles. EMI from poorly shielded cables or high-frequency switching components can cause data corruption, missed detections, or degraded signal integrity—raising both functional safety and regulatory concerns.

From a communications standpoint, FCC-compliant system design must also consider interoperability and coexistence. In a vehicle packed with Bluetooth, Wi-Fi, GPS, DSRC or C-V2X, and cellular modules, maintaining RF harmony requires careful frequency planning, shielding, and filtering. The FCC’s evolving rules for the 5.9 GHz band—reallocating portions from DSRC to C-V2X—illustrate how regulatory frameworks directly impact product architecture. OEMs must track these developments and validate that their communication modules not only operate within approved frequency bands but also do not emit spurious signals that could violate FCC emission ceilings. To meet FCC standards while ensuring high system reliability, automotive developers must embed EMI considerations early in the design cycle. Pre-compliance testing, EMI-aware PCB layout, and component-level certification all contribute to a smoother path to regulatory approval. Moreover, aligning FCC requirements with international automotive EMC standards—like CISPR 25 and UNECE R10—helps ensure global market readiness. As vehicles grow increasingly software-defined, connected, and autonomous, managing EMI through smart engineering and regulatory foresight will be a critical enabler of innovation, safety, and compliance.

As discussed, FCC regulations are primarily focused on electromagnetic interference. However, if RF energy has the potential to cause health issues, other regulators are involved. Health and safety regulation for FCC Part 18 devices—such as microwave ovens and medical RF equipment—is primarily handled by several other federal agencies. The Food and Drug Administration (FDA) oversees radiation-emitting electronic products to ensure they meet safety standards for human exposure, particularly for consumer appliances and medical devices. The Occupational Safety and Health Administration (OSHA) establishes workplace safety limits for RF exposure to protect employees who operate or work near such equipment. Meanwhile, the National Institute for Occupational Safety and Health (NIOSH) conducts research and provides guidance on safe RF exposure levels in occupational settings. While the FCC regulates RF emissions from Part 18 devices to prevent interference with licensed communication systems, it relies on these other agencies to ensure that the devices do not pose health risks to users or workers.

In the case of vehicle makers, Part 18 health issues manifest themselves in use models such as wireless power delivery, where specific absorption rate (SAR) levels may impact safety directly.

Finally, while the examples used above are from a US context, similar structures exist in all other geographies.

In the last decade, the airborne sector has layered autonomy and advanced sensing on top of this foundation. Modern UAVs and advanced air mobility platforms integrate sensor fusion processors, vision systems, and AI accelerators for detect-and-avoid and autonomous navigation. Commercial transports incorporate enhanced vision systems, predictive maintenance analytics, and increasingly software-defined capabilities. However, unlike automotive’s rapid consumer-driven scaling, airborne electronics remain constrained by certification timelines, long product lifecycles (20–30+ years), and extreme environmental requirements (temperature, vibration, radiation).

Challenges of Supply Chain Specific to Autonomous Systems

Autonomous systems add several unique layers of complexity to both hardware integration and supply chain management:

Multi-Vendor Dependency: A single autonomous platform may use components from dozens of vendors — from AI accelerators to GNSS modules. Managing version control, firmware updates, and hardware compatibility across this ecosystem requires multi-tier coordination and continuous configuration tracking [55].

Safety-Critical Certification: Hardware must meet safety and regulatory certifications, such as ISO 26262 (automotive functional safety), IEC 61508 (industrial functional safety), and DO-254 (airborne electronic hardware). Each certification adds cost, time, and documentation requirements.

Real-Time and Deterministic Performance: Integration must guarantee low-latency, deterministic behaviour — meaning that sensors, processors, and actuators must communicate with microsecond-level precision. This influences hardware selection and network design [56].

Rapid Technology Obsolescence: AI and embedded computing evolve faster than mechanical systems. Components become obsolete before the platform’s lifecycle ends, forcing supply chains to manage technology refresh cycles and long-term component availability planning [57].

Possible Solutions and Best Practices

The most important challenges and possible solutions are summarized in the following table:

| Challenge | Solution / Mitigation Strategy |
| --- | --- |
| Component Shortages | Multi-sourcing strategies and localized fabrication partnerships; the EU Chips Act is one example of securing future supplies. |
| Supplier QA Variance | Supplier qualification programs and continuous audit loops. |
| Cybersecurity Risks | Hardware attestation, firmware signing, and supply-chain transparency tools (e.g., SBOMs). |
| Ethical Sourcing | Traceable material chains via blockchain and sustainability certification. |
| Obsolescence | Lifecycle management databases (e.g., Siemens Teamcenter, Windchill). |
| Integration Complexity | Use of standardized hardware interfaces (CAN-FD, Ethernet TSN, PCIe). |

Typical Supply Chain Management (SCM) Approaches

Strategic Partnerships and Vertical Integration

Many companies are moving toward vertical integration, controlling multiple stages of the supply chain. For instance:

This approach increases supply security and reduces dependency on third parties, though it requires substantial capital investment.

Sustainability and Ethical SCM

Sustainability in supply chains focuses on reducing carbon footprint, ensuring ethical sourcing, and promoting recyclability [65]. Key practices:

Effective hardware integration and supply chain management are tightly interwoven. Integration depends on having high-quality, compatible components, while supply chains rely on robust feedback from integration and testing to forecast needs, reduce waste, and maintain reliability. Modern SCM frameworks, particularly Lean, Agile, and Digital models, offer strategies to make the autonomy industry more resilient, sustainable, and responsive.

Summary

This chapter explains how semiconductors and electronics became the foundation of modern autonomous systems across ground, airborne, marine, and space platforms. It shows a common historical pattern: systems began with mostly mechanical or isolated electronic functions, then evolved toward digitized control, networked subsystems, and increasingly autonomous operation. In cars, this meant moving from engine control to chassis, infotainment, electrification, and ADAS; in aircraft, ships, and spacecraft, it meant a similar shift from stand-alone avionics or navigation aids to integrated, safety-critical digital architectures.

The chapter also emphasizes that autonomy is not just a matter of adding sensors. It requires a full ecosystem of hardware, computation, validation, and governance. Different domains rely on different sensor mixes—such as radar, cameras, LiDAR, GNSS, IMUs, sonar, or star trackers—but all must fuse data and convert it into safe decisions in real time. Because these systems are safety-critical, the chapter highlights the importance of standards such as ISO 26262, IEC 61508, and DO-254, along with validation processes that include calibration, timing analysis, scenario-based testing, simulation, and structured safety cases.

Finally, the chapter argues that successful autonomous systems depend on more than technical performance: they must also navigate EMI regulation, health and safety oversight, and resilient supply chains. The discussion covers FCC spectrum and emissions compliance, EMC testing, and the role of accredited labs, then moves into supply-chain challenges such as component scarcity, cybersecurity, certification burdens, ethical sourcing, and technology obsolescence. The main takeaway is that autonomous systems are not just advanced machines—they are complex, tightly integrated products whose success depends on coordinated progress in electronics, sensing, safety, validation, and supply chain management.

Industries and Companies:

| Type | Description | Example Players (Companies) |
| --- | --- | --- |
| Semiconductor Manufacturers (Logic & Compute) | Design and manufacture digital logic devices (MCUs, MPUs, SoCs, AI accelerators) that execute perception, planning, and control workloads in autonomous systems. | Intel, NVIDIA, Qualcomm, NXP Semiconductors |
| Analog & Mixed-Signal Semiconductor Providers | Provide sensing interfaces, power management ICs, ADC/DACs, and signal conditioning required to convert physical signals into digital data. | Texas Instruments, Analog Devices, Infineon Technologies |
| Power Semiconductor & Wide Bandgap Players | Develop Si, SiC, and GaN devices for high-efficiency power conversion in EVs, aircraft electrification, marine propulsion, and space systems. | Wolfspeed, onsemi, STMicroelectronics |
| Sensor Manufacturers (Perception Hardware) | Build core sensing modalities (camera, radar, LiDAR, IMU, GNSS, sonar, star trackers) that define system observability and autonomy limits. | Bosch, Continental AG, Velodyne LiDAR, Teledyne Technologies |
| RF & Communication Chip / Module Providers | Provide connectivity hardware (5G, V2X, satellite comms, radar front-ends) enabling communication and extended perception. | Skyworks Solutions, Qorvo, Broadcom |
| FPGA & Reconfigurable Compute Vendors | Supply programmable logic for deterministic, safety-critical, and adaptable processing in aerospace, defense, and space systems. | AMD, Intel |
| EDA (Electronic Design Automation) Companies | Provide design, simulation, verification, and sign-off tools spanning chip, package, and PCB levels—critical for hardware validation and production. | Synopsys, Cadence Design Systems, Siemens |
| Foundries & Advanced Packaging Providers | Fabricate semiconductors and provide advanced packaging technologies for high-performance and reliable systems. | TSMC, Samsung Foundry, Intel Foundry Services |
| Vendor | Platform / Kit | Type | Key Components | Target Domain | Notes / Differentiation |
| --- | --- | --- | --- | --- | --- |
| NVIDIA | NVIDIA DRIVE (Orin / Thor) | Full autonomy compute platform | GPU SoC, Tensor cores, CUDA, DriveWorks SDK | Automotive autonomy (L2–L4) | End-to-end AV compute + software stack |
| NVIDIA | Jetson Orin Dev Kit | Embedded AI compute platform | CPU + GPU SoC, camera interfaces | Robotics, drones, edge AI | Widely used for prototyping |
| Qualcomm | Snapdragon Ride | Automotive compute platform | AI accelerator, vision DSP, sensor fusion | Automotive ADAS/AV | Strong power efficiency + integration |
| Intel | Mobileye EyeQ / AV platform | Vision-centric ADAS platform | Vision SoC, camera-based perception software | Automotive ADAS | Camera-first autonomy strategy |
| AMD | Versal Adaptive SoCs | FPGA/ACAP compute platform | FPGA fabric + AI engines | Automotive, aerospace | Deterministic + adaptive compute |
| Texas Instruments | TDA4VM / Jacinto | ADAS processor | Vision DSP, radar processing, safety MCUs | Automotive | Strong functional safety (ISO 26262 focus) |
| NXP Semiconductors | S32V / BlueBox | Automotive compute + networking | Vision SoC, radar processing, CAN/FlexRay | Automotive | Strong vehicle networking integration |
| Bosch | Radar / ADAS platforms | Sensor + ECU systems | Radar, camera, ECU modules | Automotive | Tier-1 integrated sensor + compute solutions |
| Continental AG | Continental ADAS Dev Platform | Sensor fusion system | Radar, LiDAR, camera modules | Automotive | Strong system-level integration |
| Velodyne LiDAR | LiDAR Dev Kits (e.g., Puck) | Sensor dev kits | 3D LiDAR + SDK | Autonomous, robotics | High-resolution 3D perception |
| Ouster | Ouster OS1 / Gemini | LiDAR platform | Digital LiDAR + API | Robotics, industrial | Software-defined LiDAR stack |
| Analog Devices | Radar Development Kits | RF sensing platform | RF front-end + DSP | Automotive, industrial | Strong RF + signal chain expertise |
| Infineon Technologies | AURIX + Radar Kits | Safety MCU + radar | Radar IC + safety MCU | Automotive | Leading safety MCU platform |
| STMicroelectronics | STM32 + Sensor Kits | Embedded sensing platform | MCU + IMU, GNSS, camera | Robotics, IoT | Low-cost prototyping ecosystem |
| Teledyne Technologies | Imaging Sensor Kits | Vision sensing | CMOS sensors, thermal imaging | Aerospace, defense | High-performance imaging |
| Sony | CMOS Image Sensors | Vision sensors | High dynamic range sensors | Automotive, consumer | Dominant in camera sensing |
| Hexagon | Autonomous Sensors | Software + sensors | LiDAR + mapping + analytics | Industrial autonomy | Strong digital twin ecosystem |
| dSPACE | HIL (Hardware-in-the-Loop) systems | Validation platform | Sensor models, ECU integration | Automotive, aerospace | Critical for V&V workflows |

Software Systems and Middleware

What is Software?

Programmable Hardware and the Emergence of Software Systems

The previous chapter introduced electronic hardware and the role of electronic components in implementing system functionality. However, the physical nature of hardware—and the inherent complexity of designing across mechanical, electrical, and logical domains—places fundamental limits on the speed and flexibility with which new system capabilities can be developed. To address these limitations, hardware platforms evolved to support programmability after fabrication. This programmability enables a separation between physical implementation and functional behavior, allowing systems to be adapted without redesigning the underlying hardware.

  1. Configuration: In many modern systems, hardware components can be configured after silicon fabrication to support multiple operating modes or product variants. For example, parameters such as bus widths, cache sizes, or feature sets may be selected through configuration registers or firmware-controlled settings.
  2. Hardware Function Realization: Certain hardware platforms support the post-silicon realization of hardware functionality through programmable logic structures. A canonical example is the Field Programmable Gate Array (FPGA), which enables designers to implement custom digital circuits after manufacturing. These devices are programmed using hardware description languages (HDLs), such as Verilog or VHDL, and have become foundational in embedded systems, prototyping, and specialized computing.
  3. Programmable Processors: A broad class of stored-program computing engines based on the von Neumann architecture, including microprocessors and microcontrollers, falls into this category. Historically programmed in assembly language, these devices are now predominantly programmed using high-level languages such as C, along with higher-level abstractions in more complex systems.

These programming paradigms introduce several important system-level considerations:

  1. Development Ecosystem: Programmability necessitates a supporting software development toolchain, including compilers, assemblers, linkers, and debuggers. This development ecosystem becomes an integral part of the system and must be maintained, validated, and supported throughout the product lifecycle.
  2. Product Lifecycle: Historically, system programming was performed during manufacturing, resulting in a largely static, well-contained product. Post-deployment reprogramming was relatively rare, with notable exceptions in domains such as space systems. In contrast, modern systems increasingly rely on field updates and continuous software evolution, fundamentally altering lifecycle management.
  3. Peripherals and Interconnects: System flexibility was further enhanced through standardized hardware peripherals. These devices integrate mechanical, electrical, and computational functions and communicate via well-defined interconnect standards such as PCI and USB. This modularity enables extensibility and interoperability across systems.

The concept of programmable hardware was significantly advanced in the 1960s with the introduction of the IBM System/360, which formalized the notion of a stable computer architecture. This development marked a critical transition from device-specific design to platform-based computing and introduced several enduring properties:

  1. Abstraction and Compatibility: Computer architectures retained the fundamental von Neumann model while defining a stable instruction-set abstraction that could be realized by multiple hardware implementations. This abstraction enabled backward compatibility, allowing software developed for one generation of hardware to execute on future systems. As a result, performance improvements could be driven by advances in semiconductor processes and microarchitecture without requiring changes to application software.
  2. Operating Systems: The presence of a stable hardware abstraction enabled the development of higher-level system software; process isolation, scheduling, and resource management were formalized within operating systems. These systems provided a consistent execution environment and significantly improved programmability, portability, and system utilization.
  3. Networking: As computing systems proliferated, the need for communication between geographically distributed machines led to the development of networking. Layered abstractions—from physical transmission to application protocols—enabled reliable data exchange and ultimately supported the emergence of distributed systems and global connectivity.

Since the introduction of computer architectures in the 1960s, rapid advances in semiconductor technology, system design, and networking have driven an exponential expansion in computing capability. These developments have transformed nearly every aspect of modern society through what is broadly referred to as information technology. The programming of these systems—spanning configuration, control, and application logic—is collectively known as software.

Open-source systems have played a transformative role in the evolution of information technology by accelerating innovation, lowering barriers to entry, and standardizing software infrastructure across heterogeneous environments. Foundational platforms such as Linux, the Apache HTTP Server, and languages and ecosystems such as Python and GCC enabled a global, collaborative development model in which individuals, academia, and industry could contribute to shared software stacks. This model fostered rapid iteration, transparency, and portability, allowing software to scale from individual machines to cloud-scale distributed systems. Open-source licensing also enabled companies to build commercial products atop shared infrastructure, leading to the emergence of entire ecosystems around cloud computing, data analytics, and artificial intelligence. As a result, open-source software became a cornerstone of modern IT, underpinning everything from web services to high-performance computing and enabling a pace of innovation that would have been difficult to achieve through proprietary development alone.

History of Software and Cyber-Physical Systems

While the IT ecosystem drove massive innovations and built incredible capabilities, these capabilities could not be directly used in cyber-physical systems. Cyber-physical software differs from conventional embedded or enterprise software because it operates under strict real-time constraints and must provide robust fault tolerance and safety compliance. The historical introduction of software into cyber-physical systems followed different timelines across ground, airborne, marine, and space domains, but in all four cases the long-term trend was the same: software evolved from supporting narrow control functions to becoming the central coordinating layer for sensing, decision-making, communication, and actuation. In the earliest generation of these systems, most functionality was mechanical, hydraulic, analog, or electromechanical. As digital electronics matured, software first entered as a way to improve control precision, reduce weight, support diagnostics, and increase flexibility. Over time, however, software stopped being merely an enhancement and became essential to system operation. This shift was one of the major enablers of autonomy.

In ground systems, especially automobiles, software emerged in a practical production role during the 1970s and early 1980s, when tightening emissions regulations pushed manufacturers toward microprocessor-based engine control. Early automotive software was relatively narrow in scope, focused on ignition timing, fuel injection, and engine management. As electronics spread into anti-lock braking, traction control, airbags, steering, body electronics, and infotainment, software grew from embedded control logic into a distributed system running across many electronic control units. The later introduction of in-vehicle networks such as CAN and FlexRay further expanded software’s role, because control units now had to exchange data and coordinate across domains rather than operate as isolated devices. By the 2010s, with electrification and ADAS, software had become inseparable from perception, energy management, diagnostics, communications, and vehicle behavior.

In airborne systems, software entered earlier and under stricter safety expectations because avionics quickly became tied to navigation, stability, and flight control. Early aircraft electronics were largely analog and federated, but the move to digital control accelerated in the 1970s and 1980s, culminating in the rise of fly-by-wire systems. NASA notes that its F-8 Digital Fly-By-Wire aircraft became, on May 25, 1972, the first aircraft to fly completely dependent on an electronic flight-control system, marking a major turning point in the acceptance of software within the control loop. Later developments such as glass cockpits, FADEC, and integrated avionics made software central not only to control, but also to displays, redundancy management, fault monitoring, and mission systems. Because software was trusted with flight-critical functions so early, airborne systems also developed rigorous assurance frameworks earlier than most other sectors.

In marine systems, software was introduced more gradually and often first appeared as an aid to navigation, propulsion monitoring, and ship management rather than as the immediate core of vessel control. During the 1980s and 1990s, software became increasingly important through GPS integration, electronic charting, digital propulsion governors, alarm monitoring, and networking standards such as NMEA 0183 and NMEA 2000. As ships adopted Integrated Bridge Systems and Integrated Platform Management Systems, software took on a more integrative role, connecting radar, sonar, charting, safety alerts, and propulsion information into shared consoles and coordinated workflows. The marine sector generally moved more slowly than aerospace or automotive because of lower production volumes, long vessel lifecycles, and a historically stronger dependence on mechanical and human-operated systems. Still, the same underlying pattern emerged: software shifted from assisting operators to structuring the flow of information and control across the vessel.

In space systems, software became important very early because spacecraft had to function with limited or delayed human intervention. Even early missions required onboard digital logic for guidance, control, telemetry, and fault management. Apollo is a landmark example: NASA records describe the Apollo primary guidance, navigation, and control system as centered on the Apollo Guidance Computer, making software a mission-critical part of spacecraft operation during the 1960s lunar program. In later decades, spacecraft software expanded to support attitude control, payload operation, onboard data handling, autonomous fault detection, and increasingly software-defined mission behavior. Modern space systems add reconfigurable payloads, autonomous navigation, and onboard AI, but the historical pattern remains continuous: because space systems operate remotely and under extreme constraints, software has long been essential not just for convenience, but for basic mission survival and autonomy.

As software methods migrated from traditional computing into cyber-physical systems (CPS), a distinct class of software infrastructure emerged to manage the tight coupling between computation and the physical world. Central to this evolution was the adoption of real-time operating systems (RTOSes), which provide deterministic task scheduling, bounded interrupt latency, and predictable timing behavior—properties essential for interacting with sensors, actuators, and control loops. Unlike general-purpose operating systems, RTOSes are designed to guarantee that critical tasks execute within strict temporal constraints, often using priority-based preemptive scheduling and carefully managed resource sharing. Representative RTOS implementations include VxWorks, widely used in aerospace and defense systems; QNX, common in automotive and industrial platforms; and FreeRTOS, broadly adopted in embedded and IoT devices. In addition to RTOS kernels, CPS software stacks increasingly incorporated device drivers, middleware for communication (e.g., message queues and publish–subscribe frameworks such as DDS), and hardware abstraction layers (HALs) to isolate application logic from platform-specific details. These components enabled modular software architectures while preserving the determinism required for control and safety. Across domains such as ground, airborne, marine, and space systems, RTOS-based architectures became foundational to system design, with domain-specific adaptations. In ground systems, automotive platforms standardized software stacks such as AUTOSAR, where RTOS scheduling supports engine control units (ECUs), braking systems (ABS), and advanced driver assistance systems (ADAS). In airborne systems, avionics platforms such as the Boeing 787 rely on partitioned RTOS environments (often based on VxWorks) to meet stringent safety certification requirements (e.g., DO-178C), ensuring temporal and spatial isolation between flight-critical functions. 
In marine systems, integrated bridge and navigation systems—such as those used on modern commercial vessels and naval ships—employ real-time software (often QNX-based) to coordinate radar, GPS, and autopilot control loops under standards like IEC 61162 (NMEA). In space systems, spacecraft such as the Mars Perseverance Rover utilize RTOS platforms like VxWorks to manage guidance, navigation, and control in environments where remote operation and fault tolerance are essential. Over time, these systems evolved from tightly coupled, monolithic implementations to more layered and componentized architectures, incorporating standardized interfaces and increasingly sophisticated middleware. This progression laid the groundwork for modern trends such as software-defined vehicles, autonomous systems, and distributed CPS platforms, where software not only controls physical processes but also enables continuous updates, adaptability, and higher-level system intelligence.
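
The priority-based preemptive scheduling described above can be sketched in miniature. The following Python simulation is purely illustrative (task names and timing parameters are invented); a real RTOS such as FreeRTOS or VxWorks implements this logic inside the kernel, not in application code. The sketch shows a high-priority control task preempting lower-priority background work:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    period: int      # release interval in ticks
    wcet: int        # worst-case execution time in ticks
    priority: int    # lower number = higher priority (rate-monotonic style)
    remaining: int = 0

def schedule(tasks, horizon):
    """Fixed-priority preemptive scheduling over discrete time ticks."""
    timeline = []
    for t in range(horizon):
        for task in tasks:               # release a new job at each period boundary
            if t % task.period == 0:
                task.remaining += task.wcet
        ready = [task for task in tasks if task.remaining > 0]
        if ready:                        # highest-priority ready task runs, preempting others
            current = min(ready, key=lambda task: task.priority)
            current.remaining -= 1
            timeline.append(current.name)
        else:
            timeline.append("idle")
    return timeline

tasks = [
    Task("brake_ctrl", period=4, wcet=1, priority=0),  # critical, short period
    Task("telemetry",  period=8, wcet=3, priority=1),  # background work
]
timeline = schedule(tasks, 8)
print(timeline)
# → ['brake_ctrl', 'telemetry', 'telemetry', 'telemetry', 'brake_ctrl', 'idle', 'idle', 'idle']
```

Because the control task always wins the priority comparison, its deadline is met at every release, which is the deterministic behaviour an RTOS must guarantee.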

In cyber-physical systems (CPS), the role of open-source software has been more gradual but increasingly significant, particularly as systems have become more complex, networked, and software-defined. Platforms such as FreeRTOS, Zephyr, and middleware frameworks like ROS have enabled broader access to embedded and robotic system development, fostering innovation in domains such as autonomous vehicles, industrial automation, and drones. Open-source approaches in CPS provide advantages in transparency, flexibility, and community-driven validation, which are particularly valuable for research and prototyping. However, their adoption in safety-critical domains—such as avionics, automotive safety systems, and space missions—has required careful integration with certification processes, long-term support models, and rigorous verification and validation practices. Increasingly, hybrid models are emerging in which open-source components form the foundation of development platforms, while certified, domain-specific layers ensure compliance with safety and reliability requirements, reflecting a convergence between the open innovation model of IT and the stringent assurance needs of cyber-physical systems.

Software and Safety Standards

As software moved from advisory and convenience roles into closed-loop control, fault management, and autonomy, safety standards had to shift from focusing mainly on hardware reliability to addressing software behavior, development process, traceability, and verification evidence. The big historical move was this: hardware could often be analyzed in terms of random failures and wear-out mechanisms, but software introduced a different kind of risk—systematic faults from requirements errors, design flaws, implementation mistakes, and unexpected interactions. That forced each domain to build standards that emphasized lifecycle rigor, requirements traceability, verification independence, configuration control, and structured safety arguments rather than just component robustness. IEC 61508 became the broad functional-safety reference point for programmable electronic systems and explicitly includes software requirements in Part 3, while later domain-specific standards adapted that logic to their own operating environments.

In ground systems, especially automotive, the early era of software safety was relatively informal: OEMs and suppliers used internal engineering discipline, testing, and FMEA-style thinking, but there was no unified framework tailored to vehicle software. As vehicles became software-intensive—first in engine control, then braking, steering, airbags, networking, and ADAS—the industry needed a standard that treated software as part of a full safety lifecycle. That came through ISO 26262, first published in 2011 as an adaptation of IEC 61508 for road vehicles. ISO 26262 introduced Automotive Safety Integrity Levels (ASILs), hazard analysis and risk assessment, lifecycle processes, and safety measures for both hardware and software, embedding software assurance into vehicle development rather than leaving it as a late-stage test problem. In practical terms, the standard pushed the automotive industry toward stronger requirements engineering, bidirectional traceability, safer software architecture, verification planning, and formal integration of software into system-level safety cases.
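
As an illustration of how ASILs are assigned, the hazard analysis in ISO 26262-3 combines severity (S1–S3), probability of exposure (E1–E4), and controllability (C1–C3). A commonly cited additive reading of the standard's risk-graph table can be sketched as follows; this is a simplification for teaching purposes, and the normative table in the standard remains authoritative:

```python
def asil(severity, exposure, controllability):
    """Map an (S, E, C) classification to an ASIL using the additive reading
    of the ISO 26262-3 risk graph: S1-S3 -> 1-3 points, E1-E4 -> 1-4 points,
    C1-C3 -> 1-3 points; higher totals mean a higher integrity level."""
    total = severity + exposure + controllability
    return {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}.get(total, "QM")

# Example: unintended full braking at highway speed might be rated S3/E4/C3.
print(asil(3, 4, 3))  # ASIL D (highest integrity level)
print(asil(1, 1, 1))  # QM (quality management only, no ASIL assigned)
```

The additive shortcut reproduces the standard's table cell by cell (e.g., S3/E4/C2 totals 9, giving ASIL C), but real hazard analysis also requires justifying each S, E, and C rating with evidence.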

In airborne systems, software safety standards emerged earlier and with greater rigor because software entered flight-critical functions sooner. Aviation could not treat software as just another engineering layer once digital flight control, navigation, and avionics displays became mission- and safety-critical. That is why DO-178, originally published in 1981, became so influential: it defined design assurance for airborne software and tied development rigor to the criticality of the function. Over time this matured through DO-178B and then DO-178C in 2011, which remains the core software assurance framework recognized by the FAA through AC 20-115D. The airborne sector’s key historical move was to make software safety depend not on testing alone, but on documented objectives, lifecycle evidence, configuration control, structural coverage, tool qualification where needed, and verification commensurate with software level. In other words, aviation moved earliest and most clearly toward the idea that safe software is demonstrated through a disciplined assurance process, not just by showing that a program “seems to work.”

In marine systems, the evolution was slower and more fragmented. Marine governance historically focused more on mechanical integrity, redundancy, seaworthiness, and prescriptive equipment rules than on software-specific lifecycle assurance. As ships adopted integrated bridge systems, dynamic positioning, digital navigation, and autonomous functions, classification societies such as DNV, ABS, and Lloyd’s Register increasingly had to account for software quality, cyber resilience, and failure behavior in control systems. But unlike aviation and automotive, the marine sector did not converge as early on a single universally dominant software-safety standard. Instead, it has generally relied on a patchwork of class rules, IEC-derived functional-safety thinking, equipment standards, and system-specific assurance practices. So the historical movement in marine has been from equipment approval and redundancy rules toward a more software-aware model, but one that still remains less unified and less process-centered than in aerospace or automotive. That difference reflects the sector’s lower production volumes, varied vessel types, long lifecycles, and less centralized certification structure. In short, marine governance has remained more prescriptive and performance-based than process-assurance-based.

In space systems, software safety evolved under extreme mission-assurance constraints rather than through a single commercial certification pathway. Space programs recognized early that software errors could be catastrophic because repair is difficult or impossible, communication delays are long, and missions are expensive. For a long time, safety was handled through agency-specific reliability doctrine, redundancy, conservative design, and system engineering discipline rather than a single software certification standard like DO-178. NASA’s own software-safety framework became more explicit with NASA-STD-8719.13, first issued in 1997 and updated since; NASA describes it as specifying the activities necessary to ensure safety is designed into software acquired or developed by the agency. The space sector’s historical movement, then, has been from mission-specific reliability practice toward more formalized software-safety activities, documentation, and risk-scaled rigor. Compared with airborne systems, the emphasis is often less on certifying a product line for repeated operation and more on ensuring that mission-specific software hazards are identified, mitigated, and managed as part of a broader system safety case.

Software Supply Chain and Manufacturing

Software entered complex engineered products long before anyone talked about “software-defined” anything. In the earliest generations of electronic products, software was small, tightly coupled to a specific hardware function, and often treated almost like firmware: a fixed control layer burned into ROM or maintained by a small engineering team. Productization in that era was primarily a hardware discipline. Once the design was frozen and qualified, the software was expected to stay stable for years, sometimes for the entire product life. Maintainability existed, but mostly in the form of patching defects, issuing service updates, and preserving compatibility with replacement hardware. The supply chain focus was similarly physical: semiconductors, boards, connectors, and mechanical parts dominated risk and planning. Software dependencies were limited enough that organizations could often understand the full stack internally. That began to change as products became networked, feature-rich, and digitally updatable.

From the 1980s through the 2000s, software became a much larger share of product value, especially in embedded systems, telecommunications, aerospace, and automotive electronics. This changed productization from a one-time release activity into an ongoing lifecycle problem. A product now had to be launched, updated, serviced, secured, and sometimes reconfigured in the field. Maintainability became more than clean code or modular design; it came to mean version control across hardware variants, traceability from requirements to deployed binaries, long-term support for aging platforms, and the ability to diagnose failures across interacting subsystems. At the same time, the software supply chain became more complex. Instead of mostly internal code, products increasingly depended on third-party operating systems, middleware, protocol stacks, compilers, libraries, vendor SDKs, and eventually open-source components. NIST now describes the software supply chain as the collection of activities involved in producing and delivering software, noting that its integrity depends on the security and discipline of those activities; modern guidance emphasizes practices such as SBOMs, vendor risk assessment, vulnerability management, and secure development frameworks. Historically, that marks a major shift: software was no longer just something a company wrote, but something it assembled, integrated, inherited, and continuously governed.
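
The SBOM practice mentioned above can be illustrated with a toy sketch. All component names and the advisory entry below are invented placeholders; real SBOMs use formats such as SPDX or CycloneDX and are checked against live vulnerability feeds:

```python
import json

# Hypothetical dependency inventory for an embedded build (all names invented).
dependencies = [
    {"name": "rtos-kernel", "version": "10.4.6", "supplier": "vendor-a"},
    {"name": "tls-lib",     "version": "3.0.1",  "supplier": "upstream-oss"},
    {"name": "can-stack",   "version": "2.2.0",  "supplier": "vendor-b"},
]

# Hypothetical advisory feed: component/version pairs with known issues.
advisories = {("tls-lib", "3.0.1"): "CVE-XXXX-YYYY (placeholder)"}

def build_sbom(components):
    """Assemble a minimal SBOM-like document (simplified, not full SPDX/CycloneDX)."""
    return {"sbomFormat": "minimal-example", "components": components}

def audit(sbom, advisories):
    """Cross-check every SBOM entry against the advisory feed."""
    return [(c["name"], advisories[(c["name"], c["version"])])
            for c in sbom["components"]
            if (c["name"], c["version"]) in advisories]

sbom = build_sbom(dependencies)
findings = audit(sbom, advisories)
print(json.dumps(findings))  # flags the vulnerable tls-lib component
```

The value of the SBOM is exactly this cross-referencing step: an organization can only govern inherited components it has enumerated.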

The modern phase extends this logic even further. In connected products, especially vehicles, software is now a primary means of differentiation, feature delivery, and even business model evolution. That is where the idea of the software-defined vehicle (SDV) comes in. Historically, vehicles were built around many function-specific ECUs with tightly coupled hardware and software, and new capability typically arrived only with a new model year or hardware redesign. The SDV concept reflects a move away from that paradigm toward centralized or zonal computing, richer abstraction layers, and over-the-air updatability, so that features, performance, user experience, and even some platform behavior can evolve after the vehicle is sold. Industry analysts describe this shift as part of a broader transition in automotive E/E architecture, where software and centralized computing become the core enablers of innovation and ongoing value creation. From a historical perspective, the SDV is the endpoint of a long arc: products began as hardware with a little embedded code, became integrated systems whose success depended on software lifecycle management, and are now increasingly understood as updatable software platforms embodied in hardware.

Validation and Verification

IT-based software is verified through a structured combination of requirements-based testing, code analysis, and runtime validation, augmented by methodologies from Carnegie Mellon University's Software Engineering Institute (SEI), such as the Capability Maturity Model Integration (CMMI), and disciplined software engineering practices. Verification begins with ensuring that requirements are well-defined, traceable, and testable—aligned with CMMI’s emphasis on requirements management and validation. Development proceeds through unit, integration, and system testing, supported by peer reviews, formal inspections, and static analysis, reflecting SEI’s focus on early defect removal and process discipline. Measurement and analysis play a key role, with metrics collected to assess defect density, coverage, and process performance. Configuration management ensures that all artifacts (code, tests, requirements) are version-controlled and reproducible, while process maturity levels guide organizations toward increasingly predictable and optimized verification practices. Continuous integration pipelines automate regression testing, and in higher-maturity environments, quantitative process control and causal analysis are used to systematically improve quality. Finally, verification extends into operations through monitoring and feedback loops, embodying the SEI philosophy of continuous process improvement across the software lifecycle.
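
The emphasis on traceable, testable requirements can be made concrete with a minimal sketch. The requirement and test-case IDs below are hypothetical; real projects maintain such matrices in dedicated requirements-management tools:

```python
# Hypothetical requirement-to-test traceability matrix (all IDs invented).
requirements = {"REQ-001": "Brake command latency under 10 ms",
                "REQ-002": "Sensor fault raises diagnostic code",
                "REQ-003": "Watchdog resets stalled task"}

test_cases = {"TC-10": ["REQ-001"],
              "TC-11": ["REQ-002", "REQ-001"]}

def uncovered(requirements, test_cases):
    """Return requirement IDs with no linked test case (a traceability gap)."""
    covered = {req for linked in test_cases.values() for req in linked}
    return sorted(set(requirements) - covered)

gaps = uncovered(requirements, test_cases)
print(gaps)  # ['REQ-003']
```

Automating this check in a CI pipeline turns "every requirement has a test" from a policy statement into a gate that fails the build.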

Validation of cyber-physical software places strong emphasis on hardware/software co-verification using a spectrum of simulation and emulation techniques to ensure correct behavior before deployment in the physical world. At the earliest stages, model-in-the-loop (MIL) and software-in-the-loop (SIL) simulations evaluate control algorithms and software logic against mathematical models of the environment and plant dynamics. These are followed by hardware-in-the-loop (HIL) approaches, where real control software executes on target or representative hardware while interacting with simulated sensors, actuators, and physical processes in real time—commonly used in automotive engine control, avionics flight systems, and industrial automation. As system complexity increases, processor-in-the-loop (PIL) and full-system emulation platforms enable timing-accurate execution and validation of embedded software under realistic workloads. In semiconductor and advanced embedded domains, platforms such as QEMU and commercial FPGA-based emulators allow early software bring-up prior to silicon availability. Across these stages, validation focuses not only on functional correctness but also on timing determinism, fault handling, and interaction with physical processes. This layered approach enables progressive risk reduction, bridging the gap between abstract models and real-world deployment while supporting the stringent safety and reliability requirements of cyber-physical systems.
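
A software-in-the-loop test of the kind described above can be reduced to its essence: the control software runs against a simulated plant, and a pass/fail criterion is checked automatically. The proportional controller and first-order plant below are deliberately simplified stand-ins, not a real vehicle model:

```python
def controller(setpoint, measurement, kp=0.8):
    """Proportional control law under test (the 'software' in software-in-the-loop)."""
    return kp * (setpoint - measurement)

def plant(state, command, dt=0.1):
    """Simplified first-order plant model standing in for real vehicle dynamics."""
    return state + dt * command

def run_sil(setpoint=1.0, steps=200):
    """Closed-loop simulation: controller and plant alternate each time step."""
    state = 0.0
    for _ in range(steps):
        state = plant(state, controller(setpoint, state))
    return state

final = run_sil()
assert abs(final - 1.0) < 1e-3  # pass/fail criterion of the SIL test
print(round(final, 4))
```

In a HIL setup the `plant` function would be replaced by real-time simulation hardware, while the controller code runs unchanged on the target processor, which is exactly what makes the progression from MIL to SIL to HIL a risk-reduction ladder.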

In summary, a dominant IT electronic ecosystem drives the fundamental rhythm of hardware and software development. Cyber-physical systems, with considerably lower volume, have had to adapt to this dominant rhythm in the following ways:

  1. Hardware Obsolescence and Reliability: The IT ecosystem churns through product development at a pace of 18–24 months while cyber-physical systems have operational lifetimes beyond five years. This raises a requirement for very careful supply chain management for semiconductor components.
  2. Software Ecosystem: Operating systems, compilers, open-source software, communication standards, and middleware in the IT ecosystem evolve continuously. This requires a dedicated architecture in which safety-critical/real-time components can work alongside IT components (e.g., infotainment systems).
  3. Development Cost: Traditional models of fully encapsulated cyber-physical products (e.g., automobile platforms) are increasingly shifting to the IT release cycle with over-the-air updates.
  4. Cybersecurity: The introduction of communication systems as well as traditional IT software into cyber-physical systems has opened up an attack surface for bad actors.

Taken together, the move from largely mechanical systems to software-defined vehicles represents a massive shift in design, manufacturing, support, and even legal ownership, since software is typically licensed to the OEM and then to the final customer.

Autonomy Software Stacks

Modern autonomous systems — from self-driving cars and unmanned aerial vehicles (UAVs) to marine robots and industrial co-bots — depend fundamentally on software architectures capable of real-time sensing, decision-making, and control. While mechanical and electronic components define what a system can do, the software stack defines how it does it — how it perceives the world, interprets data, plans actions, and interacts safely with its environment [66,67]. Autonomy software differs from conventional embedded or enterprise software in several critical ways: it must meet hard real-time constraints, provide robust fault tolerance, comply with safety standards, and integrate AI-driven decision-making.

This combination of safety-critical engineering and AI-driven decision-making makes autonomy software one of the most challenging areas in modern computing.

Core Functional Requirements of Autonomy Software

Autonomy software must achieve four key functional objectives [68,69]: perceiving the environment, interpreting sensor data, planning actions, and executing safe control.

Each of these objectives corresponds to distinct software layers and modules in the autonomy stack.

Software Characteristics Unique to Autonomy

| Characteristic | Description | Importance |
|---|---|---|
| Real-time Execution | Must process sensor data and react within milliseconds. | Ensures safety and stability. |
| Determinism | Predictable behaviour under defined conditions. | Required for validation and trust. |
| Scalability | Supports increased sensor data and compute complexity. | Allows future upgrades. |
| Interoperability | Integrates diverse hardware, OS, and middleware. | Facilitates modularity. |
| Resilience | Must continue functioning despite partial failures. | Critical for mission continuity. |
| Adaptability | Learns from data or updates behaviour dynamically. | Key for AI-driven autonomy. |

These characteristics drive architectural decisions and the choice of frameworks (e.g., ROS, AUTOSAR Adaptive, DDS).

Autonomy Software as a Multi-Layered System

Autonomy software is built as a layered system that combines multiple software technologies.

The combination of these layers forms the autonomy software stack, which enables complex behaviour while maintaining reliability. A defining aspect of autonomy software is its reliance on middleware — frameworks that manage interprocess communication (IPC), data distribution, and time synchronisation across distributed computing nodes. Widely used middleware standards and frameworks include DDS, ROS 2, and AUTOSAR Adaptive.

A complete software stack is a layered collection of software components, frameworks, and libraries that work together to deliver the full set of system functionalities. Each layer provides services to the layer above it and depends on the layer below it. Middleware, which is an essential part of these multi-layered architectures, ensures that all layers of the software stack can exchange information deterministically and safely [70]. In autonomous systems, the software stack enables integration between perception, planning, control, and the underlying compute and actuation hardware.

It’s the backbone that allows autonomy to function as a cohesive system rather than a set of disconnected modules (Quigley et al., 2009; Maruyama et al., 2016). From a technical perspective, the software stack defines how functionality, data flow, and control are structured within the system.
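
The publish–subscribe pattern that middleware such as DDS or ROS topics provide can be sketched in a few lines. This toy in-process bus (topic name and message fields invented) shows how a planner consumes perception output without either module referencing the other directly:

```python
from collections import defaultdict

class Bus:
    """Minimal in-process publish-subscribe bus: a toy stand-in for DDS/ROS topics."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, msg):
        for cb in self._subs[topic]:   # deliver to every subscriber of the topic
            cb(msg)

bus = Bus()
log = []

# A 'planner' node consumes obstacle messages produced by a 'perception' node;
# the two modules are coupled only through the topic name and message schema.
bus.subscribe("/obstacles", lambda msg: log.append(("planner", msg)))
bus.publish("/obstacles", {"id": 7, "range_m": 12.5})
print(log)  # [('planner', {'id': 7, 'range_m': 12.5})]
```

Real middleware adds what this sketch omits: network transport, serialization, discovery, and quality-of-service guarantees such as deadlines and delivery reliability.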

Modularity and Abstraction

Each layer isolates complexity by providing a clean interface to the one above.

Real-Time and Deterministic Behaviour

Autonomous systems rely on real-time responses. The stack architecture ensures deterministic scheduling, bounded latencies, and predictable data flow between layers.

Interoperability

Middleware such as ROS 2 or DDS standardises interprocess communication. This allows different vendors’ software modules (e.g., LiDAR driver from Company A and planner from Company B) to work together.
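
The vendor-interchangeability idea can be shown with a minimal abstraction boundary. Both "vendor" drivers below are hypothetical stubs; the point is that the planner-side logic depends only on the shared interface, so either driver can be swapped in without changing it:

```python
from abc import ABC, abstractmethod

class RangeSensor(ABC):
    """Abstraction boundary: upper layers depend only on this interface."""
    @abstractmethod
    def read_m(self) -> float:
        """Return the current range measurement in metres."""

class VendorALidar(RangeSensor):      # hypothetical driver from one supplier
    def read_m(self) -> float:
        return 12.5                    # stub standing in for real device I/O

class VendorBLidar(RangeSensor):      # drop-in replacement from another supplier
    def read_m(self) -> float:
        return 12.5

def too_close(sensor: RangeSensor, threshold_m: float = 5.0) -> bool:
    """Planner-side logic, written once against the interface."""
    return sensor.read_m() < threshold_m

print(too_close(VendorALidar()), too_close(VendorBLidar()))  # False False
```

Middleware generalises this idea across processes and machines: the "interface" becomes a topic name plus a message schema rather than a class.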

Fault Tolerance and Redundancy

Stack layering supports redundant paths for safety-critical functions. If a perception node fails, a backup process may take over seamlessly — ensuring resilience, especially in aerospace and automotive systems [72].
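
A heartbeat-based failover of the kind described can be sketched as follows. Node names and timeout values are illustrative, and production systems typically rely on middleware health monitoring (e.g., DDS liveliness QoS) rather than hand-rolled checks:

```python
import time

class Node:
    """Toy perception node; health is judged by heartbeat freshness."""
    def __init__(self, name):
        self.name = name
        self.last_beat = time.monotonic()

    def heartbeat(self):
        self.last_beat = time.monotonic()

    def alive(self, timeout_s=0.05):
        return time.monotonic() - self.last_beat < timeout_s

def select_active(primary, backup):
    """Monitor logic: fail over to the backup when the primary's heartbeat lapses."""
    return primary if primary.alive() else backup

primary, backup = Node("perception_a"), Node("perception_b")
assert select_active(primary, backup) is primary  # both fresh: primary wins

time.sleep(0.1)       # primary stops beating...
backup.heartbeat()    # ...while the backup stays alive
print(select_active(primary, backup).name)  # perception_b
```

The essential property is that the switchover decision is made by an independent monitor, not by the failing node itself.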

Continuous Integration and Simulation

A layered design allows developers to test, simulate, and integrate each layer independently, validating changes in simulation before deploying them to hardware.

Management and Organisational Importance

From a software engineering management perspective, a defined software stack provides structure and governance for the development process, with the following main advantages:

Division of Labour. Teams can specialise by layer — e.g., one group handles perception, another control, another middleware. This parallelises development and allows use of domain expertise without interference.

Reusability and Version Control. Reusable modules and APIs speed up development. Tools like Git, Docker, and CI/CD pipelines ensure traceability, maintainability, and fast updates across distributed teams.

Scalability and Lifecycle Management. A well-structured stack can be extended with new sensors or algorithms without re-architecting the entire system. Lifecycle management tools (e.g., ROS 2 launch systems, AUTOSAR Adaptive manifests) maintain version consistency and dependency control.

Quality Assurance (QA) and Certification. Layered software stacks make it easier to apply quality control and compliance frameworks such as ISO 26262 (automotive safety software), DO-178C (aerospace software), and IEC 61508 (functional safety in automation). Each layer can be validated separately, simplifying documentation and certification workflows.

Cost and Risk Reduction. When multiple projects share a unified software stack, the cost of testing, validation, and maintenance drops significantly. This approach underpins industry-wide initiatives like AUTOSAR, which standardises vehicle software to lower integration costs.

The Layered Stack as an Organisational Blueprint

In large autonomy projects (e.g., Waymo, Tesla), the software stack also serves as an organisational structure: teams are aligned with the layers they own, such as perception, planning, control, and middleware/platform groups.

Thus, the software stack doubles as both a technical architecture and an organisational map for coordination and accountability [73].

Real-World Example: ROS 2 as a Layered Stack

The Robot Operating System 2 (ROS 2) exemplifies how modular software stacks are implemented: DDS-based middleware at the bottom (accessed through the ROS middleware interface, rmw), the core client library rcl above it, language bindings such as rclcpp and rclpy, and application-level nodes, packages, and tools at the top.

This layered model has become the foundation for numerous autonomous systems in academia and industry — from mobile robots to autonomous vehicles [74].

Advantages of a Well-Defined Software Stack

| Advantage | Description |
|---|---|
| Clarity and Structure | Simplifies system understanding and onboarding. |
| Parallel Development | Enables multiple teams to work concurrently. |
| Interchangeability | Supports component replacement without total redesign. |
| Scalability | Allows future expansion with minimal rework. |
| Maintainability | Facilitates debugging, upgrades, and certification. |
| Efficiency | Reduces cost, redundancy, and integration risk. |

In essence, a software stack is not merely a technical artefact — it’s a strategic enabler that aligns engineering processes, organisational structure, and long-term sustainability of autonomous platforms. The autonomy software stack and its development and maintenance challenges are discussed in the following chapters.

Software Lifecycle and Typical Lifecycle Models

The software lifecycle defines the complete process by which software is conceived, developed, deployed, maintained, and eventually retired. In the context of modern engineering — particularly for complex systems such as autonomous platforms, embedded systems, or enterprise solutions — understanding the lifecycle is essential to ensure quality, reliability, and maintainability. The lifecycle acts as a roadmap that guides project teams through stages of development and management. Each stage defines specific deliverables, milestones, and feedback loops, ensuring that the software evolves in a controlled, traceable, and predictable way [8].

Definition

“The software lifecycle refers to a structured sequence of processes and activities required to develop, maintain, and retire a software system.” — [9]

In other words, the lifecycle describes how a software product transitions from idea to obsolescence — incorporating all the engineering, management, and maintenance steps along the way. The lifecycle ensures:

In regulated domains like aerospace, automotive, and medical devices, adherence to a defined lifecycle is also a legal requirement for certification and compliance (e.g., ISO/IEC 12207, DO-178C, ISO 26262).

Typical Software Lifecycle Models

Different industries and projects adopt specific lifecycle models based on their goals, risk tolerance, and team structure. The most widely used models are explained in this chapter.

The Waterfall Model

The Waterfall Model is one of the earliest and most widely recognised software lifecycle models. It follows a linear sequence of stages where each phase must be completed before the next begins [10].

 The Waterfall Model
Figure 6: The Waterfall Model

Advantages:

Limitations:

The V-Model (Verification and Validation Model)

An evolution of the waterfall approach, the V-Model emphasises testing and validation at each development stage. Each “downward” step (development) has a corresponding “upward” step (testing/validation).

 V-Model Lifecycle
Figure 7: V-Model Lifecycle

Advantages:

Limitations:

The Iterative and Incremental Model

Instead of completing the whole system in one sequence, the iterative model develops the product through multiple cycles or increments. Each iteration delivers a working version that can be reviewed and refined. Advantages:

Limitations:

Agile Methodologies

Agile development (e.g., Scrum, Kanban, Extreme Programming) emphasises collaboration, adaptability, and customer feedback. It replaces rigid processes with iterative cycles known as sprints.

 Agile Lifecycle
Figure 8: Agile Lifecycle

Core Principles [11]:

Advantages:

Challenges:

The Spiral Model

Introduced by Boehm [12], the Spiral Model combines iterative development with risk analysis. Each loop of the spiral represents one phase of the process, with risk evaluation at its core.

 Spiral Lifecycle
Figure 9: Spiral Lifecycle

Advantages:

Limitations:

DevOps and Continuous Lifecycle

Modern systems increasingly adopt DevOps — integrating development, testing, deployment, and operations into a continuous cycle. This model leverages automation, CI/CD pipelines, and cloud-native infrastructure.

  DevOps Lifecycle
Figure 10: DevOps Lifecycle

Advantages:

Challenges:

Comparative Overview
Model Main Focus Advantages Best Suited For
Waterfall Sequential structure Simple, predictable Small or regulated projects
V-Model Verification and validation Traceable, certifiable Safety-critical systems
Iterative/Incremental Progressive refinement Flexible, early testing Complex evolving systems
Agile Collaboration & feedback Fast adaptation, user-centric Software startups, dynamic projects
Spiral Risk-driven development Risk control, scalability Large R&D projects
DevOps Continuous integration Automation, rapid delivery Cloud, AI, or autonomous platforms

Configuration Concepts and Challenges

In software engineering, Configuration Management (CM) refers to the systematic process of identifying, organising, controlling, and tracking all changes made to a software system throughout its lifecycle. It ensures that:

According to ISO/IEC/IEEE 828:2012, CM is defined as: “A discipline applying technical and administrative direction and surveillance to identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, and record and report change processing and implementation status.”

In other words, Configuration Management keeps the software stable while it evolves. Configuration management exists to:

 The Role of Configuration Management
Figure 11: The Role of Configuration Management (Adapted from [13] [14])

Key Concepts in Configuration Management

To understand CM, several foundational terms must be defined.

Configuration Item (CI) A Configuration Item is any component of the system that is subject to configuration control. Examples include:

Each CI is uniquely identified, versioned, and tracked over time [15].

Baseline A baseline is a formally approved version of one or more configuration items that serves as a reference point. Once established, any changes to the baseline must follow a defined change control process. Types of baselines:

Baselines create stability checkpoints in the lifecycle [16].

Version Control Version control systems (VCS), such as Git, Mercurial, or Subversion, track and manage modifications to source code and other files. They enable:

Version control forms the technical backbone of configuration management.
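Under the hood, systems such as Git identify every tracked artefact by a content hash, which is what makes versions tamper-evident and reproducible. As an illustration, the following sketch computes a blob's object ID the same way `git hash-object` does (the helper name is ours; the `blob <size>\0` header format is Git's documented convention):

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Compute the Git object ID of a file's content.

    Git hashes the header "blob <size>\\0" followed by the raw bytes,
    so identical content always yields the same ID -- the basis of
    Git's content-addressed storage.
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The empty file has a well-known Git object ID:
print(git_blob_hash(b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on content, two developers who commit identical files produce identical objects, which is what lets distributed repositories merge reliably.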

Change Management Change management defines how modifications are proposed, evaluated, approved, and implemented. Typical steps:

This structured approach ensures accountability and quality control [17].
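The propose–evaluate–approve–implement flow can be modelled as a small state machine that rejects out-of-order transitions and keeps an audit trail. A minimal sketch (the state names and transitions are illustrative, not taken from any specific standard):

```python
# Allowed transitions for a change request (illustrative workflow).
TRANSITIONS = {
    "proposed":    {"evaluated"},
    "evaluated":   {"approved", "rejected"},
    "approved":    {"implemented"},
    "implemented": {"verified"},
    "verified":    set(),
    "rejected":    set(),
}

class ChangeRequest:
    def __init__(self, title: str):
        self.title = title
        self.state = "proposed"
        self.history = ["proposed"]   # audit trail for accountability

    def advance(self, new_state: str) -> None:
        # Refuse anything not permitted from the current state.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

cr = ChangeRequest("Increase watchdog timeout")
for s in ("evaluated", "approved", "implemented", "verified"):
    cr.advance(s)
print(cr.history)
# ['proposed', 'evaluated', 'approved', 'implemented', 'verified']
```

Encoding the workflow in data rather than scattered `if` statements makes the process auditable: the transition table is the policy, and the history list is the record an auditor inspects.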

Configuration Audit A configuration audit verifies that the configuration items and documentation:

Two common types:

Audits maintain integrity and compliance, especially in defence and aerospace projects [18].

Challenges in Configuration Management

Even though CM brings structure and order, it faces numerous practical challenges, particularly in distributed and complex systems.

Complexity and Scale Modern systems can contain millions of lines of code, hundreds of dependencies, and multiple configurations for different platforms. Managing all these variations manually is infeasible. Example: An autonomous vehicle might include distinct configurations for:

Solution: Automated configuration management with metadata-driven tools (e.g., Ansible, Puppet, Kubernetes Helm).

Multiple Development Streams In large projects, teams work on multiple branches or versions simultaneously (e.g., development, testing, release). This increases the risk of:

Solution:

Hardware–Software Interdependencies In embedded or cyber-physical systems, configurations depend on hardware variants (processors, sensors, memory). Maintaining alignment between software builds and hardware specifications is difficult. Mitigation:

Frequent Updates and Continuous Delivery In the DevOps era, software may be updated multiple times per day across thousands of devices. Each update must maintain consistency and rollback capability. Challenge:

Solution:

Data and Configuration Drift Configuration drift occurs when the system’s actual state deviates from its documented configuration — common in dynamic, cloud-based systems. Causes:

Prevention:
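Detecting drift amounts to diffing the declared configuration against the observed state of a system. A minimal sketch over plain dictionaries (real tools such as Ansible or Puppet perform the same comparison against live hosts):

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return the keys whose observed value deviates from the declared
    configuration, including keys missing on either side."""
    drift = {}
    for key in desired.keys() | actual.keys():
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = {"desired": want, "actual": have}
    return drift

desired = {"ntp_server": "pool.ntp.org", "log_level": "INFO", "ssh": "enabled"}
actual  = {"ntp_server": "pool.ntp.org", "log_level": "DEBUG"}  # manual hotfix
print(detect_drift(desired, actual))
```

Running such a check periodically, and reconciling any differences back to the declared state, is the essence of the "desired state" model used by modern configuration tooling.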

Regulatory and Compliance Demands In domains like aerospace, medical, and automotive, configuration management is a compliance requirement under standards such as ISO/IEC/IEEE 12207, ISO 26262, or IEC 61508. Challenge:

Solution:

Human and Organisational Factors The most difficult aspect of CM is often cultural, not technical. Teams may resist documentation or formal change control due to perceived bureaucracy. As a result:

Solution:

Main Steps and Tools of Configuration Management

Configuration management (CM) is not a single activity but a cyclic process integrated into the entire software lifecycle. The ISO/IEC/IEEE 828:2012 standard identifies four principal activities: configuration identification, configuration change control, configuration status accounting, and configuration audit.

In modern practice, a fifth step — Configuration Verification and Review — is also added for continuous improvement and compliance.

 The Configuration Management Cycle
Figure 12: The Configuration Management Cycle (Adapted from [20] [21])

Configuration Identification The first step in CM defines what needs to be managed. It involves:

Example hierarchy:

 Example hierarchy
Figure 13: Example hierarchy

Tools & Techniques:

Goal: Create a clear inventory of every managed artefact and its dependencies.

 Change Control Workflow
Figure 14: Change Control Workflow

Tools and Techniques:

Goal: Ensure that every change is reviewed, justified, and properly recorded before being implemented.

Configuration Status Accounting (CSA) CSA provides visibility into the current state of configurations across the project. It records which versions of CIs exist, where they are stored, and what changes have occurred. Typical outputs include:

 Configuration Status Flow
Figure 15: Configuration Status Flow

Tools & Techniques:

Goal: Provide transparency and traceability, so project managers and auditors can reconstruct the exact configuration of any product version at any point in time.

Configuration Audit A Configuration Audit ensures the product conforms to its baseline and that all changes were properly implemented and documented. It verifies:

There are two types:

  1. Functional Configuration Audit (FCA): Confirms the system performs as intended.
  2. Physical Configuration Audit (PCA): Confirms that the physical implementation matches the design documentation.

Tools & Techniques:

Goal: Ensure integrity, consistency, and compliance across the entire configuration baseline.

Configuration Review and Verification This optional step closes the CM loop. It assesses whether CM processes are effective and aligned with project objectives. Activities include:

Tools:

Goal: Support continuous improvement and process optimisation.

Main Tools for Configuration Management

Modern CM relies heavily on automation and integration tools to manage complexity and enforce discipline across teams. These tools can be categorized by function.

Version Control Systems (VCS)

Tool Description Example Use
Git Distributed version control system; supports branching and merging. Used for nearly all modern software projects.
Subversion (SVN) Centralised version control with strict change policies. Preferred in regulated environments (aerospace, defence).
Mercurial Similar to Git, optimised for scalability and ease of use. Used in research or large repositories.

Build and Continuous Integration Tools

Tool Purpose Example Use
Jenkins / GitLab CI Automate building, testing, and deploying changes. Trigger builds after commits or merge requests.
Maven / Gradle / CMake Manage project dependencies and build processes. Ensure reproducible builds.
Docker / Podman Containerise environments for consistency. Package applications with dependencies for testing and deployment.

Infrastructure and Environment Management

Tool Function Application
Ansible / Puppet / Chef Automate configuration and provisioning. Keep server environments synchronised.
Terraform Infrastructure as Code (IaC) for cloud platforms. Manage cloud resources with version control.
Kubernetes Helm Manages container-based deployments. Controls configurations in microservice architectures.

Artifact and Release Management

Tool Purpose Example Use
JFrog Artifactory / Nexus Repository Store and version compiled binaries, libraries, and Docker images. Maintain reproducibility of releases.
Spinnaker / Argo CD Manage continuous deployment to production environments. Implement automated rollouts and rollbacks.

Configuration Tracking and Documentation

Tool Purpose Use Case
ServiceNow CMDB Tracks configuration items, dependencies, and incidents. Enterprise-scale CM.
Atlassian Confluence Maintains documentation and process records. Collaboration and change documentation.
Polarion / IBM DOORS Links requirements to configuration items and test results. Traceability in regulated environments.

Example – An integrated CM Workflow:

 An integrated  CM Workflow
Figure 16: An integrated CM Workflow (Adapted from GitLab, Atlassian, and IEEE 828 integration frameworks)

Toolchain Integration for Autonomous Systems In autonomous platforms (e.g., UAVs, vehicles), CM tools are often integrated with:

This hybrid approach ensures consistent software across all nodes — from cloud services to embedded controllers [23].

Common Pitfalls and Lessons Learned

Even mature organisations often encounter challenges in lifecycle and configuration management:

Pitfall Effect Mitigation
Poor version control discipline Loss of traceability Enforce the branching strategy and pull request reviews.
Incomplete configuration audits Undetected inconsistencies Automate audit workflows and compliance scanning.
Manual deployment processes Environment drift Use CI/CD and Infrastructure as Code.
Siloed documentation Lack of visibility Centralise records using CMDB or ALM platforms.
Lack of cultural adoption Resistance to process discipline Provide training, incentives, and leadership support.

Organisations that succeed in embedding CM practices view them not as bureaucracy, but as enablers of reliability and trust.

Autonomy Software Stack

A typical autonomy software stack is organised into hierarchical layers, each responsible for a specific subset of functions — from low-level sensor control to high-level decision-making and fleet coordination. Although implementations differ across domains (ground, aerial, marine), the core architectural logic remains similar:

This layered design aligns closely with both robotics frameworks (ROS 2) and automotive architectures (AUTOSAR Adaptive).

 Typical Autonomy Software Stack
Figure 17: Typical Autonomy Software Stack (Adapted from [24] [25])

In Figure 17, the main software layers and their functions are depicted.

Hardware Abstraction Layer (HAL) The HAL provides standardised access to hardware resources. It translates hardware-specific details (e.g., sensor communication protocols, voltage levels) into software-accessible APIs. This functionality typically includes:

HAL ensures portability — software modules remain agnostic to specific hardware vendors or configurations [26].
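The idea can be sketched as an abstract sensor interface: higher layers code against a generic `read_range_m()` method while vendor drivers hide bus protocols and unit conversions (all class and method names here are illustrative):

```python
from abc import ABC, abstractmethod

class RangeSensor(ABC):
    """Hardware-agnostic interface the rest of the stack depends on."""
    @abstractmethod
    def read_range_m(self) -> float:
        """Return the measured distance in metres."""

class VendorALidar(RangeSensor):
    """Vendor-specific driver: hides the bus protocol and units."""
    def read_range_m(self) -> float:
        raw = self._read_register()   # vendor-specific I/O
        return raw * 0.001            # this device reports millimetres
    def _read_register(self) -> int:
        return 4250                   # stubbed bus read for the sketch

def obstacle_ahead(sensor: RangeSensor, threshold_m: float = 5.0) -> bool:
    # Planning code never sees which vendor's driver is underneath.
    return sensor.read_range_m() < threshold_m

print(obstacle_ahead(VendorALidar()))  # True: 4.25 m < 5.0 m
```

Swapping in a different vendor's lidar then means writing one new subclass; nothing above the HAL changes.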

Operating System (OS) and Virtualisation Layer The OS layer manages hardware resources, process scheduling, and interprocess communication (IPC) as well as real-time operation, alert and trigger raising using watchdog processes. Here, data processing parallelisation is one of the keys to ensuring resources for time-critical applications. Autonomous systems often use:

Time-Sensitive Networking (TSN) extensions and PREEMPT-RT patches ensure deterministic scheduling for mission-critical tasks [27].

Middleware / Communication Layer The middleware layer serves as the data backbone of the autonomy stack. It manages communication between distributed software modules, ensuring real-time, reliable, and scalable data flow. In some of the architectures mentioned above, the middleware is the central distinguishing feature. Popular middleware technologies:

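Whatever the vendor, these technologies share a topic-based publish–subscribe pattern. A toy single-process sketch of that pattern (real middleware such as DDS, which underpins ROS 2, adds discovery, QoS policies, and network transport on top):

```python
from collections import defaultdict
from typing import Any, Callable

class Bus:
    """Toy topic-based publish-subscribe bus (single process only)."""
    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, msg: Any) -> None:
        for cb in self._subs[topic]:   # fan out to every subscriber
            cb(msg)

bus = Bus()
received = []
bus.subscribe("/lidar/range", received.append)   # e.g., the planner listens
bus.subscribe("/lidar/range", received.append)   # a logger listens too
bus.publish("/lidar/range", 4.25)
print(received)  # [4.25, 4.25]
```

The key property is decoupling: the publisher knows only the topic name, never its consumers, so modules can be added, replaced, or distributed across machines without touching each other's code.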
Control & Execution Layer The control layer translates planned trajectories into actuator commands while maintaining vehicle stability. It closes the feedback loop between command and sensor response. Key modules:

Safety-critical systems often employ redundant controllers and monitor nodes to prevent hazardous conditions [28].
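A classical building block of this layer is the PID controller. A minimal sketch of one feedback loop, driving a crude first-order plant toward a speed setpoint (the gains and plant model are illustrative):

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*sum(e)*dt + Kd*de/dt."""
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Track a 10 m/s speed setpoint with a toy vehicle model.
pid, speed, dt = PID(kp=0.8, ki=0.2, kd=0.05), 0.0, 0.1
for _ in range(100):
    throttle = pid.step(10.0, speed, dt)
    speed += throttle * dt            # toy first-order dynamics
print(round(speed, 2))                # converges near 10.0
```

In a real vehicle this loop runs at a fixed rate against sensor feedback, and a monitor node checks that the measured state actually follows the command.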

Autonomy Intelligence Layer This is the core of decision-making in the stack. It consists of several interrelated subsystems:

Subsystem Function Example Techniques / Tools
Perception Detect and classify objects, lanes, terrain, or obstacles. CNNs, LiDAR segmentation, sensor fusion.
Localization Estimate position relative to a global or local map. SLAM, GNSS, Visual Odometry, EKF.
Planning Compute feasible, safe paths or behaviours. A*, D*, RRT*, Behavior Trees.
Prediction Provide the environmental behaviour forecast. Usually, it provides an internal dynamics forecast as well. Recurrent Neural Networks, Bayesian inference.
Decision-making Choose actions based on mission goals and context. Finite State Machines, Reinforcement Learning.

These components interact through middleware and run either on edge computers (onboard) or cloud-assisted systems for extended processing [29].
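As a concrete instance of the planning row above, the following is a compact A* search on a 4-connected occupancy grid (an illustrative sketch; production planners add kinematic constraints and costmaps):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Returns the list of cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, [start])]               # (f, g, cell, path)
    best_g = {start: 0}
    while open_set:
        _, g, cur, path = heapq.heappop(open_set)
        if cur == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]]
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                heapq.heappush(open_set, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None   # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],   # wall forces a detour through the right column
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
print(path)
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

The admissible Manhattan heuristic guarantees the returned path is shortest, which is why A* (and its variants D* and RRT*) appear throughout the planning literature.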

Application & Cloud Layer At the top of the stack lies the application layer, which extends autonomy beyond individual vehicles:

Frameworks like AWS RoboMaker, NVIDIA DRIVE Sim, and Microsoft AirSim bridge onboard autonomy with cloud computation.

Data Flow in the Autonomy Software Stack

Autonomy systems rely on data pipelines that move information between layers in real time.

 Data Flow in an Autonomy Software Stack
Figure 18: Data Flow in an Autonomy Software Stack

Each stage includes feedback loops to ensure error correction and safety monitoring [30, 31].

Example Implementations

ROS 2-Based Stack (Research and Prototyping)

AUTOSAR Adaptive Platform (Automotive)

MOOS-IvP (Marine Autonomy)

Hybrid Cloud-Edge Architectures

Layer Interaction Example – Autonomous Vehicle

 Simplified Interaction Example
Figure 19: Simplified Interaction Example

This closed-loop data exchange ensures real-time responsiveness, robust error recovery, and cross-module coherence.

Development & Maintenance Challenges, Conclusions, and References

Developing and maintaining an autonomous software stack is a long-term, multidisciplinary endeavour. Unlike conventional software, autonomy stacks must handle:

These constraints make the software lifecycle for autonomy uniquely complex — spanning from initial research prototypes to industrial-grade, certified systems.

Main Development Challenges

Even with a thorough understanding of autonomous software stacks, their development still poses significant, challenging problems. Mitigating these problems drives up the cost of designing and developing autonomous systems and makes them hard to maintain. The most significant challenges are discussed below.

Real-Time Performance and Determinism Autonomous systems require deterministic behaviour: decisions must be made within fixed, guaranteed time frames. However, high computational demands from AI algorithms often conflict with real-time guarantees [34]. Key Issues:

Timing mismatches across sensor and control loops. Mitigation:
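One common mitigation is to monitor each control loop's deadline explicitly, so overruns are detected and counted rather than silently degrading control. A minimal sketch (a real-time executive would additionally rely on priority scheduling, e.g., under PREEMPT_RT; this only measures timing):

```python
import time

def run_periodic(task, period_s: float, iterations: int) -> int:
    """Run `task` at a fixed period and count deadline overruns."""
    overruns = 0
    next_release = time.monotonic()
    for _ in range(iterations):
        task()
        next_release += period_s
        slack = next_release - time.monotonic()
        if slack < 0:
            overruns += 1          # task ran past its deadline
        else:
            time.sleep(slack)      # wait out the remaining budget
        # Resynchronise after an overrun instead of trying to "catch up".
        next_release = max(next_release, time.monotonic())
    return overruns

# A trivial task should meet a 10 ms deadline on any recent machine.
print(run_periodic(lambda: sum(range(1000)), period_s=0.01, iterations=20))
```

Logging the overrun count (or raising a watchdog alert when it exceeds a threshold) turns an invisible timing failure into an observable, testable event.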

Scalability and Software Complexity As systems evolve, the number of nodes, processes, and data streams grows exponentially. For instance, a modern L4 autonomous vehicle may contain >200 software nodes exchanging gigabytes of data per second. Problems:

Solutions:

Integration of AI and Classical Control AI-based perception and classical control must coexist smoothly. While AI modules (e.g., neural networks) handle high-dimensional perception, classical modules (e.g., PID, MPC) ensure predictable control. Challenge:

Best Practices:

Safety, Verification, and Certification Autonomous systems must conform to standards such as the previously mentioned ISO 26262 (automotive functional safety), DO-178C (aerospace software certification), and IEC 61508 (industrial functional safety). Challenges:

Emerging Solutions:

Cybersecurity and Software Integrity Autonomous platforms are connected via V2X, cloud APIs, and OTA updates — creating multiple attack surfaces [37]. Risks:

Countermeasures:

Continuous Maintenance and Updates Unlike static embedded systems, autonomy software evolves continuously. Developers must maintain compatibility across versions, hardware platforms, and fleets already deployed in the field. Maintenance Practices:

 Continuous Integration and Maintenance Workflow
Figure 20: Continuous Integration and Maintenance Workflow (Adapted from [38, 39])

Data Management and Scalability AI-driven autonomy relies on vast datasets for training, simulation, and validation. Managing, labelling, and securing this data is an ongoing challenge [40]. Issues:

Approaches:

Human–Machine Collaboration and Ethical Oversight Autonomy software doesn’t exist in isolation — it interacts with human operators, passengers, and society. Thus, software design must incorporate transparency, accountability, and explainability. Key Considerations:

Lifecycle of an Autonomy Software Stack

The software lifecycle typically follows a continuous evolution model:

Phase Purpose Typical Tools
Design and Simulation Define architecture, run models, and simulate missions. MATLAB/Simulink, Gazebo, CARLA, AirSim.
Implementation and Integration Develop and combine software modules. ROS 2, AUTOSAR, GitLab CI, Docker.
Testing and Validation Perform SIL/HIL and system-level tests. Jenkins, Digital Twins, ISO safety audits.
Deployment Distribute to field systems with OTA updates. Kubernetes, AWS Greengrass, Edge IoT.
Monitoring and Maintenance Collect telemetry and update models. Prometheus, Grafana, ROS diagnostics.

The goal is continuous evolution with stability, where systems can adapt without losing certification or reliability.

Open Issues of Validating AI Components

A. AI COMPONENT VALIDATION Both the automotive and airborne spaces have reacted to AI by viewing it as “specialized software” in standards such as ISO 8800 [14] and [13]. This approach has the great utility of leveraging all the past work in generic mechanical safety and in software validation. However, one must now manage the fact that we have data-generated “code” rather than conventional program code. In the world of V&V, this difference manifests in three significant aspects: coverage analysis, code reviews, and version control.

V&V Technique Software AI/ML
Coverage Analysis Code structure provides the basis of coverage No structure
Code Reviews Crowd-sourced expert knowledge No code to review
Version Control Careful construction/release Very difficult with data

These differences generate an enormous issue for intelligent test generation and for any argument of completeness. This is an area of active research, in which two threads have emerged: 1) Training Set Validation: Since the final trained component is very hard to analyze, one approach is to examine the training set and the ODD to find interesting tests which may expose the cracks between them [16]. 2) Robustness to Noise: Either through simulation or using formal methods [17], the approach is to assert various higher-level properties and use these to test the component. An example in object recognition might be to assert the property that an object should be recognized independent of its orientation. Overall, developing robust methods for AI component validation remains an active and unsolved research topic even for “fixed”-function AI components, that is, AI components whose function changes only through active version control. Of course, many AI applications prefer a model in which the AI component is constantly morphing; validating that situation is a topic of future research.
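The robustness-to-noise thread can be expressed as a metamorphic test: assert that a perception function's output is invariant under a transformation such as rotation, without needing a full specification of correct outputs. A toy sketch with a stand-in classifier (both the classifier and the invariance property are illustrative; a real system would apply the same idea to a trained network):

```python
import math
import random

def classify(points):
    """Stand-in perception component: labels a 2-D point cloud by its
    spread around the centroid (a real system would be a trained net)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    spread = max(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in points)
    return "large" if spread > 1.0 else "small"

def rotate(points, deg):
    """Rotate a point cloud about the origin by `deg` degrees."""
    t = math.radians(deg)
    return [(x * math.cos(t) - y * math.sin(t),
             x * math.sin(t) + y * math.cos(t)) for x, y in points]

# Metamorphic property: the label must not depend on orientation.
random.seed(0)
cloud = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(50)]
for deg in (0, 37, 90, 180, 271):
    assert classify(rotate(cloud, deg)) == classify(cloud), deg
print("rotation-invariance property holds")
```

No ground-truth labels are required: the test only checks consistency between related inputs, which is exactly what makes the approach attractive when no golden specification exists.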

B. AI SPECIFICATION

For well-defined systems with an availability of system-level abstractions, AI/ML components significantly increase the difficulty of intelligent test generation. With a golden spec, one can follow a structured process to make significant progress in validation and even gate the AI results with conventional safeguards. Unfortunately, one of the most compelling uses of AI is to employ it in situations where the specification of the system is not well defined or not viable using conventional programming. In these Specification-Less ML (SLML) situations, not only is building interesting tests difficult, but evaluating the correctness of the results creates further difficulty. Further, most of the major systems (perception, location services, path planning, etc.) in autonomous vehicles fall into this category of system function and AI usage. To date, there have been two approaches to attack the lack-of-specification problem: Anti-Spec and AI-Driver.

1) Anti-Spec: In these situations, the only approach left is to specify correctness through an anti-spec. The simplest anti-spec is to avoid accidents. Based on some initial work by Intel, there is a standard, IEEE 2846, “Assumptions for Models in Safety-Related Automated Vehicle Behavior” [18], which establishes a framework for defining a minimum set of assumptions regarding the reasonably foreseeable behaviors of other road users. For each scenario, it specifies assumptions about the kinematic properties of other road users, including their speed, acceleration, and possible maneuvers. Challenges include an argument for completeness, a specification of the machinery for checking against the standard, and the connection to a liability governance framework.

2) AI-Driver: While IEEE 2846 comes from a bottom-up technology perspective, Koopman and Widen [19] have proposed the concept of defining an AI driver which must replicate all the competencies of a human driver in a complex, real-world environment.
Key points of Koopman’s AI driver concept include:

a) Full Driving Capability: The AI driver must handle the entire driving task, including perception (sensing the environment), decision-making (planning and responding to scenarios), and control (executing physical movements like steering and braking). It must also account for nuances like social driving norms and unexpected events.

b) Safety Assurance: Koopman stresses that AVs need rigorous safety standards, similar to those in industries like aviation. This includes identifying potential failures, managing risks, and ensuring safe operation even in the face of unforeseen events.

c) Human Equivalence: The AI driver must meet or exceed the performance of a competent human driver. This involves adhering to traffic laws, responding to edge cases (rare or unusual driving scenarios), and maintaining situational awareness at all times.

d) Ethical and Legal Responsibility: An AI driver must operate within ethical and legal frameworks, including handling situations that involve moral decisions or liability concerns.

e) Testing and Validation: Koopman emphasizes the importance of robust testing, simulation, and on-road trials to validate AI driver systems. This includes covering edge cases and long-tail risks, and ensuring that systems generalize across diverse driving conditions.

Overall, it is a very ambitious endeavor, and there are significant challenges to building this specification of a reasonable driver. First, the idea of a “reasonable” driver is not even well encoded on the human side. Rather, this definition of “reasonableness” is built over a long history of legal distillation, and of course, the human standard is built on the understanding of humans by other humans. Second, the complexity of such a standard would be very high, and it is not clear whether it is doable.
Finally, it may take quite a while of legal distillation to reach some level of closure on a human-like “AI driver.” Currently, the state of the art in specification is relatively poor for both ADAS and AV. ADAS systems, which are widely proliferated, have massive divergences in behavior and completeness. When a customer buys ADAS, it is not entirely clear what they are getting. Tests by industry groups such as AAA, Consumer Reports, and IIHS have shown the significant shortcomings of existing solutions [20]. In 2024, IIHS introduced a ratings program to evaluate the safeguards of partial driving automation systems. Out of 14 systems tested, only one received an acceptable rating, highlighting the need for improved measures to prevent misuse and ensure driver engagement [21]. Today, there is only one non-process-oriented regulation in the marketplace: the NHTSA regulation around AEB [22].

Summary

This chapter traces the evolution of software from programmable hardware foundations to a dominant force in modern computing systems. Early advances in hardware programmability—through configuration, programmable logic (e.g., FPGAs), and stored-program processors—enabled a separation between physical implementation and functional behavior. The introduction of stable computer architectures (notably IBM System/360) and operating systems created enduring abstractions that allowed software portability, scalability, and rapid innovation. Over time, networking and open-source ecosystems further accelerated the growth of information technology, establishing software as the central driver of capability across computing platforms.

As software methods entered cyber-physical systems (CPS)—including ground, airborne, marine, and space domains—they followed a distinct trajectory shaped by real-time constraints, safety requirements, and physical interaction. Initially introduced to enhance control and diagnostics, software evolved into the core coordinating layer for sensing, decision-making, and actuation, enabling autonomy. This transition was supported by the emergence of real-time operating systems (RTOSes), middleware, and layered software architectures that ensured deterministic behavior and modularity. Across all domains, systems evolved from isolated, hardware-centric designs to distributed, software-intensive platforms, with increasing reliance on standardized frameworks and communication protocols.

The chapter further highlights how software has transformed product development, supply chains, and validation practices. Cyber-physical systems are increasingly influenced by the faster-moving IT ecosystem, adopting open-source components, layered stacks, and continuous update models (e.g., software-defined vehicles). At the same time, safety standards (e.g., ISO 26262, DO-178C) and rigorous verification methods—such as hardware/software co-simulation (MIL, SIL, HIL)—have evolved to address the risks of software-driven behavior. Modern software supply chains are complex, incorporating third-party and open-source dependencies, requiring strong configuration management, traceability, and cybersecurity practices. Overall, the chapter emphasizes a fundamental shift: engineered systems are no longer hardware products with embedded software, but increasingly software platforms embodied in hardware.

Stack | Framework Type | Core Covered Layers | Key Technologies | Domain Focus | Notes / Differentiation
ROS 2 | Open-source middleware stack | Middleware, application | DDS, nodes, topics, Gazebo, RViz | Robotics, AV | De facto R&D standard; highly modular
AUTOSAR Adaptive | Automotive software platform | OS, middleware, apps | POSIX OS, SOME/IP, service-oriented | Automotive (ADAS/AV) | Designed for ISO 26262 + OTA updates
AUTOSAR Classic Platform | Embedded real-time stack | HAL, RTOS, basic software | OSEK or RTOS, CAN, ECU abstraction | Automotive ECUs | Deterministic, safety-certified
Apollo | Full autonomy stack | Full stack (perception → control) | Cyber RT, AI models, HD maps | Autonomous driving (L2–L4) | One of the most complete open AV stacks
Autoware | Open AV stack | Full autonomy pipeline | ROS 2, perception, planning modules | Automotive, robotics | Strong academic + industry ecosystem
NVIDIA DRIVE OS | Integrated platform | OS, middleware, AI runtime | CUDA, TensorRT, DriveWorks | Automotive autonomy | Tight HW/SW co-design with GPUs
QNX Neutrino | RTOS middleware | OS, safety layer | POSIX RTOS, microkernel | Automotive, industrial | Strong certification (ASIL-D)
VxWorks | RTOS | OS, middleware | Deterministic RTOS, ARINC653 | Aerospace, defense | Widely used in safety-critical systems
PX4 Autopilot | UAV autonomy stack | Control, middleware, perception | MAVLink, EKF, control loops | UAV / drones | Industry standard for drones
ArduPilot | UAV autonomy stack | Control + navigation | Mission planning, sensor fusion | UAV, marine robotics | Broad vehicle support (air/land/sea)
MOOS-IvP | Marine autonomy stack | Middleware | Behavior-based robotics | Marine robotics | Optimized for low-bandwidth environments
DDS (Data Distribution Service) | Middleware standard | Communication layer | QoS messaging, pub-sub | Cross-domain CPS | Backbone of ROS 2 and many systems
AWS RoboMaker | Cloud robotics stack | Cloud, simulation | DevOps, ROS integration | Robotics, AV | Enables CI/CD + simulation workflows
Microsoft AirSim | Simulation stack | Simulation layer | Unreal Engine, physics models | UAV, AV | High-fidelity perception simulation
CARLA | Simulation stack | Simulation layer | OpenDRIVE, sensors, physics | Automotive | Widely used for AV validation
Gazebo | Simulation stack | Simulation integration | Physics engine, ROS integration | Robotics | Standard for ROS-based systems

Perception, Mapping and Localisation

The modern era of autonomy is often traced to the DARPA Grand Challenges (2004–2007), but it builds on decades of earlier automation across ground, marine, airborne, and space systems. In the airborne domain, autopilots date back to early 20th-century systems like Sperry Autopilot, evolving into today’s highly integrated flight management systems used on commercial aircraft such as the Boeing 777 and Airbus A320, where autopilot, autothrottle, and fly-by-wire systems routinely manage most phases of flight under human supervision. In the marine domain, ships have long used autopilots and dynamic positioning systems, while space systems—from the Apollo Guidance Computer to modern autonomous navigation on Mars rovers—demonstrated early closed-loop autonomy under extreme constraints. Ground systems, by contrast, lagged due to environmental complexity, which is why the DARPA challenges were so pivotal: the 2004 desert race exposed the immaturity of perception and planning, but by 2005 Stanford’s “Stanley” completed a 132-mile autonomous route, and the 2007 Urban Challenge introduced interaction with traffic, rules, and other agents. These competitions unified advances in sensing, probabilistic reasoning, and real-time control into full-stack autonomous systems and created the talent base that later drove commercial autonomy.

Prior to the DARPA challenges, deterministic algorithms could not make progress on key aspects of building autonomous systems, such as object recognition, path planning, and localization. The big recent leap in technology was the use of artificial intelligence to attack these previously intractable problems. The introduction of AI moved the field significantly forward, but also introduced new challenges.

This chapter introduces perception, mapping, and localization in the context of autonomous vehicles and the usage of different sensor modalities. It examines the determination of the vehicle's position, the positions and activities of other traffic participants, understanding of the surrounding scene, scene mapping and map maintenance for navigation, applications of AI, and possible sources of uncertainty and instability.

Object Detection, Sensor Fusion, Mapping, and Positioning

Object Detection

Object detection is the fundamental perception function that allows an autonomous vehicle to identify and localize relevant entities in its surroundings. It converts raw sensor inputs into structured semantic and geometric information, forming the basis for higher-level tasks such as tracking, prediction, and planning. By maintaining awareness of all objects within its operational environment, the vehicle can make safe and contextually appropriate decisions.

Detected objects may include:

Each detection typically includes a semantic label, a spatial bounding box (2D or 3D), a confidence score, and sometimes velocity or orientation information. Accurate detection underpins all subsequent stages of autonomous behavior; any missed or false detection may lead to unsafe or inefficient decisions downstream.
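As a concrete sketch of the detection record just described, the following Python dataclass bundles the semantic label, bounding box, confidence score, and optional motion fields. All field names are illustrative, not taken from any particular AV stack:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Detection:
    """One detected object: label, 2D box, confidence, optional kinematics."""
    label: str                                      # semantic class, e.g. "pedestrian"
    box: Tuple[float, float, float, float]          # (x_min, y_min, x_max, y_max)
    confidence: float                               # detector score in [0, 1]
    velocity: Optional[Tuple[float, float]] = None  # (vx, vy) in m/s, if measured
    yaw: Optional[float] = None                     # orientation in radians, if measured

det = Detection(label="pedestrian", box=(10.0, 20.0, 30.0, 60.0), confidence=0.91)
```

A 3D detection would extend the box to a seven-parameter cuboid (center, size, yaw), but the bookkeeping is the same.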

Object detection relies on a combination of complementary sensors, each contributing distinct types of information and requiring specialized algorithms.

Camera-Based Detection

Cameras provide dense visual data with rich color and texture, essential for semantic understanding. Typical camera-based detection methods include:

Cameras are indispensable for interpreting traffic lights, signs, lane markings, and human gestures, but their performance can degrade under low illumination, glare, or adverse weather conditions.

LiDAR-Based Detection

LiDAR (Light Detection and Ranging) measures distances by timing laser pulse returns, producing dense 3D point clouds. LiDAR-based object detection methods focus on geometric reasoning:

LiDAR’s precise geometry enables accurate distance and shape estimation, but sparse returns or partial occlusions can challenge classification performance.
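The clustering step common to LiDAR pipelines can be sketched as naive Euclidean grouping: points connected by chains of close neighbours form one object candidate. This O(n²) version is for illustration only; production stacks use k-d trees or voxel grids, and the radius value here is an arbitrary assumption:

```python
from math import dist

def euclidean_cluster(points, radius=0.7):
    """Group 3D points into clusters: two points share a cluster if they are
    connected by a chain of neighbours, each within `radius` of the next."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            # collect still-unvisited neighbours of point i
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

# Two well-separated blobs of returns should yield two object candidates.
pts = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.5, 0.2, 0.0),
       (10.0, 10.0, 0.0), (10.2, 10.1, 0.0)]
clusters = euclidean_cluster(pts)
```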

Radar-Based Detection

Radar (Radio Detection and Ranging) provides long-range distance and velocity information using radio waves. Its unique Doppler measurements are invaluable for tracking motion, even in fog, dust, or darkness. Typical radar-based detection techniques include:

Radar systems are especially important for early hazard detection and collision avoidance, as they function effectively through adverse weather and poor visibility.
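The Doppler relationship behind these velocity measurements is simple to state: for a two-way radar return, the radial speed is v_r = f_d · c / (2 · f_c). A minimal sketch, assuming a 77 GHz carrier (a typical automotive radar band) and an illustrative shift value:

```python
C = 299_792_458.0  # speed of light in m/s

def radial_velocity(doppler_shift_hz, carrier_hz=77e9):
    """Radial (closing) speed from a radar Doppler shift. The factor of two
    accounts for the two-way out-and-back propagation of the radar pulse."""
    return doppler_shift_hz * C / (2.0 * carrier_hz)

# A +5.1 kHz Doppler shift at 77 GHz corresponds to roughly 10 m/s closing speed.
v = radial_velocity(5.1e3)
```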

Ultrasonic and Sonar-Based Detection

Ultrasonic and sonar sensors detect objects through acoustic wave reflections and are particularly useful in environments where optical or electromagnetic sensing is limited. They are integral not only to ground vehicles for close-range detection but also to surface and underwater autonomous vehicles for navigation, obstacle avoidance, and terrain mapping.

For ground vehicles, ultrasonic sensors operate at short ranges (typically below 5 meters) and are used for parking assistance, blind-spot detection, and proximity monitoring. Common methods include:
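The underlying range computation for these sensors is a time-of-flight calculation: distance is half the round-trip echo time multiplied by the speed of sound, which itself drifts with air temperature. A small sketch, using the standard linear temperature approximation rather than any sensor datasheet value:

```python
def echo_distance(round_trip_s, temperature_c=20.0):
    """Distance from an ultrasonic echo. Speed of sound in air is roughly
    c = 331.3 + 0.606 * T (m/s); halve the path because the pulse travels
    out and back."""
    c = 331.3 + 0.606 * temperature_c
    return c * round_trip_s / 2.0

# A 10 ms round trip at 20 C is about 1.72 m -- well inside parking-assist range.
d = echo_distance(0.010)
```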

For surface and underwater autonomous vehicles, sonar systems extend these principles over much longer ranges and through acoustically dense media. Typical sonar-based detection methods include:

These acoustic systems are essential in domains where electromagnetic sensing (e.g., camera, LiDAR, radar) is unreliable — such as murky water, turbid environments, or beneath the ocean surface. Although sonar has lower spatial resolution than optical systems and is affected by multipath and scattering effects, it offers unmatched robustness in low-visibility conditions. As with other sensors, regular calibration, signal filtering, and environmental adaptation are necessary to maintain detection accuracy across varying salinity, temperature, and depth profiles.

Object detection outputs can be represented in different coordinate systems and abstraction levels:

Hybrid systems combine these paradigms—for example, camera-based semantic labeling enhanced with LiDAR-derived 3D geometry—to achieve both contextual awareness and metric accuracy.
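This camera-LiDAR combination hinges on projecting 3D points into the image so pixels and geometry can be associated. A minimal pinhole-projection sketch, assuming the point has already been transformed into the camera frame by the extrinsic calibration (the intrinsic values below are made up):

```python
def project_to_image(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (camera frame, z pointing forward)
    to pixel coordinates (u, v). Returns None for points behind the camera."""
    x, y, z = p_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

# A point 10 m ahead, 1 m to the side, 0.5 m below the optical axis, with an
# 800 px focal length and principal point (640, 360):
uv = project_to_image((1.0, 0.5, 10.0), fx=800.0, fy=800.0, cx=640.0, cy=360.0)
```

A LiDAR point labelled with the semantic class of the pixel it lands on is exactly the hybrid detection described above.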

Detection Pipeline and Data Flow

A standard object detection pipeline in an autonomous vehicle proceeds through the following stages:

  1. Data acquisition and preprocessing — raw sensor data are collected, filtered, timestamped, and synchronized.
  2. Feature extraction and representation — relevant geometric or visual cues are computed from each modality.
  3. Object hypothesis generation — candidate detections are proposed based on motion, clustering, or shape priors.
  4. Classification and refinement — hypotheses are validated, labeled, and refined based on fused sensory evidence.
  5. Post-processing and temporal association — duplicate detections are merged, and tracking ensures temporal consistency.

The pipeline operates continuously in real time (typically 10–30 Hz) with deterministic latency to meet safety and control requirements.

Sensor Fusion

No single sensor technology can capture all aspects of a complex driving scene across the full range of weather, lighting, and traffic conditions. Therefore, data from multiple sensors is fused (combined) to obtain a more complete, accurate, and reliable understanding of the environment than any single sensor could provide alone.

Each sensor modality has distinct advantages and weaknesses:

By fusing these complementary data sources, the perception system can achieve redundancy, increased accuracy, and fault tolerance — key factors for functional safety (ISO 26262).

Sensor fusion pursues two goals: complementarity, where different sensors contribute unique, non-overlapping information, and redundancy, where overlapping sensors confirm each other’s measurements, improving reliability. Because multiple sensor modalities are used, both goals can be achieved simultaneously.

Accurate fusion depends critically on spatial and temporal alignment among sensors.

Calibration errors lead to spatial inconsistencies that can degrade detection accuracy or cause false positives. Therefore, calibration is treated as part of the functional safety chain and is regularly verified in maintenance and validation routines.

Fusion can occur at different stages in the perception pipeline, commonly divided into three levels:

The mathematical basis of sensor fusion lies in probabilistic state estimation and Bayesian inference. Typical formulations represent the system state as a probability distribution updated by sensor measurements. Common techniques include:
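As a minimal instance of such Bayesian state estimation, the scalar Kalman measurement update below fuses a prediction with one new measurement: the posterior mean is a precision-weighted blend, and the posterior variance is never larger than either input. The radar and LiDAR numbers are illustrative:

```python
def kalman_update(mean, var, z, z_var):
    """One scalar Kalman measurement update. The gain k weights the new
    measurement by its precision relative to the current estimate."""
    k = var / (var + z_var)
    return mean + k * (z - mean), (1.0 - k) * var

# Fuse a noisy radar range (10.0 m, variance 0.5) with a sharper LiDAR range
# (10.4 m, variance 0.1): the fused estimate leans toward the LiDAR.
mean, var = kalman_update(10.0, 0.5, z=10.4, z_var=0.1)
```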

Learning-Based Fusion Approaches

Deep learning has significantly advanced sensor fusion. Neural architectures learn optimal fusion weights and correlations automatically, often outperforming hand-designed algorithms. For example:

End-to-end fusion networks can jointly optimize detection, segmentation, and motion estimation tasks, enhancing both accuracy and robustness. However, deep fusion models require large multimodal datasets for training and careful validation to ensure generalization and interpretability.

AI-based Perception and Scene Understanding

Advances in AI, especially convolutional neural networks, allow systems to process raw sensory information, recognize objects, and categorize them into classes at higher levels of abstraction (pedestrians, cars, trees, etc.). These categories allow autonomous vehicles to understand the scene and reason about the future actions of the vehicle as well as of other road traffic participants, and to make predictions about their possible interactions. This section compares commonly used methods, their advantages, and their weaknesses.

Traditional perception pipelines used hand-crafted algorithms for feature extraction and rule-based classification (e.g., edge detection, optical flow, color segmentation). While effective for controlled conditions, these systems failed to generalize to the vast variability of real-world driving — lighting changes, weather conditions, sensor noise, and unexpected objects.

The advent of deep learning revolutionized perception by enabling systems to learn features automatically from large datasets rather than relying on manually designed rules. Deep neural networks, trained on millions of labeled examples, can capture complex, nonlinear relationships between raw sensor inputs and semantic concepts such as vehicles, pedestrians, and traffic lights.

In an autonomous vehicle, AI-based perception performs several core tasks:

Deep Learning Architectures

Deep learning architectures form the computational backbone of AI-based perception systems in autonomous vehicles. They enable the extraction of complex spatial and temporal patterns directly from raw sensory data such as images, point clouds, and radar returns. Different neural network paradigms specialize in different types of data and tasks, yet modern perception stacks often combine several architectures into hybrid frameworks.

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are the most established class of models in computer vision. They process visual information through layers of convolutional filters that learn spatial hierarchies of features — from edges and corners to textures and object parts. CNNs are particularly effective for object detection, semantic segmentation, and image classification tasks. Prominent CNN-based architectures used in autonomous driving include:

3D Convolutional and Point-Based Networks

While cameras capture two-dimensional projections, LiDAR and radar sensors produce three-dimensional point clouds that require specialized processing. 3D convolutional networks, such as VoxelNet and SECOND, discretize space into voxels and apply convolutional filters to learn geometric features. Alternatively, point-based networks like PointNet and PointNet++ operate directly on raw point sets without voxelization, preserving fine geometric detail. These models are critical for estimating the shape and distance of objects in 3D space, especially under challenging lighting or weather conditions.

Transformer Architectures

Transformer networks, initially developed for natural language processing, have been adapted for vision and multimodal perception. They rely on self-attention mechanisms, which allow the model to capture long-range dependencies and contextual relationships between different parts of an image or between multiple sensors. In autonomous driving, transformers are used for feature fusion, bird’s-eye-view (BEV) mapping, and trajectory prediction. Notable examples include DETR (Detection Transformer), BEVFormer, and TransFusion, which unify information from cameras and LiDARs into a consistent spatial representation.

Recurrent and Temporal Models

Driving is inherently a dynamic process, requiring understanding of motion and temporal evolution. Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, are used to process sequences of observations and capture temporal dependencies. They are common in object tracking and motion prediction, where maintaining consistent identities and velocities of moving objects over time is essential. More recent architectures use temporal convolutional networks or transformers to achieve similar results with greater parallelism and stability.

Graph Neural Networks (GNNs)

Graph Neural Networks extend deep learning to relational data, representing scenes as graphs where nodes correspond to agents or landmarks and edges encode spatial or behavioral relationships. This structure makes GNNs well suited for modeling interactions among vehicles, pedestrians, and infrastructure elements. Models such as VectorNet, Trajectron++, and Scene Transformer use GNNs to learn dependencies between agents, supporting both scene understanding and trajectory forecasting.

Modern perception systems often combine multiple architectural families into unified frameworks. For instance, a CNN may extract image features, a point-based network may process LiDAR geometry, and a transformer may fuse both into a joint representation. These hierarchical and multimodal architectures enable robust perception across varied environments and sensor conditions, providing the high-level scene understanding required for safe autonomous behavior.

Data Requirements

The effectiveness of AI-based perception systems depends fundamentally on the quality, diversity, and management of the data used throughout their development lifecycle. Because deep neural networks do not rely on explicit programming but instead learn to interpret the environment from large, annotated datasets, data becomes the foundation of reliable perception for autonomous vehicles.

Robust perception requires exposure to the full range of operating conditions that a vehicle may encounter. Datasets must include variations in:

A balanced dataset should capture both common and unusual situations to ensure that perception models generalize safely beyond the training distribution. Because collecting real-world data for every possible scenario is impractical, simulated or synthetic data are often used to supplement real-world datasets. Photorealistic simulators such as CARLA, LGSVL, or AirSim allow the generation of labeled sensor data under controlled conditions, including rare or hazardous events. Synthetic data helps to fill gaps in real-world coverage and supports transfer learning, though domain adaptation is often required to mitigate the so-called sim-to-real gap — differences between simulated and actual sensor distributions.

Annotation and Labeling

Supervised learning models rely on accurately annotated datasets, where each image, frame, or point cloud is labeled with semantic information such as object classes, bounding boxes, or segmentation masks. Annotation quality is critical: inconsistent or noisy labels can propagate systematic errors through the learning process. Modern annotation pipelines combine human labeling with automation — using pre-trained models, interactive tools, and active learning to accelerate the process. High-precision labeling is particularly demanding for LiDAR point clouds and multi-sensor fusion datasets, where 3D geometric consistency must be maintained across frames.

Ethical and Privacy Considerations

Data used in autonomous driving frequently includes imagery of people, vehicles, and property. To comply with privacy regulations and ethical standards, datasets must be anonymized by blurring faces and license plates, encrypting location data, and maintaining secure data storage. Fairness and inclusivity in dataset design are equally important to prevent bias across geographic regions or demographic contexts.

Scene Understanding

Scene understanding is a process by which an autonomous agent interprets its environment as a coherent model — integrating environment map, objects, semantics, and dynamics into a structured representation that supports decision-making. It is the bridge between raw perception and higher-level autonomy functions such as planning, prediction, and control.

The goal of scene understanding is to transform fragmented sensor detections into a meaningful, temporally consistent model of the surrounding scene.

Scene understanding often relies on multi-layered representations:

The relational layer captures how entities within a traffic scene interact with one another and with the static environment. While lower layers (geometric and semantic) describe what exists and where it is, the relational layer describes how elements relate — spatially, functionally, and behaviorally.

Spatial relations describe, for example, mutual distance, relative velocity, and possible trajectory conflicts. Functional relations describe how one entity modifies, limits, or restricts the behavior of another: traffic lanes channel the movement of vehicles, railings restrict the movement of pedestrians, and so on.

These relations can be explicitly represented by scene graphs, where nodes represent entities and edges represent relationships, or encoded in different types of neural networks, e.g., visual-language models.
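Such an explicit scene graph can be sketched with plain data structures: nodes for entities, labelled directed edges for spatial and functional relations. All entity names and relation labels here are invented for illustration:

```python
# Nodes are scene entities; each edge is (subject, object, relation label).
scene_graph = {
    "nodes": {"ego", "car_1", "pedestrian_1", "lane_A", "crosswalk_1"},
    "edges": [
        ("car_1", "lane_A", "drives_in"),       # functional relation
        ("pedestrian_1", "crosswalk_1", "on"),  # functional relation
        ("car_1", "ego", "ahead_of"),           # spatial relation
    ],
}

def relations_of(graph, entity):
    """All (relation, other entity) pairs in which `entity` is the subject."""
    return [(rel, dst) for src, dst, rel in graph["edges"] if src == entity]

rels = relations_of(scene_graph, "car_1")
```

A planner can then query the graph directly, e.g. to find every entity that constrains the ego vehicle's next maneuver.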

Scene understanding must maintain temporal stability across frames. Flickering detections or inconsistent semantic labels can lead to unstable planning. Techniques include temporal smoothing, cross-frame data association to maintain consistent object identities, or memory networks that preserve contextual information across time.

The temporal part of the scene understanding is tightly coupled with motion prediction and forecasting future trajectories of all dynamic agents. Two primary approaches are Physics-based models (e.g., constant-velocity, bicycle models), which are simple and interpretable, but limited in complex interactions, and learning-based models, where data-driven networks capture contextual dependencies and multiple possible futures (e.g., MultiPath, Trajectron++, VectorNet).
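The physics-based end of this spectrum is easy to make concrete: a constant-velocity forecast simply extrapolates the agent's current velocity over the prediction horizon. It is interpretable but blind to interactions and maneuvers:

```python
def predict_constant_velocity(x, y, vx, vy, horizon_s, dt=0.5):
    """Constant-velocity forecast: future positions at each time step,
    assuming the agent keeps its current velocity for the whole horizon."""
    steps = int(horizon_s / dt)
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]

# A vehicle at the origin moving 10 m/s along x, forecast 2 s ahead.
trajectory = predict_constant_velocity(0.0, 0.0, 10.0, 0.0, horizon_s=2.0)
```

Learning-based predictors replace this single extrapolation with a distribution over futures conditioned on map context and nearby agents.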

Design Challenges

Designing autonomous systems that perform reliably presents many design challenges. For the front end of the AV pipeline discussed in this chapter, the challenges center on operating gracefully across the range of operating conditions (the ODD), the performance characteristics of the sensors, and supply chain concerns.

Weather is a fundamental source of uncertainty for autonomous systems because it directly degrades sensor performance, but its impact varies significantly across ground, airborne, marine, and space domains. On the ground, rain, fog, snow, and dust can severely impair optical sensors (cameras, lidar) through scattering, attenuation, and occlusion, while also affecting radar through multipath and clutter—making perception and object classification the primary bottlenecks for autonomous vehicles. In airborne systems, weather effects such as icing, turbulence, and convective storms influence both sensing and vehicle dynamics; however, aviation benefits from structured sensing (e.g., radar, inertial systems, GPS) and well-developed weather-avoidance procedures, allowing autopilot systems to remain robust as long as hazardous regions are avoided. Marine systems face persistent challenges from sea spray, wave motion, and low-contrast environments, which degrade vision systems and introduce instability in sensor measurements, though radar and sonar provide complementary resilience. In space, traditional “weather” is absent, but analogous environmental effects—such as solar radiation, cosmic rays, and thermal extremes—impact sensor reliability and electronics, requiring radiation-hardened designs and redundancy. Across all domains, the key distinction is that weather (or its equivalent) not only reduces sensor fidelity but also increases uncertainty in state estimation and decision-making, making sensor fusion, redundancy, and probabilistic reasoning essential for maintaining safe autonomous operation.

Further, the use of electromagnetic (EM) energy in modern transportation corridors is increasing rapidly, driven by three major factors. First, the expansion of cellular networks to support continuous telecommunications for travelers has intensified ambient EM activity. Second, the widespread integration of active sensors—such as radar and LiDAR—within vehicles has introduced additional high-frequency sources. Third, infrastructure operators are deploying active sensing technologies in Roadside Units (RSUs) to enable vehicle-to-infrastructure (V2I) communication and monitoring. The resulting concentration of active EM sources is relatively well understood in the visual band, where care is taken in the design of highly reflective civil infrastructure and in methods for handling night-time interference. However, the same care has not been applied to all sensor modalities. Especially on the ground and in airborne corridors (air taxi routes), active sensors create dense EM energy corridors, raising new challenges related to interference, coexistence, and safety that have not yet been characterized.

Beyond weather and EMI, sensor modalities must be complete enough to provide coverage under the constraints of the civil engineering infrastructure. Important aspects include the handling of curves, on/off ramps, bridges, tunnels, and more. For a designer there is a complex tradeoff between sensor type, number of sensors, and cost of sensors. For airborne, marine, and space systems, power and weight are also primary concerns.

Finally, because of the semiconductor business structure, cost and supply chain are intimately connected. The relationship between cost and volume in semiconductors is fundamentally shaped by high fixed costs and low marginal costs, creating powerful economies of scale. Semiconductor manufacturing requires enormous upfront investment in fabrication facilities (fabs), process development, and mask sets—often totaling billions of dollars—while the incremental cost of producing each additional chip (once the fab is running) is relatively low. As production volume increases, these fixed costs are amortized over a larger number of units, driving down the cost per chip. This dynamic is reinforced by learning curve effects (often described by Wright’s Law), where yield improvements, process optimizations, and defect reduction further reduce per-unit costs with cumulative volume. However, this relationship is not linear: advanced nodes (e.g., sub-5nm) introduce escalating mask and tooling costs that require extremely high volumes to be economically viable, while lower-volume or specialized chips (e.g., automotive, aerospace) often rely on mature nodes where costs are more stable but less aggressively optimized. As a result, the semiconductor industry exhibits a strong coupling between scale, technology node, and market demand, with leading-edge innovation economically justified primarily in high-volume applications such as consumer electronics and data center computing.
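The amortization argument above reduces to one line of arithmetic: per-chip cost is the fixed cost divided by volume, plus the marginal cost per die. The dollar figures below are purely illustrative, not actual fab economics:

```python
def per_chip_cost(fixed_cost, marginal_cost, volume):
    """Amortized cost per chip: fixed costs (fab, masks, NRE) spread over
    the production volume, plus the marginal cost of each additional die."""
    return fixed_cost / volume + marginal_cost

# $2B of fixed cost and a $5 marginal cost per die:
automotive_tier = per_chip_cost(2e9, 5.0, volume=1e6)  # fixed cost dominates
consumer_tier = per_chip_cost(2e9, 5.0, volume=1e8)    # fixed cost nearly vanishes
```

The two orders of magnitude in volume turn a $2005 chip into a $25 chip, which is the economic force pushing leading-edge silicon toward consumer markets.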

Advanced semiconductors can offer significant performance improvements in function, power, and cost. However, the economics of volume often determine whether a chip will be built at all. Today, the semiconductor cycle is dominated by consumer products; automotive markets offer mid-tier volumes, and the other modalities (airborne, space, marine) are very low-volume markets. The resulting design challenge is either to use advanced semiconductor chips from the consumer market and accept their limitations with respect to safety, or to use lower-tier semiconductor chips and live with performance, power, cost, and weight challenges.

Validation Approaches

Having designed the sensing, object recognition, and localization subsystems, how does one test these components? The fundamentals are consistent with the discussions in chapter 2. One defines an ODD, builds tests underneath this ODD, applies these tests, and determines correctness. The application of the tests can be virtual (simulation), physical (test track), or a mix based on components (hardware-in-the-loop or software-in-the-loop). The population of tests needs to be complete enough to show sufficient coverage. The introduction of sensors and AI adds significant complexity to this process.

Testing sensors in safety-critical systems is particularly challenging when viewed through the lens of verification, validation (V&V), and certification, because sensors are both hardware devices and context-dependent measurement systems. Verification—ensuring the sensor meets its design specifications—can be addressed through laboratory calibration, environmental stress testing, and compliance with standards such as ISO 16750 (environmental conditions), DO-160 (avionics), and MIL-STD-810 (defense systems). However, validation—ensuring the sensor performs adequately in real operational contexts—is far more complex. Sensor performance depends heavily on the operational design domain (ODD), including weather, lighting, clutter, and interference conditions, which are difficult to fully replicate or bound. This gap between controlled verification and real-world validation is especially acute for perception sensors (e.g., cameras, radar, lidar), where performance is probabilistic rather than deterministic and strongly influenced by environmental variability. Today, there is a great deal of innovation in mechanical test apparatus that mimics physical movement inside anechoic chambers to recreate difficult test scenarios. In the outdoor environment, swarms of drones acting as EM sensors and noise sources provide a similar function for test tracks.

Conventional Algorithm | ML Algorithm | Comment
Logical Theory | No Theory | In conventional algorithms, one needs a theory of operation to implement the solution. ML algorithms can often work without a clear understanding of exactly why they work.
Analytical | Not Analytical | Conventional algorithms are accurate in a way we can understand; however, ML algorithms are not easily understood and often behave like a “black box.”
Causal | Correlation | Conventional algorithms focus on causality, while ML algorithms discover correlations. The difference is important if one wants to reason at higher levels.
Deterministic | Non-Deterministic | Conventional algorithms are deterministic in nature, while ML algorithms are fundamentally probabilistic in nature.
Known Computational Complexity | Unknown Computational Complexity | Given the analyzable nature of conventional algorithms, one can build a model for computational complexity. This is not always possible for ML techniques, which may require testing to evaluate computational complexity.

Table 1: Contrast of Conventional and Machine Learning Algorithms

The introduction of AI as a replacement for traditional software introduces significant validation issues (Table 1). Significantly, many of the techniques developed for testing software, such as code reviews, code coverage, and static analysis tools, do not transfer directly to trained models. Further, to test an AI component, it appears likely that one must examine the method by which it was trained and have access to the training data.

Safety standards across automotive, marine, airborne, and space domains are now evolving to address the introduction of AI/ML-driven functionality, shifting from purely deterministic assurance models toward data-driven and probabilistic validation frameworks. In automotive, traditional functional safety under ISO 26262 has been extended by ISO/PAS 8800 and ISO 21448 to explicitly address perception uncertainty, training data coverage, and performance limitations of AI-based systems. In aviation, guidance such as DO-178C is being supplemented by emerging frameworks like DO-387 (in development) to tackle non-deterministic behavior, explainability, and learning assurance. Similarly, space systems governed by ECSS standards and marine systems guided by International Maritime Organization frameworks are beginning to incorporate autonomy and AI considerations, particularly for unmanned and remotely operated platforms. Across all domains, a common trend is emerging: safety assurance is moving from static compliance toward lifecycle-based assurance, including dataset governance, simulation-based validation, runtime monitoring, and continuous certification concepts. This reflects a fundamental shift in safety engineering—from proving correctness of fixed logic to bounding the behavior of adaptive, data-driven systems operating under uncertainty.

The remainder of this section presents a practical, simulation-driven illustration of validating the perception, mapping (HD maps/digital twins), and localization layers of an autonomous driving stack. The core idea is to anchor tests in the operational design domain (ODD), express them as reproducible scenarios, and report metrics that connect module-level behavior to system-level safety.

Scope, ODD, and Assurance Frame

We decompose the stack into Perception (object detection/tracking), Mapping (HD map/digital twin creation and consistency), and Localization (GNSS/IMU and vision/LiDAR aiding) and validate each with targeted KPIs and fault injections. The evidence is organized into a safety case that explains how module results compose at system level. Tests are derived from the ODD and instantiated as logical/concrete scenarios (e.g., with a scenario language like Scenic) over the target environment. This gives you systematic coverage and reproducible edge-case generation while keeping hooks for standards-aligned arguments (e.g., ISO 26262/SOTIF) and formal analyses where appropriate.

Perception Validation Illustration

The objective is to quantify detection performance—and its safety impact—across the ODD. In end-to-end, high-fidelity (HF) simulation, we log both simulator ground truth and the stack’s detections, then compute per-class statistics as a function of distance and occlusion. Near-field errors are emphasized because they dominate braking and collision risk. Scenario sets should include partial occlusions, sudden obstacle appearances, vulnerable road users, and adverse weather/illumination, all realized over the site map so that failures can be replayed and compared.

Figure 21: Detection validation example. Ground truth for detectable vehicles is indicated with green boxes, while detections are marked with red boxes.

Figure 21 illustrates the object comparison. Green boxes mark objects present in the ground truth, while red boxes mark objects detected by the AV stack. Threshold-based rules compare the two sets of objects. The expected output is a set of indicators of detectable vehicles at different ranges, distinguishing safe areas from dangerous ones.
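The threshold-based comparison can be sketched as a greedy IoU matcher with per-range recall. The box format, field names, and range bins below are illustrative assumptions, not the actual interface of the validation tool:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_detections(ground_truth, detections, iou_threshold=0.5):
    """Greedy one-to-one matching; returns matched pairs, misses, false alarms."""
    matched, used = [], set()
    for gi, gt in enumerate(ground_truth):
        best_j, best_iou = None, iou_threshold
        for j, det in enumerate(detections):
            if j in used:
                continue
            score = iou(gt["box"], det["box"])
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            matched.append((gi, best_j))
    hit = {m[0] for m in matched}
    misses = [i for i in range(len(ground_truth)) if i not in hit]
    false_alarms = [j for j in range(len(detections)) if j not in used]
    return matched, misses, false_alarms

def recall_by_range(ground_truth, matched, bins=(0, 15, 30, 60)):
    """Per-distance-bin recall; near-field bins dominate braking and collision risk."""
    hit = {m[0] for m in matched}
    out = {}
    for lo, hi in zip(bins, bins[1:]):
        idx = [i for i, g in enumerate(ground_truth) if lo <= g["range_m"] < hi]
        if idx:
            out[(lo, hi)] = sum(1 for i in idx if i in hit) / len(idx)
    return out
```

In a real report, the same statistics would also be broken down per class and by occlusion level, as described above.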

Mapping / Digital-Twin Validation Illustration

Validation begins with how the map and digital twin are produced. Aerial imagery or LiDAR is collected with RTK geo-tagging and surveyed control points, then processed into dense point clouds and classified to separate roads, buildings, and vegetation. From there, you export OpenDRIVE (for lanes, traffic rules, and topology) and a 3D environment for HF simulation. The twin should be accurate enough that perception models do not overfit artifacts and localization algorithms can achieve lane-level continuity.

Key checks include lane topology fidelity versus survey, geo-consistency in centimeters, and semantic consistency (e.g., correct placement of occluders, signs, crosswalks). The scenarios used for perception and localization are bound to this twin so that results can be reproduced and shared across teams or vehicles. Over time, you add change-management: detect and quantify drifts when the real world changes (construction, foliage, signage) and re-validate affected scenarios.
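The centimeter-level geo-consistency check can be expressed as an RMSE over the surveyed control points mentioned above. The coordinate convention and point-id mapping here are assumptions for illustration:

```python
import math

def geo_consistency_rmse(control_points, twin_points):
    """RMSE (metres) between surveyed control points and their digital-twin positions.

    control_points / twin_points: dicts mapping a point id to (east, north)
    coordinates in a common projected frame. A lane-level twin typically
    targets centimetre- to decimetre-level RMSE; the exact budget is
    project-specific.
    """
    errs = []
    for pid, (e, n) in control_points.items():
        te, tn = twin_points[pid]
        errs.append((te - e) ** 2 + (tn - n) ** 2)
    return math.sqrt(sum(errs) / len(errs))
```

The same pattern extends to change management: re-running the check after a map update quantifies drift and flags which scenarios need re-validation.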

Localization Validation Illustration

Here, the focus is on the robustness of ego-pose to sensor noise, outages, and map inconsistencies. In simulation, you inject GNSS multipath, IMU bias, packet dropouts, or short GNSS blackouts and watch how quickly the estimator diverges and re-converges. Similar tests perturb the map (e.g., small lane-mark misalignments) to examine estimator sensitivity to mapping error.

Typical KPIs include the per-frame deviation between the expected and actual pose, together with its minimum, maximum, and mean values over a run.

Figure 22: Localization validation. In some cases, the difference between the expected and the actual location may lead to accidents.

The current validation methods perform a one-to-one mapping between the expected and actual locations. As shown in Figure 22, the vehicle position deviation is computed for each frame and reported in the validation report. Aggregate parameters, such as the minimum, maximum, and mean deviations, are then calculated from the same report. The simulator can also be modified to inject noise into the localization process, allowing the robustness of localization to be checked and its performance validated.
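Both ideas, per-frame deviation reporting and simulator-side noise injection, can be sketched as follows, assuming simple (x, y) poses and a Gaussian error model (both are illustrative choices, not the simulator's actual API):

```python
import math
import random
import statistics

def inject_gnss_noise(pose, sigma_m=0.3, dropout_prob=0.05, rng=random):
    """Perturb a simulated (x, y) GNSS fix with Gaussian noise; occasionally
    drop the frame entirely to model a short outage."""
    if rng.random() < dropout_prob:
        return None  # no fix this frame
    x, y = pose
    return (x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m))

def deviation_report(expected, actual):
    """Per-frame Euclidean deviation plus min/max/mean aggregates; frames
    without a fix (None) are skipped."""
    devs = []
    for (ex, ey), fix in zip(expected, actual):
        if fix is None:
            continue
        devs.append(math.hypot(ex - fix[0], ey - fix[1]))
    return {"per_frame": devs, "min": min(devs),
            "max": max(devs), "mean": statistics.fmean(devs)}
```

The aggregates from such a report are what feed the validation verdicts; thresholds on them (e.g., maximum tolerated mean deviation) become ODD-level operating limits.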

Multi-Fidelity Workflow and Scenario-to-Track Bridge

A two-stage workflow balances coverage and realism. First, use low-fidelity (LF) tools (e.g., planner-in-the-loop with simplified sensors and traffic) to sweep large grids of logical scenarios and identify risky regions in parameter space (relative speed, initial gap, occlusion level). Then, promote the most informative concrete scenarios to HF simulation with photorealistic sensors for end-to-end validation of perception and localization interactions. Where appropriate, a small, curated set of scenarios is carried to closed-track trials. Success criteria are consistent across all stages, and post-run analyses attribute failures to perception, localization, prediction, or planning so fixes are targeted rather than generic.
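The LF sweep can be sketched as a grid search over the logical-scenario parameters named above. Here `run_lf_sim` is a hypothetical stand-in for the project's simulator interface, and the DTC limit is an illustrative threshold:

```python
import itertools

def sweep_logical_scenario(run_lf_sim, rel_speeds, init_gaps, occlusions,
                           dtc_limit_m=2.0):
    """Sweep a logical scenario over a parameter grid in an LF simulator.

    run_lf_sim executes one concrete scenario and returns its minimum
    distance-to-collision (DTC) in metres. Cases whose DTC falls below
    dtc_limit_m are flagged as risky and become candidates for promotion
    to high-fidelity simulation.
    """
    risky = []
    for v, gap, occ in itertools.product(rel_speeds, init_gaps, occlusions):
        min_dtc = run_lf_sim(rel_speed=v, init_gap=gap, occlusion=occ)
        if min_dtc < dtc_limit_m:
            risky.append({"rel_speed": v, "init_gap": gap,
                          "occlusion": occ, "min_dtc": min_dtc})
    return risky
```

Because both layers log the same KPIs, the risky subset returned here can be re-run in HF simulation and, later, on the track without redefining success criteria.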

Summary

The chapter develops a comprehensive view of perception, mapping, and localization as the foundation of autonomous systems, emphasizing how modern autonomy builds on both historical automation (e.g., autopilots across domains) and recent advances in AI. It explains how perception converts raw sensor data—across cameras, LiDAR, radar, and acoustic systems—into structured understanding through object detection, sensor fusion, and scene interpretation. A key theme is that no single sensor is sufficient; instead, robust autonomy depends on multi-modal sensor fusion, probabilistic estimation, and careful calibration to manage uncertainty. The chapter also highlights the transformative role of AI, particularly deep learning, in enabling scalable perception and scene understanding, while noting that these methods introduce new challenges related to data dependence, generalization, and interpretability.

A second major focus is on sources of instability and validation, where the chapter connects environmental effects (weather, electromagnetic interference), infrastructure constraints, and semiconductor economics to system-level performance. It underscores that validation must be grounded in the operational design domain (ODD) and cannot rely solely on physical testing, requiring a combination of simulation, hardware-in-the-loop, and scenario-based methods. The introduction of AI further complicates verification and validation because of its probabilistic, non-deterministic nature, challenging traditional assurance techniques. As a result, safety approaches across domains are evolving toward lifecycle-based assurance, incorporating data governance, simulation-driven testing, and continuous monitoring. The chapter concludes with a structured validation framework that links perception, mapping, and localization performance to system-level safety metrics, emphasizing reproducibility, coverage, and traceability in building a credible safety case.

Control, Planning, and Decision-Making

Autonomous systems across ground, airborne, marine, and space domains share common architectural layers—perception, decision-making, and control—but diverge significantly due to differences in dynamics, environmental uncertainty, and safety constraints. Ground systems (e.g., automotive and mobile robots) operate in highly structured yet cluttered environments with frequent interactions with humans and infrastructure. Their control algorithms emphasize real-time responsiveness, friction-limited dynamics, and precise low-speed maneuvering (e.g., PID/MPC controllers tuned for tire-road interaction). Decision-making often relies on rule-based systems augmented with probabilistic reasoning to handle traffic laws and agent interactions, while path planning combines graph-based methods (A*, D*) for global routing with sampling-based or optimization-based planners (RRT*, MPC) for local obstacle avoidance under tight latency constraints.

In contrast, airborne systems (e.g., UAVs, commercial aircraft) operate in a less cluttered but highly dynamic 3D environment with stricter stability and safety requirements. Control systems are typically layered with inner-loop stability augmentation (often linearized or gain-scheduled controllers) and outer-loop guidance laws. Decision-making must account for airspace regulations, weather, and energy constraints, often using hybrid systems and formal methods for safety assurance. Path planning extends into continuous 3D space with trajectory optimization under aerodynamic and kinematic constraints. Marine systems face slower dynamics but significant environmental disturbances (currents, waves, wind) and limited sensing fidelity. Their control approaches often emphasize robustness and disturbance rejection (e.g., adaptive or nonlinear control), while decision-making must handle sparse infrastructure and long-duration autonomy. Path planning may prioritize energy efficiency and waypoint-based navigation over reactive obstacle avoidance, except in congested waterways.

Space systems operate in the most extreme and least forgiving environment, where delays, limited actuation, and orbital mechanics dominate. Control algorithms are heavily model-based, often derived from first principles (e.g., astrodynamics), with limited opportunity for real-time correction. Decision-making is typically conservative and highly validated, with increasing use of onboard autonomy for deep-space missions where communication delays preclude human-in-the-loop control. Path planning is fundamentally different—focused on trajectory design using orbital transfers, optimization under gravitational constraints, and fuel minimization rather than obstacle avoidance. Across these domains, the progression from ground to space reflects a shift from reactive, data-driven approaches toward predictive, model-based, and highly verified methods, driven by increasing consequences of failure and decreasing opportunities for real-time human intervention.

Classical and AI-Based Control Strategies

Classical Control Strategies

Classical control strategies form the bedrock of modern vehicle control systems. These methods rely on mathematical models of the vehicle dynamics and well-established principles from control theory, primarily developed in the 20th century. Their strength lies in their mathematical rigor, transparency, and well-understood stability properties.
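As a concrete instance of the classical techniques referred to here, a minimal discrete PID controller might look like the sketch below. The gains and output limits are illustrative placeholders, not tuned vehicle values:

```python
class PID:
    """Textbook discrete PID controller with output clamping."""

    def __init__(self, kp, ki, kd, out_limits=(-1.0, 1.0)):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.lo, self.hi = out_limits
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """One control step: error is (reference - measurement), dt in seconds."""
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * deriv
        return max(self.lo, min(self.hi, u))  # clamp to actuator range
```

Such a loop could track a speed or lateral-offset reference; gain selection and stability margins would still follow the classical analysis this section describes.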

Principles and Common Techniques

Safety Aspects of Classical Control

Limitations

AI-Based Control Strategies

AI-based control strategies leverage machine learning and artificial intelligence techniques to learn control policies directly from data or simulations, often bypassing the need for explicit, hand-crafted mathematical models. This data-driven approach offers potential advantages in handling complexity and adaptability.
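As a toy sketch of the idea, a small learned policy maps a state vector directly to a bounded steering command. The architecture, state layout, and random placeholder weights below are purely illustrative; in practice the weights would come from training (e.g., behavior cloning or reinforcement learning):

```python
import math
import random

def policy_forward(state, w1, b1, w2, b2):
    """Tiny two-layer policy: state vector -> steering command in [-1, 1]."""
    hidden = [math.tanh(sum(s * w for s, w in zip(state, col)) + b)
              for col, b in zip(w1, b1)]
    out = sum(h * w for h, w in zip(hidden, w2)) + b2
    return math.tanh(out)  # tanh keeps the command in a valid actuator range

# Hypothetical 4-D state: [lateral_error, heading_error, speed, curvature]
rng = random.Random(0)
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(8)]
b1 = [0.0] * 8
w2 = [rng.uniform(-0.1, 0.1) for _ in range(8)]
steer = policy_forward([0.5, -0.1, 8.0, 0.02], w1, b1, w2, 0.0)
```

Note that, unlike the PID case, nothing in this structure exposes an explicit stability argument; that opacity is precisely the safety concern discussed below.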

Principles and Common Techniques

Safety Aspects of AI-Based Control

Challenges and Safety Concerns

Integration and Hybrid Approaches

In practice, a purely classical or purely AI-based control system is rare. Instead, a hybrid approach is often employed, leveraging the strengths of both paradigms:

Safety Considerations and Future Directions

The choice between classical and AI-based control strategies, or a hybrid approach, has profound implications for the safety of autonomous vehicles.

Conclusion

Classical control strategies provide a foundation of predictability, stability, and transparency, making them essential for safety-critical low-level vehicle control. AI-based control strategies offer the potential to handle unprecedented complexity and adaptability, learning optimal behaviors from data. Neither approach is a silver bullet; each has distinct strengths and weaknesses regarding safety. The future of safe autonomous vehicle control likely lies in sophisticated hybrid systems that intelligently combine the rigor of classical control with the power of AI, all underpinned by rigorous verification, validation, and a relentless focus on ensuring robust and predictable behavior in the real world. The ongoing development and integration of these strategies are key to achieving the high levels of safety required for widespread deployment of autonomous vehicles.

Motion Planning and Behavioural Algorithms

While decision-making algorithms determine *what* high-level goal the autonomous vehicle should pursue (e.g., reach destination, avoid obstacle, follow lane), motion planning and behavioral algorithms translate these goals into specific, executable paths and maneuvers within the dynamic and complex environment. This sub-chapter delves into these critical components, exploring how they generate safe, efficient, and predictable trajectories and behaviors for the vehicle. The interplay between planning the path and deciding the behavior is fundamental to the safe operation of autonomous vehicles, requiring algorithms that can handle uncertainty, react to other road users, and comply with traffic rules.

Behavioral Algorithms: Deciding the "What" and "When"

Behavioral algorithms form the higher-level decision-making layer that interprets the vehicle's goals and the perceived environment to choose appropriate driving behaviors. They determine *what* the vehicle should do next and *when* to do it, such as deciding to change lanes, yield, accelerate, or stop.

Key Behavioral Concepts

Safety Aspects of Behavioral Algorithms

Challenges

Motion Planning: Deciding the "How" and "Where"

Once a behavioral decision is made (e.g., “change lane left”), the motion planner is responsible for generating a specific, feasible, and safe trajectory that executes this behavior. It answers the question of *how* to move from the current state to the desired state within the constraints of the environment and the vehicle itself.
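For illustration, one of the graph-based global methods mentioned earlier (A*) can be sketched on an occupancy grid; a real motion planner would refine the resulting cell path into a dynamically feasible, collision-checked trajectory:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (1 = blocked). Returns a cell path
    from start to goal, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]
    came, g_cost = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came:
            continue  # already expanded with a better cost
        came[cur] = parent
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, cur))
    return None
```

The admissible heuristic guarantees an optimal path on the grid; the open-ended part in practice is turning that path into a trajectory that respects curvature, speed, and comfort constraints.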

Key Motion Planning Techniques

Safety Aspects of Motion Planning

Challenges

Integration and Interaction

Behavioral algorithms and motion planners are deeply intertwined and operate in a continuous loop:

  1. Perception: The vehicle senses its environment.
  2. Decision-Making/Behavioral Layer: Analyzes the environment and current goals to select a high-level behavior (e.g., “prepare for left lane change”).
  3. Motion Planning Layer: Takes the current state, the target behavior's goal state (e.g., position in the left lane), and the perceived environment to generate a feasible, safe, and smooth trajectory.
  4. Control Layer: Takes the generated trajectory (or reference points on it) and commands the vehicle's actuators (steering, throttle, brake) to follow it.
  5. Monitoring & Replanning: The system continuously monitors the execution, perception updates, and any deviations, potentially triggering replanning at either the behavioral or motion planning level.
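The loop above can be sketched as one tick of a pipeline in which each layer is a pluggable callable. All names here are placeholders, not a real stack's API; a planner returning None models an infeasible request that triggers replanning:

```python
def autonomy_tick(perceive, decide, plan, actuate, state):
    """One iteration of the perceive -> decide -> plan -> actuate loop."""
    world = perceive(state)                    # 1. sense the environment
    behavior = decide(world, state["goal"])    # 2. select a high-level behavior
    trajectory = plan(state, behavior, world)  # 3. generate a feasible path
    if trajectory is None:                     # 5. infeasible: replan safely
        behavior = "emergency_stop"
        trajectory = plan(state, behavior, world)
    actuate(trajectory)                        # 4. command the actuators
    return behavior, trajectory
```

Even this skeleton exhibits the coupling discussed below: the behavioral layer's choice constrains what the planner can produce, and the planner's feedback (infeasibility) forces the behavioral layer to revise its intent.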

This tight coupling is essential. The behavioral layer provides the “intent,” while the motion planner provides the “execution plan.” A failure or limitation in one layer can compromise the safety and effectiveness of the other. For example, an overly aggressive behavioral decision might lead the motion planner to generate an unsafe trajectory, while a motion planner that is too conservative might prevent the behavioral layer from making progress.

Safety Considerations and Future Directions

Ensuring the safety of the planning and behavioral components is paramount and presents unique challenges:

Conclusion

Motion planning and behavioral algorithms are the intelligent core that guides autonomous vehicles through the complexities of the real world. Behavioral algorithms decide the appropriate high-level actions based on goals and the environment, while motion planners generate the precise, safe, and feasible paths to execute those actions. Both face significant challenges related to complexity, uncertainty, computational demands, and safety assurance. The successful integration and continuous refinement of these algorithms, underpinned by rigorous testing and validation, are essential steps towards achieving the high levels of safety required for autonomous vehicles to operate reliably and deploy widely. Their evolution will continue to be a critical driver in the development of safe autonomous mobility.

Validation of Control & Planning

Principles and Scope

Planning and control are where intent becomes motion. A planning stack selects a feasible, safety-aware trajectory under evolving constraints; the control stack turns that trajectory into actuation while respecting vehicle dynamics and delays. Validating these layers is therefore about much more than unit tests: it is about demonstrating, with evidence, that the combined decision–execution loop behaves safely and predictably across the intended operational design domain (ODD). In practice, this requires two complementary ideas. First, a digital twin of the vehicle and environment that is accurate enough to make simulation a meaningful predictor of real behavior. Second, a design-of-experiments (DOE)–driven scenario program that stresses the decision and control logic where it matters most, and converts outcomes into monitorable, quantitative metrics. Our V&V suite frames both: scenario descriptions feed a co-running simulator with the under-test algorithms, the digital twin (vehicle and environment) is loaded as an external asset, and the outcome is a structured validation report rather than anecdotal test logs.

Planning/control V&V must also navigate the mix of deterministic dynamics and stochastic perception/prediction. At the component level, our framework treats detection, control, localization, mission planning, and low-level control as distinct abstractions, yet evaluates them in the context of Newtonian physics—explicitly trading fidelity for performance depending on the test intent. This modularity enables validating local properties (e.g., trajectory tracking) while still measuring system-level safety effects (e.g., minimum distance to collision).

A final principle is lifecycle realism. A digital twin is not just a CAD model; it is a live feedback loop receiving data from the physical system and its environment, so the simulator remains predictive as the product evolves. The same infrastructure that generates scenarios can replay field logs, inject updated vehicle parameters, and reflect map changes, enabling continuous V&V of planning and control post-deployment.

Example: Scenario-Based Validation with Digital Twins

The V&V workflow begins with a formal scenario description: functional narratives are encoded in a human-readable DSL (e.g., M-SDL/Scenic), then reduced to logical parameter ranges and finally to concrete instantiations selected by DOE. This ensures tests are reproducible, shareable, and traceable from high-level goals down to the numeric seeds that define a specific run. The simulator co-executes these scenarios with the algorithms under test inside the digital twin, and the V&V interface collects vehicle control signals, virtual sensor streams, and per-run metrics to generate the verdicts required by the safety case.

To maintain broad coverage without sacrificing realism, validations can be done using a two-layer approach, shown in Figure 23. A low-fidelity (LF) layer (e.g., SUMO) sweeps wide parameter grids quickly to reveal where planning/control begins to stress safety constraints; a high-fidelity (HF) layer (e.g., a game engine simulator like CARLA with the control software in the loop) then replays the most informative cases with photorealistic sensors and closed-loop actuation. Both layers log the same KPIs, so results are comparable and can be promoted to track tests when warranted. This division of labor is central to scaling scenario space while maintaining end-to-end realism for planning and control behaviors like cut-in/out, overtaking, and lane changes.

Figure 23: Fidelity of AV simulation: a) Low-Fidelity SUMO simulator [41]; b) High-Fidelity AWSIM simulator [42]

Formal methods strengthen this flow. In the simulation-to-track pipeline, scenarios and safety properties are specified formally (e.g., via Scenic and Metric Temporal Logic), falsification synthesizes challenging test cases, and a mapping executes those cases on a closed track[43]. In published evidence, a majority of unsafe simulated cases reproduced as unsafe on track, and safe cases mostly remained safe—while time-series comparisons (e.g., DTW, Skorokhod metrics) quantified the sim-to-real differences relevant to planning and control. This is exactly the kind of transferability and measurement discipline a planning/control safety argument needs.

Finally, environment twins are built from aerial photogrammetry and point-cloud processing (with RTK-supported georeferencing), yielding maps and 3D assets that match the real campus, so trajectory-level decisions (overtake, yield, return-to-lane) are evaluated against faithful road geometries and occlusion patterns[44].

Methods and Metrics for Planning & Control

Mission-level planning validation starts from a start–goal pair and asks whether the vehicle reaches the destination via a safe, policy-compliant trajectory. Our platform publishes three families of evidence: (i) trajectory-following error relative to the global path; (ii) safety outcomes such as collisions or violations of separation; and (iii) mission success (goal reached without violations). This couples path selection quality to execution fidelity.

At the local planning level, our case study focuses on the planner inside the autonomous software. The planner synthesizes a global and a local path, then evaluates them based on predictions from surrounding actors to select a safe local trajectory for maneuvers such as passing and lane changes. By parameterizing scenarios with variables such as the initial separation to the lead vehicle and the lead vehicle’s speed, we create a grid of concrete cases that stress the evaluator’s thresholds. The outcomes are categorized by meaningful labels—Success, Collision, Distance-to-Collision (DTC) violation, excessive deceleration, long pass without return, and timeout—so that planner tuning correlates directly with safety and comfort.
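Assigning those labels could be as simple as a rule cascade over each run's summary log. The field names and thresholds below are illustrative assumptions, not the framework's actual schema:

```python
def label_outcome(run, dtc_limit_m=2.0, decel_limit_mps2=4.0,
                  pass_time_limit_s=20.0, timeout_s=60.0):
    """Map one simulated run's summary to an outcome label.

    `run` is a hypothetical dict summarizing the episode; the thresholds
    are illustrative, not calibrated values. Safety violations take
    precedence over comfort and progress labels.
    """
    if run["collided"]:
        return "Collision"
    if run["min_dtc_m"] < dtc_limit_m:
        return "DTC violation"
    if run["max_decel_mps2"] > decel_limit_mps2:
        return "Excessive deceleration"
    if run["duration_s"] > timeout_s:
        return "Timeout"
    if run["pass_duration_s"] > pass_time_limit_s:
        return "Long pass without return"
    if run["goal_reached"]:
        return "Success"
    return "Timeout"
```

Because every concrete case in the parameter grid receives exactly one label, planner tuning can be assessed by watching how the label distribution shifts across the grid.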

Figure 24: Trajectory validation example

Control validation links perception-induced delays to braking and steering outcomes. Our framework computes Time-to-Collision (TTC, the remaining gap divided by the closing speed) along with the simulator and AV-stack response times to detected obstacles. Sufficient response time allows a safe return to nominal headway; excessive delay predicts collision, sharp braking, or planner oscillations. By logging ground truth, perception outputs, CAN bus commands, and the resulting dynamics, the analysis separates sensing delays from controller latency, revealing where mitigation belongs (planner margins vs. control gains).
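The TTC bookkeeping can be sketched as follows, assuming a straight-line following situation and treating the delay terms as given measurements (function names are illustrative):

```python
def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """TTC = gap / closing speed; undefined (infinite) when the gap is not closing."""
    closing = ego_speed_mps - lead_speed_mps
    return gap_m / closing if closing > 0 else float("inf")

def response_margin(ttc_s, sense_delay_s, actuation_delay_s, brake_time_s):
    """Time remaining once sensing delay, actuation delay, and the braking
    manoeuvre itself are accounted for. A negative margin predicts a
    collision or a harsh emergency brake."""
    return ttc_s - (sense_delay_s + actuation_delay_s + brake_time_s)
```

Separating the delay terms in this way is what lets the analysis attribute a thin margin to sensing latency versus controller latency, as described above.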

A necessary dependency is localization health. Our tests inject controlled GPS/IMU degradations and dropouts through simulator APIs, then compare expected vs. actual pose per frame to quantify drift. Because planning and control are sensitive to absolute and relative pose, this produces actionable thresholds for safe operation (e.g., maximum tolerated RMS deviation before reducing speed or restricting maneuvers).

Finally, our program extends to low-level control via HIL-style twins. A Simulink-based network of virtual ECUs and data buses sits between Autoware’s navigation outputs and simulator actuation. This lets us simulate bus traffic, counters, and checksums; disable subsystems (e.g., steering module) to provoke graceful degradation; and compare physical ECUs against their twin under identical inputs to detect divergence. It is an efficient route to validating actuator-path integrity without building a full physical rig.

Case Study and Safety Argumentation

On the TalTech iseAuto shuttle, the digital twin (vehicle model, sensor suite, and campus environment) is integrated with LGSVL/Autoware through a ROS bridge so that “photons-to-torque” loops are exercised under realistic scenes before any track test. Scenarios are distributed over the campus xodr network using Scenic/M-SDL; multiple events can be chained within a scenario to probe planner behaviors around parked vehicles, slow movers, or oncoming traffic. Logging is aligned to the KPIs above so outcomes are comparable across LF/HF layers and re-runnable when planner or control parameters change.

In practice, this has yielded a concise, defensible narrative for planning & control safety: (1) what was tested (formalized scenarios across a structured parameter space); (2) how it was tested (two-layer simulation with a calibrated digital twin and, when necessary, track execution); (3) what happened (mission success, DTC minima, TTC profiles, braking/steering transients, localization drift); and (4) why it matters (evidence that tuning or algorithmic changes move the decision–execution loop toward or away from safety). The same framework has been used to analyze adversarial stresses on rule-based local planners, reinforcing that planning validation must include robustness to distribution shifts and targeted perturbations.

As a closing reflection, the approach acknowledges that simulation is not the world—so it measures the gap. By transporting formally generated cases to the track and comparing time-series behaviors, the program both validates planning/control logic and calibrates the digital twin itself, using discrepancies to guide model updates and ODD limits. That is the hallmark of modern control & planning V&V: scenario-driven, digitally twinned, formally grounded, and relentlessly comparative to reality.

Simulation & Formal Methods

Why Simulation Needs Formalism

Simulation is indispensable in autonomous-vehicle validation because it lets us probe safety-critical behavior without exposing the public to risk, but simulation alone is only as persuasive as its predictive value. A simulator that cannot anticipate how the real system behaves—because of poor modeling, missing variability, or unmeasured assumptions—does not provide credible evidence for a safety case. This is why we pair simulation with formal methods: a discipline for specifying scenarios and safety properties with mathematical precision, generating test cases systematically, and measuring how closely simulated outcomes match track or road trials. In our program, the digital twin of the vehicle and its operating environment acts as the concrete “world model,” while formal specifications direct the exploration of that world to the places where safety margins are most likely to fail.

Treating the digital twin as a live feedback loop is central to maintaining predictive value over time. The twin ingests logs and environmental data from the physical shuttle, updates maps and vehicle parameters, and feeds those data back into the simulator so that new tests reflect actual wear, calibration drift, and environmental change. This continuous synchronization turns simulation into an ongoing assurance activity rather than a one-off milestone.

Building such twins is non-trivial. Our workflow constructs environment twins from aerial photogrammetry with RTK-supported georeferencing, then processes point clouds into assets capable of driving a modern simulator. The resulting model can be used across many AVs and studies, amortizing the cost of data collection and asset creation while preserving the fidelity needed for planning, perception, and control validation.

Digital twin and simulation ecosystems differ not only in fidelity and purpose across domains, but also in the toolchains and platforms that have emerged to support them. In ground systems (automotive, robotics), simulation is dominated by scalable, scenario-rich environments tightly coupled to AI/ML stacks. Widely used platforms include CARLA (open-source, Unreal Engine–based), NVIDIA DRIVE Sim (GPU-accelerated, synthetic data generation), PreScan and Simcenter (sensor-to-system validation), and MATLAB/Simulink for model-based design, SIL/HIL, and control validation. These platforms emphasize large-scale scenario generation, perception stack validation, and real-time or accelerated simulation with closed-loop autonomy.

In airborne systems, simulation platforms are more tightly aligned with certification workflows and high-fidelity physics. Common tools include X-Plane (used in research and some FAA-approved training contexts), Prepar3D, and engineering-grade environments such as ANSYS Fluent and MSC Adams for aerodynamics and flight dynamics. MATLAB/Simulink again plays a central role for flight control laws, avionics integration, and DO-178C/DO-331–aligned model-based development. These ecosystems support pilot-in-the-loop, avionics-in-the-loop, and increasingly autonomy-in-the-loop simulations with strong traceability.

For marine systems, simulation platforms reflect the importance of hydrodynamics, environmental disturbances, and long-duration operations. Representative tools include OrcaFlex (widely used for offshore structures and subsea systems), MOOS-IvP (common in autonomous underwater and surface vehicles), and Delft3D for simulating currents, sediment, and coastal processes. These are often coupled with control and navigation development in MATLAB/Simulink or ROS-based stacks. Compared to ground/air, marine simulations tend to trade interaction density for environmental realism and long-horizon mission modeling.

In space systems, simulation platforms are deeply rooted in astrodynamics, mission design, and high-fidelity subsystem modeling. Key tools include Systems Tool Kit (STK) for orbital analysis and mission planning, GMAT for trajectory optimization, and FreeFlyer. For system-level digital twins and MBSE integration, platforms such as Cameo Systems Modeler (SysML-based) and Simulink are widely used. These environments support mission rehearsal, fault analysis, and increasingly onboard autonomy validation, where simulation substitutes for otherwise impossible real-world testing. Across all four domains, a clear pattern emerges: ground systems favor scale and data-driven simulation, while space systems prioritize first-principles fidelity, with airborne and marine occupying structured intermediate points shaped by certification and environmental complexity.

From Scenarios to Properties: Making Requirements Executable

Formal methods begin by making requirements executable. We express test intent as a distribution over concrete scenes using the SCENIC language, which provides geometric and probabilistic constructs to describe traffic, occlusions, placements, and behaviors. A SCENIC program defines a scenario whose parameters are sampled to generate test cases; each case yields a simulation trace against which temporal properties—our safety requirements—are monitored. This tight loop, implemented with the VERIFAI toolkit, supports falsification (actively searching for violations), guided sampling, and clustering of outcomes for test selection.

In practice, the pipeline unfolds as follows. We first assemble the photorealistic simulated world and dynamics models from HD maps and 3D meshes. We then formalize scenarios in SCENIC and define safety properties as monitorable metrics—often using robust semantics of Metric Temporal Logic (MTL), which provide not just a pass/fail verdict but a quantitative margin to violation. VERIFAI searches the parameter space, records safe and error tables, and quantifies “how strongly” a property held or failed; these scores guide which cases deserve promotion to track tests. This process transforms vague test ideas (“test passing pedestrians”) into a concrete population of parameterized scenes with measurable, comparable outcomes.
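The quantitative verdicts can be illustrated with the robust semantics of the two simplest temporal operators over a finite trace; a real MTL monitor also handles timing bounds and nested formulas, so this is only a sketch:

```python
def always_robustness(signal, threshold):
    """Robustness of G(signal >= threshold) over a finite trace: the worst-case
    margin. Positive means the property held with that margin; negative gives
    the size of the worst violation."""
    return min(s - threshold for s in signal)

def eventually_robustness(signal, threshold):
    """Robustness of F(signal >= threshold): the best-case margin anywhere
    in the trace."""
    return max(s - threshold for s in signal)
```

For example, running `always_robustness` on the per-frame distance to the nearest pedestrian with a 2 m threshold yields exactly the kind of graded score that decides which cases deserve promotion to track tests.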

Our project also leverages scenario distribution over maps: using OpenDRIVE networks of the TalTech campus, SCENIC instantiates the same behavioral narrative—say, overtaking a slow or stopped vehicle—at diverse locations, ensuring that lane geometry, curbside clutter, and occlusions vary meaningfully while the safety property remains constant. The result is a family of tests that stress the same planning and perception obligations under different geometric and environmental embeddings.

Selection, Execution, and Measuring the Sim-to-Real Gap

A formal pipeline is only convincing if simulated insights transfer to the track. After falsification, we select representative safe/unsafe cases through visualization or clustering of the safe/error tables and implement them on a closed course with controllable agents. Notably, the same SCENIC parameters (starting pose, start time, velocities) drive hardware actors on the track as drove agents in simulation, subject to physical limitations of the test equipment. This parity enables apples-to-apples comparisons between simulated and real traces.

We then quantify the sim-to-real gap using time-series metrics such as dynamic time warping and the Skorokhod distance to compare trajectories, first-detection times, and minimum-distance profiles. In published results, trajectories for the same test were qualitatively similar but showed measurable differences in separation minima and timing; moreover, even identical simulations can diverge when the autonomy stack is non-deterministic, a reality that the methodology surfaces rather than hides. Understanding this variance is a virtue: tests with lower variance are more reproducible on track, while highly variable tests reveal sensitivity in planning, perception, or prediction that merits redesign or tighter ODD limits.
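
A minimal dynamic time warping implementation shows how two trajectories sampled at different rates or phases can still be compared. This sketch handles 1-D series only; real sim-to-real analysis compares multi-dimensional traces and may use the Skorokhod distance as well.

```python
# Minimal dynamic time warping (DTW) between two 1-D time series, as a
# sketch of the trajectory-comparison step in sim-to-real gap measurement.

def dtw(a, b):
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Illustrative simulated vs. recorded track positions (same test case).
sim_trace = [0.0, 1.0, 2.0, 3.0]
real_trace = [0.0, 1.1, 1.9, 3.2]
print(dtw(sim_trace, real_trace))
```

A DTW distance of zero means the traces align perfectly under warping; the growth of this distance across repeated runs of the same test is one way to quantify the non-determinism the text describes.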

This formal sim-to-track pipeline does more than label outcomes; it helps diagnose causes. By replaying logged runs through the autonomy stack’s visualization tools, we can attribute unsafe behavior to perception misses, unstable planning decisions, or mispredictions, and then target those subsystems in subsequent formal campaigns. In one case set, the dominant failure mode was oscillatory planning around a pedestrian, discovered and characterized through this exact loop of scenario specification, falsification, track execution, and trace analysis.

Multi-Fidelity Workflows and Continuous Assurance

Exhaustive testing is infeasible, so we combine multiple fidelity levels to balance breadth with realism. Low-fidelity (LF) platforms sweep large scenario grids quickly to map where safety margins begin to tighten; high-fidelity (HF) platforms (e.g., LGSVL/Unity integrated with Autoware) replay the most informative LF cases with photorealistic sensors and closed-loop control. Logging is harmonized so that KPIs and traces are comparable across levels, and optimization or tuning derived from LF sweeps is verified under HF realism before any track time is spent. In extensive experiments, thousands of LF runs revealed broad patterns, but only HF replays uncovered subtle interactions that flipped outcomes—evidence that fidelity matters exactly where the safety case will later be challenged.
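
The LF-to-HF hand-off can be sketched as a cheap grid sweep followed by promotion of the cases closest to violation. The margin model, parameter grid, and promotion rule here are hypothetical placeholders for the project's actual LF models and selection logic.

```python
# Sketch of a multi-fidelity workflow: run a cheap low-fidelity (LF) grid
# sweep, then promote the cases with the tightest safety margins to an
# expensive high-fidelity (HF) replay queue. The margin model is a
# hypothetical stand-in.

def lf_margin(speed, gap):
    # Hypothetical low-fidelity estimate of the safety margin.
    return gap - 0.4 * speed

def lf_sweep(speeds, gaps):
    return [((s, g), lf_margin(s, g)) for s in speeds for g in gaps]

def promote_to_hf(results, k=3):
    # Promote the k cases closest to the safe/unsafe boundary, where
    # HF fidelity is most likely to flip the outcome.
    return sorted(results, key=lambda r: abs(r[1]))[:k]

results = lf_sweep(speeds=[5, 10, 15], gaps=[2.0, 4.0, 6.0])
hf_queue = promote_to_hf(results)
print([params for params, _ in hf_queue])
```

Promoting boundary cases rather than clear failures reflects the observation in the text: the subtle interactions that flip outcomes live exactly where margins are tight.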

This workflow sits within a DOE-driven V&V suite that treats the digital twin and scenario engine as programmable assets. Scenario definitions, vehicle models, and evaluation logic are versioned; control-loop delays, TTC profiles, and collision metrics are computed consistently per run; and the same infrastructure can be extended downward into hardware-in-the-loop experiments of low-level control paths to test actuator-path integrity under identical scene conditions. In our project platform, the simulator co-runs with Autoware, accepts parameterized scenarios through a public interface, and emits validation reports that roll up from frame-level signal checks to mission-level success, closing the traceability chain from formal property to system outcome.
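
One of the per-run metrics mentioned above, the TTC profile, can be computed consistently per frame as a sketch. The frame format (gap, closing speed) is an assumption for illustration, not the platform's actual logging schema.

```python
# Sketch of a per-run KPI computation: time-to-collision (TTC) from the
# gap and closing speed at each logged frame, with the minimum TTC over
# the run reported as the run-level metric. Frame format is a hypothetical
# (gap_m, closing_speed_mps) pair.

def ttc_profile(frames):
    profile = []
    for gap, closing_speed in frames:
        if closing_speed > 0:
            profile.append(gap / closing_speed)
        else:
            profile.append(float("inf"))  # not closing: no collision course
    return profile

# Illustrative log: the ego closes on an obstacle, then the gap opens again.
frames = [(20.0, 2.0), (16.0, 4.0), (12.0, 4.0), (12.0, -1.0)]
profile = ttc_profile(frames)
print(min(profile))  # minimum TTC over the run, in seconds
```

Computing such metrics identically for every run, at every fidelity level, is what makes the roll-up from frame-level checks to mission-level success traceable.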

Just as important as capability is honesty about limits. Our own survey and case study argue for explicit attention to abstraction choices, modeling assumptions, and convergence questions for AI-based components. The literature and our results stress that simulation’s value depends on calibrated models, careful measurement of non-determinism, and disciplined mapping to the real world; formal methods help precisely because they make these assumptions visible, testable, and comparable over time. The digital-twin perspective then turns those measurements into an engine for continuous improvement, updating the twin as the physical system and environment evolve.

Physical Testing

Physical testing infrastructures across ground, airborne, marine, and space systems reflect a progression from high-access, repeatable environments to extremely constrained, high-cost, and often non-replicable conditions. Each domain builds specialized facilities to bridge the gap between simulation and real-world deployment, with increasing emphasis on safety, controllability, and observability of complex system interactions.

Ground Systems (Automotive & Robotics)

Figure 25: AV test tracks

Ground systems benefit from the most accessible and diverse physical testing environments. Proving grounds and AV test tracks—such as Mcity and American Center for Mobility—replicate urban, suburban, and highway conditions with controllable variables (traffic signals, pedestrian dummies, weather systems). OEMs also use large private facilities (e.g., General Motors Milford Proving Ground) for durability, ADAS, and edge-case testing. These environments enable repeatable scenario testing, fault injection, and safe validation of perception and decision-making systems. Increasingly, they are instrumented with high-precision localization, V2X infrastructure, and synchronized data capture to support validation at scale.

Airborne Systems (Aviation & UAVs)

Figure 26: Airborne Systems (Aviation & UAVs)

Airborne testing combines ground-based facilities and open-air test ranges. Wind tunnels (e.g., NASA Ames Research Center Wind Tunnel) provide controlled aerodynamic testing across regimes, while iron-bird rigs and avionics labs enable hardware/software integration before flight. Actual flight testing occurs at restricted ranges such as Edwards Air Force Base or FAA-designated UAV corridors, where telemetry, radar tracking, and chase aircraft ensure safety. Compared to ground systems, repeatability is lower, and environmental factors (weather, airspace constraints) play a larger role, but the combination of lab + flight test provides a structured certification pathway.

Figure 27: Marine Systems (Surface & Underwater)

Marine testing relies on a mix of controlled hydrodynamic facilities and open-water trials. Towing tanks and wave basins—such as those at Naval Surface Warfare Center—allow precise study of hull performance, propulsion, and wave interaction. For autonomy, sheltered environments (harbors, test lakes) are used for early-stage validation, followed by coastal and deep-sea trials. Facilities often include instrumented buoys, GPS-denied navigation testing zones, and long-duration endurance setups. Compared to ground and air, marine systems emphasize disturbance realism (waves, currents) and long-horizon reliability, with less focus on dense, repeatable interaction scenarios.

Figure 28: Space Systems (Launch, Orbital, Deep Space)

Space systems have the most specialized and constrained physical testing infrastructure. Because full end-to-end testing in the operational environment is impossible, engineers rely on high-fidelity ground facilities that replicate aspects of space conditions. These include thermal vacuum chambers (e.g., NASA Johnson Space Center Chamber A), vibration and acoustic test facilities for launch loads, and propulsion test stands (e.g., Stennis Space Center). RF anechoic chambers validate communication and sensing systems. While these facilities achieve extreme fidelity for specific physics, system-level validation is fragmented, requiring heavy reliance on simulation and incremental subsystem testing. The cost and irreversibility of failure drive a test philosophy centered on qualification, redundancy, and conservative margins.

Cross-Domain Insight

Across all four domains, physical testing evolves from highly repeatable, scenario-rich environments (ground) to physics-constrained, partial-reality validation (space). Airborne and marine systems sit in between, blending controlled facilities with real-world trials. A consistent trend is the integration of instrumented test environments with digital twins, enabling bidirectional feedback between physical experiments and simulation models—an increasingly critical capability for validating autonomous and safety-critical systems.

Summary

This chapter develops a comprehensive view of how control, decision-making, and motion planning form the core of autonomous system behavior, and how these elements vary across domains and implementation paradigms. It begins by contrasting classical control methods—such as PID, LQR, and state estimation—with AI-based approaches like reinforcement learning and neural network controllers. Classical methods offer strong guarantees in stability, transparency, and certifiability, making them well-suited for safety-critical low-level control. In contrast, AI-based methods provide adaptability and the ability to handle complex, nonlinear dynamics but introduce challenges in explainability, verification, and robustness. The chapter emphasizes that hybrid architectures—where AI handles high-level decisions and classical control ensures safe execution—are emerging as the most practical and safety-aligned approach.

The chapter then explores the decision and planning hierarchy, distinguishing between behavioral algorithms (“what to do”) and motion planning (“how to do it”). Behavioral methods such as finite state machines, behavior trees, and utility-based reasoning govern high-level actions like lane changes or yielding, while motion planners generate feasible trajectories using techniques like A*, RRT*, and model predictive control. A key insight is the tight coupling between these layers and the control system: perception feeds behavior, behavior drives planning, and planning feeds control in a continuous loop. Safety emerges not from any single layer, but from their coordinated operation under uncertainty, including prediction of other agents, adherence to constraints, and real-time replanning.

Finally, the chapter focuses on validation and assurance, highlighting the central role of digital twins, scenario-based testing, and formal methods. A modern V&V framework combines multi-fidelity simulation (low- and high-fidelity), design-of-experiments scenario generation, and formal specification of safety properties (e.g., using Scenic and temporal logic). These methods enable systematic exploration of edge cases, measurement of safety metrics (e.g., time-to-collision, trajectory error), and structured comparison between simulation and real-world testing. Physical testing—from AV tracks to space qualification facilities—complements simulation, while continuous feedback from deployed systems updates the digital twin. The overarching theme is that credible safety assurance requires a tightly integrated loop between simulation, formalism, and real-world validation, with explicit measurement of the sim-to-real gap.

Human-Machine Communication

Human–machine communication (HMC) is a critical safety and effectiveness layer across ground, aerospace, marine, and space systems, shaping how humans supervise, trust, and intervene in increasingly autonomous platforms. In aerospace, communication is highly structured and procedural, integrating pilots with automation through cockpit interfaces, alerts, and air traffic control, where clarity, workload management, and avoidance of mode confusion are paramount for safety. Marine systems emphasize long-duration situational awareness and often operate with reduced connectivity, requiring HMC that supports remote supervision, autonomy oversight, and coordination with human crews under uncertain environmental conditions. In space systems, communication is constrained by latency, limited bandwidth, and mission-critical stakes, driving the need for highly autonomous systems paired with carefully designed interfaces that allow operators to understand system state, diagnose anomalies, and issue high-level commands with confidence. Ground systems, however, face the greatest human–machine communication challenges of all.

Chapter two introduced the concepts of safety and legal liability, whose key element is the expectation function: what is the expected behavior of the autonomous ground vehicle given the totality of the facts? Intimately connected to this concept is any communication between the autonomous vehicle and surrounding humans. This chapter focuses on how ground autonomous vehicles interact and communicate with people and their surrounding environment. As automation removes the human driver from the control loop, new forms of Human–Machine Communication (HMC) are required to ensure transparency, trust, and safety. The chapter examines how information is exchanged between vehicles, passengers, pedestrians, operators, and fleet managers through a variety of interfaces and communication modes. It introduces conceptual and practical frameworks such as Human–Machine Interfaces (HMI), the Language of Driving (LoD), and public acceptance mechanisms that together define how autonomy becomes understandable and socially integrated in everyday mobility.

Human–Machine Interface and Communication

This chapter explores the specificities of Human–Machine Interaction (HMI) in the context of autonomous vehicles (AVs). It examines how HMI in autonomous vehicles differs fundamentally from traditional car dashboards. With the human driver no longer actively involved in operating the vehicle, the challenge arises: how should AI-driven systems communicate effectively with passengers, pedestrians, and other road users?

HMI in AVs extends far beyond the driver’s dashboard. It defines the communication bridge between machines, people, and infrastructure — shaping how autonomy is perceived and trusted. Effective HMI determines whether automation is experienced as intelligent and reliable or opaque and alien.

Changing Paradigms of Communication

Traditional driver interfaces were designed to support manual control. In contrast, autonomous vehicles must communicate intent, status, and safety both inside and outside the vehicle. The absence of human drivers requires new communication models to ensure safe interaction among all participants.

This section addresses the available communication channels and discusses how these channels must be redefined to accommodate the new paradigm. Additionally, it considers how various environmental factors—including cultural, geographical, seasonal, and spatial elements—impact communication strategies.

A key concept in this transformation is the Language of Driving (LoD) — a framework for structuring and standardizing how autonomous vehicles express awareness and intent toward humans (Kalda et al., 2022).

Human Perception and Driving

Understanding how humans perceive the world is crucial for autonomous vehicles to communicate effectively. Human perception is multimodal — combining sight, sound, motion cues, and social awareness. By studying these perceptual mechanisms, AV designers can emulate intuitive human signals such as:

Such behaviorally inspired signaling helps AVs become socially legible, supporting shared understanding on the road.

Cultural and Social Interactions

Driving is a social act. Culture, norms, and environment shape how humans interpret signals and movements. Autonomous vehicles may need to adapt their communication style — from light colors and icons to audio tones and message phrasing — depending on cultural and regional expectations.

Research explores whether AVs could adopt human-like communication methods, such as digital facial expressions or humanoid gestures, to support more natural interactions in complex social driving contexts.

AI Role in Communication

Modern HMI systems increasingly rely on artificial intelligence, including large language models (LLMs), to process complex situational data and adapt communication in real time. AI enables:

The evolution toward AI-mediated interfaces marks a shift from fixed UI design toward conversational and contextual vehicle communication.

 Example of multimodal HMI used in TalTech autonomous shuttle research (source: Kalda et al., 2022).

Modes of Interactions

While the previous section described the foundations and goals of HMI, this section focuses on how autonomous vehicles communicate with various stakeholders and through which modes. These interactions can be categorized by user type, purpose, and proximity.

1. Passenger Communication

The vehicle–passenger interface supports comfort, awareness, and accessibility. It replaces the human driver’s social role by providing:

Passenger communication must balance automation with reassurance. In an Estonian field study (Kalda, Sell & Soe, 2021), over 90% of first-time AV users reported feeling safe and willing to ride again when the interface clearly explained the vehicle’s actions.

2. Pedestrian Communication

The vehicle–pedestrian interface (V2P) substitutes human cues such as eye contact or gestures. The *Language of Driving* (Kalda et al., 2022) proposes using standardized visual symbols, light bars, or projections to express intent:

Pedestrian communication must remain universal and intuitive, avoiding dependence on text or language comprehension.

3. Safety Operator and Teleoperation

At current autonomy levels (L3–L4), a safety operator interface remains essential. Two variants exist:

Teleoperation acts as a *bridge* between human oversight and full autonomy — essential for handling ambiguous traffic or emergency scenarios.

4. Maintenance and Diagnostics Interface

A dedicated maintenance interface enables technicians to safely inspect and update the vehicle:

Such interfaces ensure traceability, reliability, and compliance with safety regulations.

5. Fleet Manager Interface

Fleet-level interfaces provide centralized control and analytics for multiple vehicles. They support:

These tools operate mainly over remote communication channels, relying on secure data infrastructure.

6. Direct vs. Remote Communication

Autonomous vehicle interaction can be divided into direct (local) and remote (supervisory) communication:

| Type | Example | Key Features |
|---|---|---|
| Direct (Local) | Passenger, pedestrian, or on-site operator | Low latency, physical proximity, immediate feedback |
| Remote (Supervisory) | Teleoperation or fleet control | Network-based, high security, possible latency |
| Service-Level (Asynchronous) | Maintenance, updates, diagnostics | Back-end communication; focuses on reliability and traceability |

7. Design Principles for Effective Communication

To ensure that human–machine communication is intuitive and safe, several universal design principles apply:

When applied systematically, these principles make autonomous systems understandable, predictable, and trustworthy.


References

Kalda, K.; Pizzagalli, S.-L.; Soe, R.-M.; Sell, R.; Bellone, M. (2022). *Language of Driving for Autonomous Vehicles.* Applied Sciences, 12(11), 5406. [https://doi.org/10.3390/app12115406](https://doi.org/10.3390/app12115406)

Kalda, K.; Sell, R.; Soe, R.-M. (2021). *Use Case of Autonomous Vehicle Shuttle and Passenger Acceptance.* Proceedings of the Estonian Academy of Sciences, 70(4), 429–435. [https://doi.org/10.3176/proc.2021.4.09](https://doi.org/10.3176/proc.2021.4.09)

Language of Driving Concepts

The Language of Driving (LoD) describes the implicit and explicit signals that allow autonomous vehicles and humans to understand each other in mixed traffic [1–3].

Semantics and Pragmatics of Driving

Driving behavior can be analyzed as a layered communication system:

An autonomous vehicle must infer human intent and simultaneously display legible intent of its own [2].

Cultural Adaptation and Universality

Driving “languages” vary globally; hence interfaces must maintain universal meaning while allowing local adaptation [1]. Behavior should be recognizable but not anthropomorphic, preserving clarity across cultures [3].

LoD Implementation Examples

Field experiments using light-based cues have shown that simple color and motion patterns effectively communicate awareness and yielding. Participants reported improved understanding when signals were consistent and redundant across modalities [2].

 Typical pedestrian crossing scenario using visual LoD cues.

Future Development

Formalizing LoD as a measurable framework is essential for verification, standardization, and interoperability of automated behavior [3].


References: [1] Razdan, R. et al. (2020). *Unsettled Topics Concerning Human and Autonomous Vehicle Interaction.* SAE EDGE Research Report EPR2020025.

[2] Kalda, K., Sell, R., Soe, R.-M. (2021). *Use Case of Autonomous Vehicle Shuttle and Passenger Acceptance.* Proceedings of the Estonian Academy of Sciences, 70(4), 429–435.

[3] Kalda, K., Pizzagalli, S.-L., Soe, R.-M., Sell, R., Bellone, M. (2022). *Language of Driving for Autonomous Vehicles.* Applied Sciences, 12(11), 5406.

Safety Concerns and Public Acceptance

The integration of autonomous vehicles (AVs) into everyday traffic introduces both technological and societal challenges. While automated driving systems aim to eliminate human error and improve efficiency, the perceived safety and acceptance of these systems remain crucial for their widespread adoption. Ensuring that people *trust* the technology is equally important as ensuring that the technology *functions safely*.

The Dual Nature of Safety

Safety in autonomous mobility can be divided into two interdependent aspects:

Even if an AV operates flawlessly according to standards and regulations, users may still hesitate to use it unless the system communicates its actions clearly and behaves predictably. Thus, *trust* emerges as a measurable component of safety.

Building Trust Through Transparency

Public acceptance is closely linked to how transparently the system communicates its intentions and limitations. People expect autonomous vehicles to behave in a consistent and understandable manner — signalling when yielding, stopping, or resuming motion. Clear visual or auditory cues from the vehicle’s human–machine interface (HMI) can substantially increase user confidence.

Equally important is transparent communication from operators and authorities regarding how safety is managed, what happens in case of system failures, and how data is used. Misinformation or uncertainty during incidents may quickly erode public trust even if no technical fault has occurred.

Experience as a Driver of Acceptance

Empirical research has shown that direct experience with AVs strongly increases trust. In one Estonian field study (Kalda, Sell & Soe, 2021), the majority of first-time users reported a high sense of safety and comfort, with over 90% indicating willingness to use autonomous shuttles again after their initial ride.

Such results confirm that personal experience and well-managed demonstrations are key factors in shaping public perception. People who interact directly with autonomous vehicles tend to transition from curiosity to trust, whereas those without exposure often remain cautious or skeptical. This highlights the importance of continuous testing, education, and public engagement.

Social, Ethical, and Communication Dimensions

Public acceptance extends beyond safety alone. It also encompasses questions of responsibility, fairness, accessibility, and societal impact. Autonomous transport must be inclusive and understandable to all citizens — regardless of age, digital literacy, or physical ability.

Ethical transparency, clear rules of accountability, and human-centered interface design all contribute to societal readiness for automation. Collaboration between engineers, psychologists, communication experts, and policy-makers is therefore essential to define a holistic framework of *social safety*.

Dimensions of Public Acceptance

Towards Responsible Deployment

Ensuring public confidence in autonomous mobility requires a balanced approach:

When these dimensions align, public acceptance evolves naturally, transforming initial curiosity and caution into trust and habitual use. The success of future autonomous mobility therefore depends not only on technological excellence but also on how well society understands and embraces it.


Reference: Kalda, K.; Sell, R.; Soe, R.-M. (2021). *Use Case of Autonomous Vehicle Shuttle and Passenger Acceptance.* Proceedings of the Estonian Academy of Sciences, 70(4), 429–435. [https://doi.org/10.3176/proc.2021.4.09](https://doi.org/10.3176/proc.2021.4.09)

Verification & Validation of HMI

Verification and Validation (V&V) of Human–Machine Interfaces (HMI) in autonomous vehicles ensure that communication between humans and intelligent systems is safe, intuitive, and consistent. While functional safety standards focus on the correct operation of sensors and control logic, HMI validation extends this to human comprehension, usability, and behavioral response [1–3].

Objectives of HMI Validation

The goal of HMI V&V is to confirm that:

The validation process therefore combines *technical testing* with *human-centered evaluation*.

Verification Methods

Verification addresses whether the interface behaves as intended. Typical methods include:

Verification ensures consistency, latency limits, and redundancy across modalities before any user testing is performed.

Human-in-the-Loop Evaluation

Validation focuses on how people actually experience and understand the interface. This involves iterative testing with human participants in controlled and real-world environments [1–3]. Approaches include:

Results are analyzed to refine signal patterns, color codes, and message phrasing to improve intuitiveness and reduce confusion.

Simulation and Virtual Prototyping

High-fidelity simulation environments enable early-stage evaluation of HMI without physical prototypes. Tools integrate virtual pedestrians, lighting, and weather to test how design choices influence visibility and legibility [3]. Virtual validation supports:

These techniques shorten development cycles and allow data-driven interface improvement.

Metrics and Performance Indicators

To make validation reproducible, quantitative metrics are defined, such as:

Standardized metrics enable benchmarking across projects and support regulatory assessment of AV communication readiness.
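
Two such metrics, signal comprehension rate and mean reaction time, might be aggregated from participant trials as in this hypothetical sketch. The trial record format and the metric choices are assumptions for illustration, not a standardized schema.

```python
# Hypothetical sketch of aggregating HMI validation metrics from
# participant trials: comprehension rate (fraction of correctly
# interpreted signals) and mean reaction time in seconds.

def hmi_metrics(trials):
    # Each trial: (understood_correctly: bool, reaction_time_s: float).
    n = len(trials)
    comprehension_rate = sum(1 for ok, _ in trials if ok) / n
    mean_reaction = sum(rt for _, rt in trials) / n
    return comprehension_rate, mean_reaction

# Illustrative data from four participant trials.
trials = [(True, 1.2), (True, 0.9), (False, 2.5), (True, 1.1)]
rate, rt = hmi_metrics(trials)
print(rate, round(rt, 3))
```

Computing the same metrics identically across design iterations and deployment sites is what makes the benchmarking described above reproducible.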

Towards Continuous Validation

HMI validation does not end with prototype testing. Field data from pilot deployments provide valuable feedback loops for ongoing improvement [2]. By combining simulation, real-world performance, and user analytics, HMI systems evolve continuously as technology and user expectations mature.

 Example of iterative HMI Verification and Validation process from concept to field testing.

References: [1] Razdan, R. et al. (2020). *Unsettled Topics Concerning Human and Autonomous Vehicle Interaction.* SAE EDGE Research Report EPR2020025.

[2] Kalda, K., Sell, R., Soe, R.-M. (2021). *Use Case of Autonomous Vehicle Shuttle and Passenger Acceptance.* Proc. Estonian Academy of Sciences, 70 (4).

[3] Kalda, K., Pizzagalli, S.-L., Soe, R.-M., Sell, R., Bellone, M. (2022). *Language of Driving for Autonomous Vehicles.* Applied Sciences, 12 (11).

Summary

Effective verification and validation bridge the gap between technical functionality and human understanding. By ensuring that communication is accurate, interpretable, and trusted, these processes contribute directly to the safe and responsible deployment of autonomous mobility [1–3].

Autonomy Validation Tools

Validation and verification (V&V) are critical processes in systems engineering and software development that ensure a system meets its intended purpose and functions reliably. Verification is the process of evaluating whether a product, service, or system complies with its specified requirements—essentially asking, “Did we build the system right?” It involves activities such as inspections, simulations, tests, and reviews throughout the development lifecycle. Validation, on the other hand, ensures that the final system fulfills its intended use in the real-world environment—answering the question, “Did we build the right system?” This typically includes user acceptance testing, field trials, and performance assessments under operational conditions. Together, V&V help reduce risks, improve safety and quality, and increase confidence that a system will operate effectively and as expected. In the context of autonomous systems, V&V combines two historical trends: the first from mechanical systems and the second, more recent one from classical digital decision systems. Finally, AI adds further complexity to the testing of digital decision systems.

For traditional safety-critical systems in automotive, the evolution of V&V has been closely linked to regulatory standards frameworks such as ISO 26262. Key elements of this framework include:

  1. System Design Process: A structured development assurance approach for complex systems, incorporating safety certification within the integrated development process.
  2. Formalization: The formal definition of system operating conditions, functionalities, expected behaviors, risks, and hazards that must be mitigated.
  3. Lifecycle Management: The management of components, systems, and development processes throughout their lifecycle.

The primary objective was to meticulously and formally define the system design, anticipate expected behaviors and potential issues, and comprehend the impact over the product's lifespan.

With the advent of conventional software paradigms, safety-critical V&V adapted by preserving the original system design approach while integrating software as system components. These software components maintained the same overall structure of fault analysis, lifecycle management, and hazard analysis within system design. However, certain aspects required extension. For instance, in the airborne domain, standard DO-178C, which addresses “Software Considerations in Airborne Systems and Equipment Certification,” updated the concept of hazard from physical failure mechanisms to functional defects, acknowledging that software does not degrade due to physical processes. Also revised were lifecycle management concepts, reflecting traditional software development practices. Design Assurance Levels (DALs) were incorporated, allowing the integration of software components into system design, functional allocation, performance specification, and the V&V process, akin to SOTIF in the automotive industry.

Table one above shows the difference between ISO 26262 and SOTIF. In general, the fundamental characteristics of digital software systems are problematic in safety-critical systems. However, the IT sector has been a key megatrend that has transformed the world over the last 50 years. In the process, it has developed large ecosystems around semiconductors, operating systems, communications, and application software. At this point, using these ecosystems is critical to nearly every product’s success, so mixed-domain safety-critical products are now a reality. Mixed-domain structures can be classified into three broad paradigms, each of which has very different V&V requirements: mechanical replacement (big Physical, small Digital), electronics adjacent (separate Physical and Digital), and autonomy (big Digital, small Physical). Drive-by-wire functionality is an example of the mechanical replacement paradigm, where the original mechanical functionality is implemented by electronic components (HW/SW). In their initial configurations, these mixed electronic/mechanical systems were physically separated as independent subsystems. In this configuration, the V&V process looked very similar to the traditional mechanical verification process.

The paradigm of separate physical subsystems has the advantage of V&V simplification and safety, but the large disadvantage of component skew and material cost. Thus, an emerging trend has been to build underlying computational fabrics with networking and to separate functionality virtually (through software). From a V&V perspective, this means that the virtual backbone which maintains this separation (e.g., an RTOS) must be verified to a very high standard. Infotainment systems are an example of Electronics Adjacent integration. Generally, an independent IT infrastructure works alongside the safety-critical infrastructure, and from a V&V perspective, the two can be validated separately. However, the presence of infotainment systems enables very powerful communication technologies (5G, Bluetooth, etc.) through which the cyber-physical system can be impacted by external third parties. From a safety perspective, the simplest method for maintaining safety would be to physically separate these systems. However, this is not typically done, because a connection is required to provide “over-the-air” updates to the device. Thus, the V&V capability must again verify that the virtual safeguards against malicious intent are robust. Finally, the last level of integration is autonomy. In autonomy, the processes of sensing, perception, location services, and path planning envelop the traditional mechanical functionality.

Moving beyond conventional software, AI has introduced a “learning” paradigm. In this paradigm, there is a period of training during which the AI machine “learns” from data to build its own rules; here, learning is defined on top of traditional optimization algorithms that try to minimize some notion of error. This is effectively data-driven software development, as shown in the figure below. However, there are profound differences between AI software and conventional software. AI-generated software introduces significant issues to the V&V task, as shown in Table 2 below.
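The learning-as-error-minimization idea can be made concrete with a minimal sketch: instead of hand-writing a rule, gradient descent recovers it from example data. All numbers here are illustrative; real AV models use the same principle at vastly larger scale.

```python
# Minimal sketch of the "learning" paradigm: rules are not hand-written,
# they are derived from data by minimizing an error measure.
# Gradient descent fits y = w*x to samples generated by the rule w_true = 2.

def train(samples, lr=0.1, epochs=100):
    """Learn weight w by minimizing mean squared error over (x, y) samples."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of MSE: mean of 2 * (w*x - y) * x over the samples
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad
    return w

# The "training data" plays the role of the specification.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w = train(data)
print(round(w, 3))  # converges close to the underlying rule w = 2
```

The V&V difficulty follows directly: the "program" (the learned weight) is an artifact of the data distribution, so its correctness cannot be argued line-by-line the way conventional code can.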

Testing Infrastructure

As discussed earlier, the generic V&V process consists of testing the product under test within the ODD. This is generally done with a number of techniques. The central paradigm is to generate a test, execute it, and apply a clear criterion for correctness. Three major styles of intelligent test generation are currently active: physical testing, real-world seeding, and virtual testing.

  1. Physical Testing: Typically, physical testing is the most expensive way to verify functionality. However, Tesla has built a flow in which its existing fleet acts as a large distributed testbed. Using this fleet, Tesla's approach to autonomous driving uses a sophisticated data pipeline and deep learning system designed to process vast amounts of sensor data efficiently. In this flow, the scenario under construction is the one driven by the driver, and the criterion for correctness is the driver's corrective action. Behind the scenes, the global verification flow can be managed by large databases and supercomputers (Dojo). By employing this methodology, Tesla knows that its scenarios are always valid. However, there are challenges with this approach. First, the real world moves very slowly in terms of new unique situations. Second, by definition, the scenarios seen are closely tied to Tesla's market presence, and so are not predictive of new situations. Finally, the process of capturing data, discerning an error, and building corrective action is non-trivial. At the extreme, this process is akin to taking crash logs from broken computers, diagnosing them, and building the fixes.
  2. Real-World Seeding: Another line of test generation is to use physical situations as seeds for further virtual testing. Pegasus, the seminal project initiated in Germany, took such an approach. The project emphasized a scenario-based testing methodology which used observed data from real-world conditions as a base. Another similar effort comes from Warwick University, with a focus on test environments, safety analysis, scenario-based testing, and safe AI. One of the contributions from Warwick is the Safety Pool Scenario Database. Databases and seeding methods, especially of interesting situations, offer some value, but of course their completeness is not clear. Further, databases of tests are very susceptible to being over-optimized against by AI algorithms.
  3. Virtual Testing: Another important contribution was ASAM OpenSCENARIO 2.0, a domain-specific language designed to enhance the development, testing, and validation of Advanced Driver-Assistance Systems (ADAS) and Automated Driving Systems (ADS). A high-level language allows for a symbolic, higher-level description of the scenario, with an ability to grow in complexity through rules of composition. Underneath the symbolic apparatus is pseudo-random test generation, which can scale the scenario generation process. The randomness also offers a chance to expose “unknown-unknown” errors.
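The pseudo-random generation underneath a symbolic scenario description can be sketched as follows. This is a hypothetical illustration, not the OpenSCENARIO 2.0 semantics: the scenario name, parameter names, and ranges are invented, but the key property is real — seeding makes every randomized test case exactly reproducible.

```python
import random

# Hypothetical sketch: a symbolic scenario ("cut-in on a highway") leaves
# parameters free; a seeded pseudo-random generator concretizes them so each
# test case is both randomized and exactly reproducible from its seed.

def generate_cut_in_scenario(seed):
    rng = random.Random(seed)          # seeded => deterministic replay
    return {
        "ego_speed_kph": rng.uniform(80, 130),
        "cut_in_gap_m": rng.uniform(5, 40),
        "other_speed_kph": rng.uniform(60, 120),
        "rain": rng.random() < 0.2,    # occasional adverse-weather injection
    }

# A failing run can be reported by its seed alone; replaying the seed
# reproduces the exact scenario for debugging.
scenario = generate_cut_in_scenario(seed=42)
replay = generate_cut_in_scenario(seed=42)
print(scenario == replay)  # True: same seed, same concrete scenario
```

Sweeping seeds then explores the parameterized scenario space at scale, which is how randomness gets a chance to surface “unknown-unknown” cases.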

Beyond component validation, there have been proposed solutions specifically for autonomous systems such as UL 4600, “Standard for Safety for the Evaluation of Autonomous Products.” Similar to ISO 26262/SOTIF, UL 4600 has a focus on safety risks across the full lifecycle of the product and introduces a structured “safety case” approach. The crux of this methodology is to document and justify how autonomous systems meet safety goals. It also emphasizes the importance of identifying and validating against a wide range of real-world scenarios, including edge cases and rare events. There is also a focus on including human-machine interactions.

What kind of testing infrastructure is required to execute these various methodologies?

The baseline for automotive physical testing is a set of facilities for crash testing, road variations, and weather effects. These are generally private and shared test tracks around the world. For autonomy, several additional levels of test infrastructure have emerged around sensors, test tracks, and virtual simulation.

Figure: Anechoic Chamber

For sensors, important equipment includes:

  - Anechoic Chambers: These chambers are characterized by their anechoic (echo-free) interior, meaning they are designed to completely absorb sound or electromagnetic waves to eliminate reflections from the walls, ceiling, and sometimes the floor.
  - Fully Anechoic Chambers (FAC): These chambers have all interior surfaces (walls, ceiling, and floor) covered with RF absorbing materials, creating an environment free from reflections. They are ideal for high-precision measurements like antenna testing or situations where a free-space environment is needed.
  - Semi-Anechoic Chambers (SAC): In this type, the walls and ceiling are covered with absorbing materials, while the floor remains reflective (often a metal ground plane). This reflective floor helps simulate real-world environments, such as devices operating on the ground. Semi-anechoic chambers are commonly used for general EMC (Electromagnetic Compatibility) testing.
  - RF Shielded Rooms (Faraday Cages): These are enclosed rooms designed to block the entry or exit of electromagnetic radiation. They are constructed with a conductive shield (typically copper or other metals) around the walls, ceiling, and floor, minimizing the entry or exit of electromagnetic interference (EMI). They are a fundamental component of many EMI testing facilities.
  - Reverberation Chambers: These chambers intentionally use resonances and reflections within the chamber to create a statistically uniform electromagnetic field. They can accommodate larger and more complex test setups and are particularly useful for immunity testing where the device is exposed to interference from all directions. However, their performance can be limited at lower frequencies.

Figure: Zalazone Autonomous Test Track

In terms of test tracks, traditional test tracks which were used for mechanical testing have been extended to test autonomy functions. A recent example, shown in the figure above, is ZalaZONE, a large test track located in Hungary. ZalaZONE integrates both conventional vehicle testing infrastructure and next-generation smart mobility features. One of its standout components is the Smart City Zone, which simulates real-world urban environments with intersections, roundabouts, pedestrian crossings, and public transport scenarios. This allows for comprehensive testing of urban-level autonomy, V2X communication, and AI-driven mobility solutions in a controlled yet realistic environment. The facility includes a dedicated highway and rural road section to support the evaluation of higher-speed autonomous functions such as adaptive cruise control, lane-keeping, and safe overtaking. A high-speed oval enables long-duration endurance testing and consistent-speed trials for autonomous or connected vehicles. The dynamic platform provides a flat, open space for vehicle dynamics testing, such as automated emergency braking, evasive maneuvers, and trajectory planning, while both wet and dry handling courses allow for testing on varied friction surfaces under critical scenarios. ZalaZONE is also equipped with advanced V2X and 5G infrastructure, including roadside units (RSUs) and edge computing systems, to enable real-time communication and data exchange between vehicles and infrastructure—critical for cooperative driving and sensor validation. Additionally, an off-road section supports testing for SUVs, military vehicles, and trucks in rough terrain conditions. The facility is complemented by EMC testing capabilities and plans for climate-controlled testing chambers, enhancing its support for environmental and regulatory testing. ZalaZONE also provides integration with simulation and digital twin environments.
Through platforms such as IPG CarMaker and AVL tools, developers can carry out software-in-the-loop (SIL) and hardware-in-the-loop (HIL) testing in parallel with on-track validation.

Figure: Carla Simulator

Finally, a great deal of simulation is done virtually. Simulation plays a critical role in the development and validation of autonomous vehicles (AVs), allowing developers to test perception, planning, and control systems in a wide range of scenarios without physical risk. Among the most prominent tools is CARLA, an open-source simulator built for academic and research use, known for its realistic urban environments, support for various sensors (LiDAR, radar, cameras), and integration with ROS. It’s widely adopted for prototyping and reinforcement learning in AVs. In the commercial space, “rFpro” is a leading choice for OEMs and Tier-1 suppliers, offering photorealistic environments and precise sensor modeling with sub-millimeter accuracy—essential for validating sensor fusion algorithms. Similarly, “IPG CarMaker” and “dSPACE ASM” provide powerful closed-loop environments ideal for testing vehicle dynamics and ADAS features, especially in hardware-in-the-loop (HIL) and software-in-the-loop (SIL) setups. These tools are tightly integrated with MATLAB/Simulink and real-time hardware for embedded control testing. For large-scale and safety-critical simulations, platforms like “VIRES VTD” and “Applied Intuition” are favored due to their compliance with industry standards like ASAM OpenX and ISO 26262, and their ability to model thousands of edge-case scenarios. “NVIDIA DRIVE Sim”, built on the Omniverse platform, is used to generate synthetic data for training and validating neural networks and digital twins, offering GPU-accelerated realism that aids perception system testing. Finally, simulators like “Cognata” and “MathWorks' Automated Driving Toolbox” serve niche but critical roles—Cognata provides city-scale environments for scenario testing and safety validation, while MathWorks' tools are widely used for algorithm development and control prototyping, especially in academia and early-stage design. 
Each simulator has a specific focus—some prioritize sensor realism, others full-system integration or large-scale scenario generation—so selection depends on whether the goal is research, real-time control testing, or safety validation for deployment.

Challenges Ahead

In terms of challenges, autonomy is very much in the early innings. Broadly speaking, the challenges can be split into three broad categories: first, the core technology elements within the autonomy pipeline (sensors, location services, perception, and path planning); second, the algorithms and methodology for demonstrating safety; and third, business economics.

Autonomous vehicles rely on a suite of sensors—such as LiDAR, radar, cameras, GPS, and ultrasonic devices—to perceive and interpret their surroundings. However, each of these sensor types faces inherent limitations, particularly in challenging environmental conditions. Cameras struggle with low light, glare, and weather interference like rain or fog, while LiDAR can suffer from backscatter in fog or snow. Radar, though more resilient in poor weather, provides lower spatial resolution, making it less effective for detailed object classification. These environmental vulnerabilities reduce the reliability of perception systems, especially in safety-critical scenarios. Another major challenge lies in the integration of multiple sensor types through sensor fusion. Achieving accurate, real-time fusion demands precise temporal synchronisation and spatial calibration, which can drift over time due to mechanical or thermal stresses. Furthermore, sensors are increasingly exposed to cybersecurity threats. GPS and LiDAR spoofing, or adversarial attacks on camera-based recognition systems, can introduce false data or mislead decision-making algorithms, necessitating robust countermeasures at both the hardware and software levels. Sensor systems also face difficulties with occlusion and semantic interpretation. Many sensors require line-of-sight to function properly, so their performance degrades in urban settings with visual obstructions like parked vehicles or construction. Even when objects are detected, understanding their intent—such as whether a pedestrian is about to cross the street—remains a challenge for machine learning models. Meanwhile, high-resolution sensors generate vast data streams, straining onboard processing and communication bandwidth, and creating trade-offs between resolution, latency, and energy efficiency. Lastly, practical concerns such as cost, size, and durability hinder mass adoption. 
LiDAR units, while highly effective, are often expensive and mechanically complex. Cameras and radar must also be ruggedised to withstand weather and vibration without degrading in performance. Compounding these issues is the lack of standardised validation methods to assess sensor reliability under varied real-world conditions, making it difficult for developers and regulators to establish trust and ensure safety across diverse operational domains.

The “perception system” is at the core of autonomous vehicle functionality, enabling the car to understand and interpret its surroundings in real time. It processes data from multiple sensors—cameras, LiDAR, radar, and ultrasonic devices—to detect, classify, and track objects. The perception system struggles with “semantic understanding and edge cases.” While object detection and classification have improved with deep learning, these models often fail in rare or unusual scenarios—like an overturned vehicle, a pedestrian in costume, or construction detours. Understanding the context and intent behind actions (e.g., whether a pedestrian is about to cross) is even harder. This lack of true situational awareness can lead to poor decision-making and is a key challenge for Level 4 and 5 autonomy. Also, the “computational burden” of real-time perception—especially with high-resolution inputs—creates constraints in terms of processing power, thermal management, and latency. Balancing model accuracy with speed and ensuring system performance across embedded platforms is a persistent engineering challenge.

Location services—often referred to as localisation—are essential to autonomous vehicles (AVs), enabling them to determine their precise position within a map or real-world environment. While traditional GPS offers basic positioning, autonomous vehicles require “centimetre-level accuracy,” robustness, and real-time responsiveness, all of which present significant challenges. One major challenge is the “limited accuracy and reliability of GNSS (Global Navigation Satellite Systems)” such as GPS, especially in urban canyons, tunnels, or areas with dense foliage. Buildings can block or reflect satellite signals, leading to multi-path errors or complete signal loss. While techniques like Real-Time Kinematic (RTK) correction and augmentation via ground stations improve accuracy, these solutions can be expensive, infrastructure-dependent, and still prone to failure in GNSS-denied environments. To compensate, AVs often combine GPS with “sensor-based localisation,” including LiDAR, cameras, and IMUs (inertial measurement units), which enable map-based and dead-reckoning approaches. Sensor-based dead reckoning using IMUs and odometry can help bridge short GNSS outages, but “drift accumulates over time,” and errors can compound, especially during sharp turns, vibrations, or tyre slippage. Finally, “map-based localisation” depends on the availability of high-definition (HD) maps that include detailed features like lane markings, curbs, and traffic signs. These maps are costly to build and maintain, and they can become outdated quickly due to road changes, construction, or temporary obstructions—leading to mislocalization.
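The drift problem in dead reckoning can be illustrated with a toy computation. This is a simplified sketch with made-up numbers: a small, constant heading bias (standing in for an uncalibrated gyro) is integrated step by step, and the position error compounds with distance travelled.

```python
import math

# Toy illustration of dead-reckoning drift: a constant per-step heading bias
# compounds into a growing position error as motion is integrated.

def dead_reckon(steps, step_m=1.0, heading_bias_rad=0.01):
    """Integrate unit steps with a per-step heading bias; return the distance
    between the estimated endpoint and the true endpoint (steps * step_m, 0)."""
    x = y = heading = 0.0
    for _ in range(steps):
        heading += heading_bias_rad          # the bias accumulates every step
        x += step_m * math.cos(heading)
        y += step_m * math.sin(heading)
    return math.hypot(x - steps * step_m, y)

# The error after 100 steps is far more than 10x the error after 10 steps:
# drift grows super-linearly, which is why GNSS or map-based corrections
# are needed to keep localization bounded.
print(round(dead_reckon(10), 2), round(dead_reckon(100), 2))
```

With the bias set to zero the error vanishes, which is exactly the idealization that real IMUs never achieve; periodic absolute fixes are what keep the accumulated error in check.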

Path planning in autonomous vehicles is a complex and safety-critical task that involves determining the vehicle's trajectory from its current position to a desired destination while avoiding obstacles, complying with traffic rules, and ensuring passenger comfort. One of the most significant challenges in this area is dealing with dynamic and unpredictable environments. The behaviour of other road users—such as pedestrians, cyclists, and human drivers—can be erratic, requiring the planner to continuously adapt in real time. Predicting these agents' intentions is inherently uncertain and often leads to either overly cautious or unsafe behaviour if misjudged. Real-time responsiveness is another major constraint. Path planning must be executed with low latency while factoring in a wide range of considerations, including traffic laws, road geometry, sensor data, and vehicle dynamics. This requires balancing optimality, safety, and computational efficiency within strict time limits. Additionally, the planner must account for the vehicle’s physical constraints, such as turning radius, acceleration, and braking limits, especially in complex manoeuvres like unprotected turns or obstacle avoidance. Another persistent challenge is operating with incomplete or noisy information. Sensor occlusion, poor weather, or localisation drift can obscure critical details such as road markings, traffic signs, or nearby objects. Planners must therefore make decisions under uncertainty, which adds complexity and risk. Moreover, the vehicle must navigate complex and often-changing road topologies—like roundabouts, construction zones, or temporary detours—where map data may be outdated or ambiguous. Finally, the need for continuous replanning introduces issues of robustness and comfort. The path planning system must frequently adjust trajectories to respond to new inputs, but abrupt changes can degrade ride quality or destabilise the vehicle. 
All of this must be done while maintaining rigorous safety guarantees, ensuring that every planned path can be verified as collision-free and legally compliant. Developing a system that meets these demands across diverse environments and edge cases remains one of the toughest challenges in achieving fully autonomous driving.
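One concrete ingredient of "decisions under uncertainty" is checking a candidate trajectory against predicted positions of other agents while inflating the required clearance with look-ahead time, since predictions degrade the further ahead the planner looks. The sketch below is illustrative: the function name, clearance values, and growth rate are all assumptions, not a production algorithm.

```python
# Hedged sketch: a planner re-checks each candidate trajectory against a
# predicted obstacle track, inflating the required clearance over time to
# absorb prediction uncertainty. All numbers are illustrative.

def collision_free(trajectory, obstacle_prediction, base_clearance=2.0,
                   uncertainty_growth=0.5):
    """trajectory and obstacle_prediction are lists of (x, y) per time step."""
    for t, ((ex, ey), (ox, oy)) in enumerate(zip(trajectory, obstacle_prediction)):
        # Required clearance grows with look-ahead: the other agent's future
        # position is less certain the further out we predict.
        required = base_clearance + uncertainty_growth * t
        if ((ex - ox) ** 2 + (ey - oy) ** 2) ** 0.5 < required:
            return False
    return True

ego = [(0, 0), (1, 0), (2, 0), (3, 0)]   # ego plan: drive straight ahead
ped = [(10, 5), (8, 4), (6, 3), (4, 2)]  # predicted pedestrian cutting in
print(collision_free(ego, ped))  # False: rejected at the last step
```

Note the tension the paragraph describes: with `uncertainty_growth=0.0` the same plan passes, so the inflation term is precisely what trades assertiveness for safety margin.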

Algorithms and Methodology for Safety:

A major bottleneck remains the inability to fully validate AI behaviour, with a need for more rigorous methods to assess completeness, generate targeted test cases, and bound system behaviour. Advancements in explainable AI, digital twins, and formal methods are seen as promising paths forward. Additionally, current systems lack scalable abstraction hierarchies—hindering the ability to generalise component-level validation to system-level assurance. To build trust with users and regulators, the industry must also adopt a “progressive safety framework,” clearly showing continuous improvement, regression checks during over-the-air (OTA) updates, and lessons learned from real-world failures.
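One practical pattern for "bounding system behaviour" is to wrap the hard-to-verify AI planner in a small, fully verifiable runtime monitor that confines its commands to a pre-certified envelope. The sketch below is an assumption-laden illustration: the envelope limits, field names, and intervention logging are invented for the example.

```python
# Illustrative runtime monitor: an unverifiable AI planner's output is clamped
# into a certified actuation envelope; interventions can be logged as evidence
# for a progressive safety framework. Limits below are made up.

ENVELOPE = {"steer_rad": 0.5, "accel_mps2": 3.0, "decel_mps2": 8.0}

def monitor(command):
    """Clamp an AI-issued command into the envelope and report whether the
    monitor had to intervene."""
    steer = max(-ENVELOPE["steer_rad"],
                min(ENVELOPE["steer_rad"], command["steer_rad"]))
    accel = max(-ENVELOPE["decel_mps2"],
                min(ENVELOPE["accel_mps2"], command["accel_mps2"]))
    clamped = {"steer_rad": steer, "accel_mps2": accel}
    return clamped, clamped != command

safe, intervened = monitor({"steer_rad": 0.1, "accel_mps2": 1.0})
wild, intervened2 = monitor({"steer_rad": 2.0, "accel_mps2": -20.0})
print(intervened, intervened2)  # False True
```

The appeal of this architecture for V&V is that only the monitor, a few lines of bounded logic, needs exhaustive verification, while the AI component inside it can evolve via OTA updates.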

In terms of “V&V test apparatuses,” both virtual and physical tools are emphasised. Virtual environments will play a key role in supporting evolving V&V methodologies, necessitating ongoing work from standards bodies like ASAM. Physical test tracks must evolve to not only replicate real-world scenarios efficiently but also validate the accuracy of their virtual counterparts—envisioned through a “movie set” model that can quickly stage complex scenarios. Another emerging concern is “electromagnetic interference (EMI),” especially due to the widespread use of active sensors. Traditional static EMI testing methods are insufficient, and there is a need for dynamic, programmable EMI testing environments tailored to cyber-physical systems.

Finally, a rising concern is around cybersecurity in autonomous systems. These systems introduce systemic vulnerabilities that span from hardware to software, necessitating government-level oversight. Key sensor modalities like LiDAR, GPS, and radar are susceptible to spoofing, and detecting such threats is an urgent research priority. The V&V process itself must evolve to minimise exposure to adversarial attacks, effectively treating security as an intrinsic constraint within system validation, not an afterthought.

Business Models and Supply Chain:

Robo-taxis, or autonomous ride-hailing vehicles, represent a promising use case for autonomous vehicle (AV) technology, with the potential to transform urban mobility by offering on-demand, driverless transportation. Key use models include urban ride-hailing in city centres, first- and last-mile transit to connect riders with public transportation, airport and hotel shuttle services in geofenced areas, and mobility on closed campuses like universities or corporate parks. These models aim to increase vehicle utilization, reduce transportation costs, and offer greater convenience, particularly in environments where human-driver costs are a major factor. However, the business challenges are substantial. The development and deployment of robo-taxi fleets require enormous capital investment in hardware, software, testing, and infrastructure. Operational costs remain high, particularly in the early stages when human safety drivers, detailed maps, and limited deployment zones are still necessary. Regulatory uncertainty also hampers scalability, with different jurisdictions applying inconsistent safety, insurance, and operational standards. This makes expansion slow and costly.

In addition, consumer trust in autonomous systems remains fragile. High-profile incidents have raised safety concerns, and many riders may be hesitant to use driverless vehicles, especially in unfamiliar or emergency situations. Infrastructure constraints—such as poor road markings or limited connectivity—further limit the environments in which robo-taxis can operate reliably. Meanwhile, the path to profitability is challenged by competitive fare pricing, fleet maintenance logistics, and integration with broader transportation networks. Overall, while robo-taxis offer significant long-term promise, their success hinges on overcoming a complex mix of technological, regulatory, and business barriers.

The evolving economics of the semiconductor industry pose a significant challenge for low-volume markets, where custom chip development is often not cost-effective. As a result, autonomous and safety-critical systems must increasingly rely on Commercial Off-The-Shelf (COTS) components, making it essential to develop methodologies that can ensure security, reliability, and performance using these standardised parts. This shift places greater emphasis on designing systems that are resilient and adaptable, even without custom silicon. Additionally, traditional concerns like field maintainability, lifetime cost, and design-for-supply-chain practices—common in mechanical and industrial engineering—must now be applied to electronics and embedded systems. As electronic components dominate modern products, a more holistic design approach is needed to manage downstream supply chain implications. The trend toward software-defined vehicles reflects this need, promoting deeper integration between hardware and software suppliers. To further enhance supply chain resilience, there's a push to standardise around a smaller set of high-volume chips and embrace flexible, programmable hardware fabrics that integrate digital, analogue, and software elements. This architecture shift is key to mitigating supply disruptions and maintaining long-term system viability. Finally, “maintainability” also implies the availability of in-field repair facilities, which must be upgraded to handle autonomy.

Research Outlook

Autonomy is part of the next big megatrend in electronics, which is likely to change society. As a new technology, it presents a large number of open research problems. These problems can be classified into four broad categories: autonomy hardware, autonomy software, the autonomy ecosystem, and autonomy business models. In terms of hardware, autonomy consists of a mobility component (increasingly becoming electric), sensors, and computation.

Research in sensors for autonomy is rapidly evolving, with a strong focus on “sensor fusion, robustness, and intelligent perception.” One exciting area is “multi-modal sensor fusion,” where data from LiDAR, radar, cameras, and inertial sensors are combined using AI to improve perception in complex or degraded environments. Researchers are developing uncertainty-aware fusion models that not only integrate data but also quantify confidence levels, essential for safety-critical systems. There's also growing interest in “event-based cameras” and “adaptive LiDAR,” which offer low-latency or selective scanning capabilities for dynamic scenes, while self-supervised learning enables autonomous systems to extract semantic understanding from raw, unlabeled sensor data. Another critical thrust is the development of resilient and context-aware sensors. This includes sensors that function in all-weather conditions, such as “FMCW radar” and “polarization-based vision,” and systems that can detect and correct for sensor faults or spoofing in real-time. Researchers are also exploring “terrain-aware sensing,” “semantic mapping,” and “infrastructure-to-vehicle (I2V)” sensor networks to extend situational awareness beyond line-of-sight. Finally, sensor co-design—where hardware, placement, and algorithms are optimized together—is gaining traction, especially in “edge computing architectures” where real-time processing and low power are crucial. These advances support autonomy not just in cars, but also in drones, underwater vehicles, and robotic systems operating in unstructured or GPS-denied environments.
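The uncertainty-aware fusion idea can be shown in its simplest form: inverse-variance weighting of two independent Gaussian estimates, which both combines the measurements and quantifies the fused confidence. The sensor roles and numbers below are illustrative.

```python
# Sketch of uncertainty-aware fusion: two sensors estimate the same range;
# inverse-variance weighting trusts the more certain sensor more, and the
# fused variance quantifies the resulting confidence.

def fuse(est_a, var_a, est_b, var_b):
    """Minimum-variance fusion of two independent Gaussian estimates."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)  # fused estimate is more certain than either input
    return fused, fused_var

# Illustrative: radar is confident in fog (low variance); the camera estimate
# is degraded (high variance), so the fused range leans toward the radar.
fused, var = fuse(est_a=50.0, var_a=1.0, est_b=58.0, var_b=16.0)
print(round(fused, 2), round(var, 2))
```

This is the scalar core of what Kalman-style fusion does per state dimension; the research challenges cited above (calibration drift, synchronization, spoofed inputs) are exactly the ways the variance assumptions break in practice.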

In terms of computation, exciting research focuses on enabling real-time decision-making in environments where cloud connectivity is limited, latency is critical, and power is constrained. One prominent area is the “co-design of perception and control algorithms with edge hardware,” such as integrating neural network compression, quantization, and pruning techniques to run advanced AI models on embedded systems (e.g., NVIDIA Jetson, Qualcomm RB5, or custom ASICs). Research also targets “dynamic workload scheduling,” where sensor processing, localization, and planning are intelligently distributed across CPUs, GPUs, and dedicated accelerators based on latency and energy constraints. Another major focus is on “adaptive, context-aware computing,” where the system dynamically changes its computational load or sensing fidelity based on situational awareness—for instance, increasing compute resources during complex maneuvers or reducing them during idle cruising. Related to this is “event-driven computing” and “neuromorphic architectures” that mimic biological efficiency to reduce energy use in perception tasks. Researchers are also exploring “secure edge execution,” such as trusted computing environments and runtime monitoring to ensure deterministic behavior under adversarial conditions. Finally, “collaborative edge networks,” where multiple autonomous agents (vehicles, drones, or infrastructure nodes) share compute and data at the edge in real time, open new frontiers in swarm autonomy and decentralized intelligence.
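Of the compression techniques mentioned, quantization is the easiest to show concretely: float weights are mapped to 8-bit integers plus a scale factor, shrinking storage roughly 4x at the cost of a small rounding error. This is a minimal symmetric-quantization sketch, not any particular toolchain's implementation.

```python
# Post-training weight quantization sketch: w is approximated by q * scale,
# with q an integer in [-127, 127]. Shrinks the model at the cost of a small,
# bounded rounding error per weight.

def quantize_int8(weights):
    """Symmetric int8 quantization of a list of float weights."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))
```

The V&V-relevant point is that the error is bounded by half the scale step per weight, so the accuracy impact of running the compressed model on an embedded target can be analyzed, not just measured.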

Finally, as there is a shift towards “software-defined vehicles,” there is an increasing need to develop computing hardware architectures bottom-up with the critical properties of software reuse and underlying hardware innovation. This process mimics computer architectures in information technology, but does not exist in the world of autonomy today.

In terms of software, important system functions such as perception, path planning, and location services sit in the software/AI layer. While somewhat effective, AV stacks are considerably less effective than a human, who can navigate the world spending only about 100 watts of power. There are a number of places where human and machine autonomy differ. These include:

  1. Focus: Humans have the notions of focus and peripheral vision, whereas AVs monitor all directions all the time. This has implications for power, data, and computation.
  2. Movement-based Perception: Humans use movement as a key signature for identification. In contrast, current perception engines effectively try to work on static photos.
  3. Prediction-based Recognition: Humans use an expectation of the future movement of objects to limit computation. This technique has advantages in computation, but is not currently used in AVs.

Thus, in addition to traditional machine learning techniques, newer AI architectures with properties of robustness, power/compute efficiency, and effectiveness are open research problems.

In terms of Ecosystem, key open research problems exist in areas such as safety validation, V2X communication, and ecosystem partners.

Verification and validation (V&V) for autonomous systems is evolving rapidly, with key research focused on making AI-driven behavior both “provably safe and explainable.” One major direction involves “bounding AI behavior” using formal methods and developing “explainable AI” (XAI) that supports safety arguments regulators and engineers can trust. Research is also focused on “rare and edge-case scenario generation” through adversarial learning, simulation, and digital twins, aiming to create test cases that challenge the limits of perception and planning systems. Defining new “coverage metrics”—such as semantic or risk-based coverage—has become crucial, as traditional code coverage doesn’t capture the complexity of non-deterministic AI components. Another active area is “scalable system-level V&V,” where component-level validation must support higher-level safety guarantees. This includes “compositional reasoning,” contracts-based design, and model-based safety case automation. The integration of digital twins for closed-loop simulation and real-time monitoring is enabling continuous validation even post-deployment. In parallel, “cybersecurity-aware V&V” is emerging, focusing on spoofing resilience and securing the validation pipeline itself. Finally, standardization of simulation formats (e.g., OpenSCENARIO, ASAM) and the rise of “test infrastructure-as-code” are laying the groundwork for scalable, certifiable autonomy, especially under evolving regulatory frameworks like UL 4600 and ISO 21448.
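A semantic coverage metric of the kind described can be sketched very simply: discretize the ODD into parameter cells and measure the fraction of cells the executed test suite has reached. The dimensions, bin labels, and suite below are invented for illustration; real ODD models have far more dimensions and graded bins.

```python
from itertools import product

# Hypothetical semantic coverage sketch: the ODD is discretized into cells
# (weather x actor x lighting), and coverage is the fraction of cells hit
# by at least one executed scenario. Dimensions/bins are illustrative.

DIMENSIONS = {
    "weather": ["clear", "rain", "fog"],
    "actor": ["none", "vehicle", "pedestrian"],
    "lighting": ["day", "night"],
}

def coverage(executed_scenarios):
    """Fraction of ODD cells exercised by the suite (duplicates count once)."""
    all_cells = set(product(*DIMENSIONS.values()))
    hit = {(s["weather"], s["actor"], s["lighting"]) for s in executed_scenarios}
    return len(hit & all_cells) / len(all_cells)

suite = [
    {"weather": "clear", "actor": "vehicle", "lighting": "day"},
    {"weather": "rain", "actor": "pedestrian", "lighting": "night"},
    {"weather": "clear", "actor": "vehicle", "lighting": "day"},  # duplicate
]
print(round(coverage(suite), 3))  # 2 of 18 cells covered
```

Unlike code coverage, the uncovered cells directly name the missing scenarios (e.g., fog at night with a pedestrian), which is what makes risk-weighted variants of this metric useful for targeted test generation.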

One ecosystem aid to autonomy may be connection to the infrastructure, and, of course, in mixed human/machine environments there is the natural Human-Machine Interface (HMI). Key research in V2X (Vehicle-to-Everything) for autonomy centers on enabling cooperative behavior and enhanced situational awareness through low-latency, secure communication. A major area of focus is “reliable, high-speed communication” via technologies like “C-V2X and 5G/6G,” which are critical for supporting time-sensitive autonomous functions such as coordinated lane changes, intersection management, and emergency response. Closely linked is the development of “edge computing architectures,” where V2X messages are processed locally to reduce latency and support real-time decision-making. Research is active in “cooperative perception,” where vehicles and infrastructure share sensor data to extend the field of view beyond occlusions, enabling safer navigation in complex urban environments. Another core research direction is the integration of “smart infrastructure and digital twins,” where roadside sensors provide real-time updates to HD maps and augment vehicle perception. This is essential for detecting dynamic road conditions, construction zones, and temporary signage. In parallel, ensuring “security and privacy in V2X communication” is a growing concern. Work is underway on encrypted, authenticated protocols and on methods to detect and respond to malicious actors or faulty data. Standardization and interoperability are also vital for large-scale deployment; efforts are focused on harmonizing communication protocols across vendors and regions and on developing robust, scenario-based testing frameworks that incorporate both simulation and physical validation. Finally, an open research issue is the tradeoff between individual autonomy and dependence on an infrastructure. Associated with infrastructure dependence are open issues of legal liability, business models, and cost.
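The cooperative-perception idea above can be sketched in a few lines: merge object detections from the ego vehicle and a roadside unit (RSU), treating detections that fall within a small radius of each other as the same physical object. The match radius, data layout, and names are illustrative assumptions; real systems fuse full tracks with uncertainty, not bare points.

```python
import math

# Toy cooperative-perception merge: ego detections plus any RSU detections
# not already seen by the ego vehicle. MATCH_RADIUS_M is an assumed value.
MATCH_RADIUS_M = 1.5  # detections closer than this are treated as one object

def merge_detections(ego, rsu, radius=MATCH_RADIUS_M):
    """Return ego detections, extended by RSU detections of new objects."""
    merged = list(ego)
    for rx, ry in rsu:
        if all(math.hypot(rx - ex, ry - ey) > radius for ex, ey in merged):
            merged.append((rx, ry))  # RSU extends the view past occlusions
    return merged

ego_view = [(10.0, 2.0), (25.0, -1.0)]
rsu_view = [(10.3, 2.2), (40.0, 3.0)]  # first overlaps ego; second is occluded
print(merge_detections(ego_view, rsu_view))  # three unique objects
```

The hard research questions sit around this kernel: time synchronization between sources, trusting or rejecting a faulty RSU, and doing the merge within the latency budget of the planner.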

Human-Machine Interface (HMI) for autonomy remains an area with several open research and design challenges, particularly around trust, control, and situational awareness. One major issue is how to build “appropriate trust and transparency” between users and autonomous systems. Current interfaces often fail to clearly convey the vehicle’s capabilities, limitations, or decision-making rationale, which can lead to overreliance or confusion. There is a delicate balance between providing sufficient information to promote understanding and avoiding cognitive overload. Additionally, ensuring “safe and intuitive transitions of control,” especially in Level 3 and Level 4 autonomy, remains a critical concern. Drivers may take several seconds to re-engage during a takeover request, and the timing, modality, and clarity of such prompts are not yet standardized or optimized across systems. Another set of challenges lies in maintaining “situational awareness” and designing “adaptive, accessible interfaces.” Passive users in autonomous systems tend to disengage, losing track of the environment, which can be dangerous during unexpected events. Effective HMI must offer context-sensitive feedback using visual, auditory, or haptic cues while adapting to the user’s state, experience level, and accessibility needs. Moreover, autonomous vehicles currently lack effective ways to interact with external actors such as pedestrians or other drivers, who can no longer rely on human cues like eye contact or gestures. Developing standardized, interpretable external HMIs, a language of driving, remains an active area of research. Finally, a lack of unified metrics and regulatory standards for evaluating HMI effectiveness further complicates design validation, making it difficult to compare systems or ensure safety across manufacturers.
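The takeover-request timing problem can be reduced to a small decision sketch: after issuing the request, either the driver re-engages within a time budget and receives control, or the vehicle executes a minimal-risk maneuver. The budget value and the action names below are assumptions for illustration; as the text notes, these timings are precisely what is not yet standardized.

```python
# Illustrative takeover-request (TOR) timing logic for a Level 3 handover.
# TOR_BUDGET_S and the action strings are assumed, not standardized values.
TOR_BUDGET_S = 7.0  # assumed budget for the driver to re-engage

def resolve_takeover(issued_at_s, driver_ready_at_s, budget_s=TOR_BUDGET_S):
    """Decide the vehicle's action after a takeover request is issued.

    driver_ready_at_s is None when the driver never responded.
    """
    responded_in_time = (driver_ready_at_s is not None
                         and driver_ready_at_s - issued_at_s <= budget_s)
    if responded_in_time:
        return "hand control to driver"
    return "minimal-risk maneuver"  # e.g. a controlled stop in lane

print(resolve_takeover(0.0, 4.5))   # response within budget
print(resolve_takeover(0.0, None))  # no response -> fallback
print(resolve_takeover(0.0, 9.0))   # too slow -> fallback
```

The open HMI questions concern everything around this branch: how the prompt is delivered (visual, auditory, haptic), how driver readiness is actually sensed, and what budget is safe for a given speed and traffic context.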

Finally, autonomy will have implications for topics such as civil infrastructure guidance, field maintenance, interaction with emergency services, interaction with disabled and young riders, insurance markets, and, most importantly, the legal profession. Many research issues underlie all of these topics.

In terms of business models, use models and their implications for the supply chain are open research problems. For the supply chain, the critical technology is semiconductors, which is highly sensitive to very high volume. The largest market in mobility, the auto industry, accounts for approximately 10% of semiconductor volume, and the other forms (airborne, marine, space) are orders of magnitude lower. From a supply chain perspective, a small number of SKUs serving a large market is ideal. The research problem is: what should be the nature of these highly scalable components? In terms of end markets, autonomy in traditional transportation is likely to lead to a reduction in unit volume. Why? With autonomy, one can get much higher utilization (versus the < 5% in today's automobiles). However, it is also likely that autonomy unleashes a broad class of solutions in markets such as agriculture, warehouses, distribution, delivery, and more. Micromobility applications in particular offer some interesting options for very high volumes. The exact nature of these applications is an open research problem.
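The utilization argument behind the expected drop in unit volume is simple arithmetic; the figures below are assumed for illustration (a privately owned car driven about one hour a day versus a shared autonomous vehicle assumed to operate twelve hours a day).

```python
# Back-of-envelope utilization comparison from the text (assumed figures):
# a privately owned car vs. a shared autonomous vehicle.
HOURS_PER_DAY = 24

owned_hours_in_use = 1.0    # ~1 h/day of driving -> roughly 4% utilization
shared_hours_in_use = 12.0  # assumed daily service window for a robotaxi

owned_util = owned_hours_in_use / HOURS_PER_DAY
shared_util = shared_hours_in_use / HOURS_PER_DAY

print(f"owned: {owned_util:.0%}, shared: {shared_util:.0%}")
# A ~12x utilization gain means far fewer vehicles can serve the same
# trips, which is why unit volume in traditional transport may fall.
```

The same arithmetic run in reverse explains the SKU question: components that amortize across many such high-utilization platforms are the ones worth designing.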

Summary

References


[2] INSIGHT. The Journey Towards Autonomy in Civil Aerospace. Technical report. Cranfield, United Kingdom: Aerospace Technology Institute (ATI); 2020
[3] Chen H, Wang XM, Li Y. A Survey of Autonomous Control for UAV. Washington, D.C., United States: IEEE Computer Society; 2009
[5] Chen TB. Management of Multiple Heterogenous Unmanned Aerial Vehicles Through Capacity Transparency [thesis]. Queensland, Australia: Queensland University of Technology; 2016
[6] EASA. Easy Access Rules for Unmanned Aircraft Systems. Technical report. Cologne, Germany: European Union Aviation Safety Agency; 2022
[7] D. Cvetković, Ed., ‘Drones - Various Applications’. IntechOpen, Dec. 08, 2023. doi: 10.5772/intechopen.1000551
[8] Sommerville, I. (2016). Software Engineering (10th ed.). Pearson
[9] Pressman, R. S., & Maxim, B. R. (2020). Software Engineering: A Practitioner’s Approach (9th ed.). McGraw-Hill
[10] Royce, W. W. (1970). Managing the development of large software systems. Proceedings of IEEE WESCON
[11] Agile Alliance. (2001). Manifesto for Agile Software Development. https://agilemanifesto.org
[12] Boehm, B. W. (1988). A spiral model of software development and enhancement. Computer, 21(5), 61–72.
[13] Pressman, R. S., & Maxim, B. R. (2020). Software Engineering: A Practitioner’s Approach (9th ed.). McGraw-Hill
[14] IEEE. (2012). ISO/IEC/IEEE 828: Configuration Management in Systems and Software Engineering. IEEE Standards Association.
[15] Sommerville, I. (2016). Software Engineering (10th ed.). Pearson
[16] Pressman, R. S., & Maxim, B. R. (2020). Software Engineering: A Practitioner’s Approach (9th ed.). McGraw-Hill.
[17] IEEE. (2012). ISO/IEC/IEEE 828: Configuration Management in Systems and Software Engineering. IEEE Standards Association
[18] NASA. (2021). Configuration Management Procedural Requirements (NPR 7120.5E). National Aeronautics and Space Administration
[19] Wang, L., Xu, X., & Nee, A. Y. C. (2022). Digital twin-enabled integration in manufacturing. CIRP Annals, 71(1), 105–128.
[20] IEEE. (2012). ISO/IEC/IEEE 828: Configuration Management in Systems and Software Engineering. IEEE Standards Association
[21] NASA. (2021). Configuration Management Procedural Requirements (NPR 7120.5E). National Aeronautics and Space Administration
[22] Wang, L., Xu, X., & Nee, A. Y. C. (2022). Digital twin-enabled integration in manufacturing. CIRP Annals, 71(1), 105–128.
[23] Raj, A., & Saxena, P. (2022). Software architectures for autonomous vehicle development: Trends and challenges. IEEE Access, 10, 54321–54345
[24] Raj, A., & Saxena, P. (2022). Software architectures for autonomous vehicle development: Trends and challenges. IEEE Access, 10, 54321–54345
[25] AUTOSAR Consortium. (2023). AUTOSAR Adaptive Platform Specification. AUTOSAR
[26] Lee, E. A., & Seshia, S. A. (2020). Introduction to Embedded Systems: A Cyber-Physical Systems Approach (3rd ed.). MIT Press.
[27] Baruah, S., Baker, T. P., & Burns, A. (2012). Real-time scheduling theory: A historical perspective. Real-Time Systems, 28(2–3), 101–155
[28] Broy, M., et al. (2021). Modeling Automotive Software and Hardware Architectures with AUTOSAR. Springer
[29] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
[30] Thrun, S. (2010). Toward robotic cars. Communications of the ACM, 53(4), 99–106
[31] Raj, A., & Saxena, P. (2022). Software architectures for autonomous vehicle development: Trends and challenges. IEEE Access, 10, 54321–54345
[32] Benjamin, M. R., Curcio, J. A., & Leonard, J. J. (2012). MOOS-IvP autonomy software for marine robots. Journal of Field Robotics, 29(6), 821–835
[33] Wang, L., Xu, X., & Nee, A. Y. C. (2022). Digital twin-enabled integration in manufacturing. CIRP Annals, 71(1), 105–128
[34] Baruah, S., Baker, T. P., & Burns, A. (2012). Real-time scheduling theory: A historical perspective. Real-Time Systems, 28(2–3), 101–155
[35] Wang, L., Xu, X., & Nee, A. Y. C. (2022). Digital twin-enabled integration in manufacturing. CIRP Annals, 71(1), 105–128.
[36] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444
[37] Boyens, J., Paulsen, C., Bartol, N., Shankles, S., & Moorthy, R. (2020). NIST SP 800-161: Supply Chain Risk Management Practices for Federal Information Systems and Organizations. National Institute of Standards and Technology
[38] Wang, L., Xu, X., & Nee, A. Y. C. (2022). Digital twin-enabled integration in manufacturing. CIRP Annals, 71(1), 105–128
[39] Raj, A., & Saxena, P. (2022). Software architectures for autonomous vehicle development: Trends and challenges. IEEE Access, 10, 54321–54345.
[40] Russell, S. J., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). Pearson
[41] Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, and Evamarie Wießner. Microscopic traffic simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE, 2018.
[42] Autoware Foundation. TIER IV AWSIM. https://github.com/tier4/AWSIM, 2022.
[43] Fremont, Daniel J., et al. “Formal scenario-based testing of autonomous vehicles: From simulation to the real world.” 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020.
[44] Pikner, Heiko, et al. “Autonomous Driving Validation and Verification Using Digital Twins.” VEHITS (2024): 204-211.

Appendixes

Industrial Use Case #1

Industrial Use Case #2

Industrial Use Case #3