Are we nearly there yet?

When to stop testing automated driving functions

by Professor Simon Burton

A question I am often asked in my day job is “How much testing do we need to do before releasing this level 4 automated driving function?” — or variations thereof. I inevitably disappoint my interrogators by failing to provide a simple answer. Depending on my mood, I might play the wise-guy and quote an old computer science hero of mine:

(Program) testing can be used to show the presence of bugs, but never to show their absence! — Edsger W. Dijkstra, 1970.

Sometimes I provide the more honest, but equally unhelpful response: “We simply don’t know for sure”.

The problem is that demonstrating the safety of autonomous vehicles through road tests alone would require millions or even billions of driven kilometres. For example, to demonstrate a mean distance between collisions of 3.85 million km (based on German crash statistics) with a confidence of 95%, a total of 11.6 million test kilometres would have to be driven without a single collision.
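The figure quoted above follows from a standard zero-failure argument: if collisions are modelled as a Poisson process with a true mean distance between collisions of d kilometres, the probability of driving n kilometres without observing any collision is exp(-n/d). A minimal sketch of that calculation (the function name and the Poisson assumption are mine, not the article's; small differences from the quoted 11.6 million km reflect rounding in the underlying statistics):

```python
import math

def required_test_km(mean_distance_km: float, confidence: float) -> float:
    """Collision-free distance needed to show, at the given confidence,
    that the mean distance between collisions is at least mean_distance_km.

    Assumes a Poisson collision model: P(zero collisions in n km) = exp(-n/d).
    Requiring this probability to be at most (1 - confidence) and solving
    for n gives n >= -ln(1 - confidence) * d.
    """
    return -math.log(1.0 - confidence) * mean_distance_km

# Mean distance between collisions from German crash statistics, 95% confidence:
km = required_test_km(3.85e6, 0.95)
print(f"{km / 1e6:.1f} million km")  # roughly 11.5 million km
```

Note that this is the best case: a single observed collision, or any change to the function, restarts the count.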

Of course, we would also need to integrate these tests into an iterative build, test, fix, repeat cycle: every time the function or its environment changes, we would need to start all over again.

Challenges in testing highly automated driving

However, it is not just the statistical nature of the problem that is challenging. Automated driving functions pose specific challenges to the design of test approaches:

  • Controllability and coverage: How can we control all relevant aspects of the operational design domain (ODD) and system state in order to test specific attributes of the function and triggering conditions of the environment? And how do we argue that we have achieved coverage of the ODD?
  • Repeatability and observability: A range of uncertainties in both the system and its environment leads to major challenges in demonstrating the robustness of the function within the ODD and in reproducing failures observed in the field for analysis in the lab. Furthermore, it may not be possible to determine which state the system was in when the failure occurred, whether due to system complexity or the opaque nature of the technology.
Assuring Autonomy International Programme

A £12M partnership between @LR_Foundation and @UniOfYork to guide the safe development of autonomous systems worldwide.