Zen and the art of safety assurance for machine learning in autonomous driving
By Dr Simon Burton
Perception: the safety bottleneck for automated driving
Autonomous vehicles operate in an environment that is inherently complex and continuously evolving.
To argue the safety of such systems we need to demonstrate that at all times the system has an adequate level of understanding of its current context, and makes the right decisions to ensure that the vehicle safely reaches its destination.
For autonomous vehicles, perception functions, i.e. understanding the surroundings based on noisy and incomplete sensor data, appear to be something of a bottleneck in the safety argument.
Consider the difficulty of detecting pedestrians under all possible traffic and weather conditions. How does the system know how to react safely if it has not even “seen” the obstacles in its path?
Machine learning to the rescue?
There’s a lot of hype about machine learning for autonomous driving.
Algorithms such as deep neural networks appear to be able to make sense of unstructured data using efficient computations in real time. Given enough labelled images, these algorithms learn to classify objects such as vehicles and pedestrians with accuracy rates that can surpass human abilities.
However, there’s a catch.
These algorithms do not deliver clear-cut answers. For a given video frame they might classify the probability of a pedestrian occupying a certain region of the image as 83%. But in the very next frame, imperceptibly different from the last, they may “misclassify” the same object as only 26% pedestrian and 67% road sign.
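This kind of instability is easy to reproduce in miniature. The sketch below uses a toy linear classifier with invented weights (not any real perception network) to show how two nearly identical inputs near a decision boundary can yield wildly different class probabilities:

```python
import numpy as np

def softmax(z):
    """Convert raw class scores into probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy linear classifier over a 2-D feature vector; the weights are
# purely illustrative and deliberately place typical inputs close
# to the decision boundary between the first two classes.
W = np.array([[ 10.0, 0.0],   # "pedestrian"
              [-10.0, 0.0],   # "road sign"
              [  0.0, 0.0]])  # "background"

x      = np.array([ 0.10, 0.0])  # frame t
x_next = np.array([-0.10, 0.0])  # frame t+1, imperceptibly different

p_t, p_next = softmax(W @ x), softmax(W @ x_next)
print(p_t)      # pedestrian is the most likely class
print(p_next)   # the same object now looks like a road sign
```

A tiny shift in the input (0.2 in one feature) flips the most likely class entirely, which is exactly the frame-to-frame behaviour described above.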
The individual decisions made by the algorithms, based on the millions of different weights in the neural network, are difficult to interpret. This makes it hard to characterise their performance with any confidence.
So how do we go about arguing that our systems are safe when applying machine learning for perception tasks?
You are strong when you know your weaknesses
Standards have not yet been defined for developing safe machine learning functions. Traditional approaches to demonstrating software safety do not transfer well.
For machine learning, we need to throw away a lot of our current approaches and argue safety from first principles.
One strategy I am currently following is to identify the possible causes of performance problems in the algorithms, then define measures to reduce the risk that such insufficiencies lead to hazardous events.
Such causes include:
- an incomplete understanding of the environment
- subtle differences between the training and execution environment
- the lack of available training data for critical situations
- robustness issues due to the inherent complexity of the input space
The emerging research that addresses and quantifies these causes of insufficiency can be seen as the slope of enlightenment. It brings us out of the inevitable trough of disillusionment that follows the realisation of machine learning's limitations for safety-critical functions.
You are beautiful when you appreciate your flaws
One path to system-level safety, despite the inherent weaknesses in machine-learning based perception, is to apply heterogeneous redundancy on the sensing path, e.g. by deploying radar and camera sensors in parallel, using both machine learning and standard algorithms.
For this approach to work, a deep understanding of the limitations of each sensing path and its (machine learning) algorithms is required. An argument can then be formed that weaknesses in one path are compensated either by the other path or by additional plausibility checks, e.g. a pedestrian that appears in the centre of an image should not disappear from view in an image taken just milliseconds later.
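As a toy illustration of such a plausibility check, the sketch below flags detections that vanish between consecutive frames. The bounding-box format and the 30-pixel threshold are assumptions made for the example, not taken from any real system:

```python
def box_centre(box):
    """Centre point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def implausible_disappearances(prev_boxes, curr_boxes, max_shift=30.0):
    """Return detections from the previous frame with no nearby match
    in the current frame. An object seen milliseconds ago cannot simply
    vanish, so each hit warrants cross-checking against the other
    sensing path before trusting the current frame's output."""
    missing = []
    curr_centres = [box_centre(b) for b in curr_boxes]
    for box in prev_boxes:
        px, py = box_centre(box)
        if not any(abs(px - cx) <= max_shift and abs(py - cy) <= max_shift
                   for cx, cy in curr_centres):
            missing.append(box)
    return missing

prev = [(100, 100, 140, 200)]   # pedestrian detected at frame t
ok   = [(105, 103, 145, 203)]   # small shift at frame t+1: plausible
gone = []                       # pedestrian "vanished": implausible
print(implausible_disappearances(prev, ok))    # []
print(implausible_disappearances(prev, gone))  # [(100, 100, 140, 200)]
```

In a real system this role is played by a full multi-object tracker fused across sensors; the point here is only the shape of the argument: a cheap, explainable check that catches a known weakness of the learned component.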
You are wise when you learn from your mistakes
New testing techniques and analytical approaches are being developed to uncover performance problems of machine learning functions. These will include simulation and synthetic data to mimic situations that are too critical or rare to cover with traditional vehicle testing.
However, a perfect assurance argument will never be possible during design time. Therefore, we need continuous monitoring of performance in the field. This will both detect previously undiscovered weaknesses in the trained function, as well as inconsistencies in the assumptions made about the environment itself.
Ideally, this monitoring will increase the safety requirements allocated to the function in line with the quality of the collected evidence. This level of retrospection will also build confidence that the assurance case really is an accurate indicator of actual performance in the field.
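To make the idea of field monitoring concrete, here is a deliberately minimal sketch (all names and thresholds are invented for illustration) of a runtime monitor that checks one assumption from a design-time safety case, e.g. a minimum average detection confidence, against evidence collected in the field:

```python
from collections import deque

class AssumptionMonitor:
    """Compares evidence collected in the field against a single
    design-time assumption and flags when it no longer holds."""

    def __init__(self, assumed_min_mean_conf=0.8, window=100):
        self.assumed_min_mean_conf = assumed_min_mean_conf
        self.scores = deque(maxlen=window)  # sliding window of evidence

    def observe(self, confidence):
        """Record one detection-confidence score from the field."""
        self.scores.append(confidence)

    def assumption_violated(self):
        # Withhold judgement until enough evidence has accumulated.
        if len(self.scores) < self.scores.maxlen:
            return False
        mean = sum(self.scores) / len(self.scores)
        return mean < self.assumed_min_mean_conf

monitor = AssumptionMonitor()
for _ in range(100):
    monitor.observe(0.9)       # field performance matches the assumption
print(monitor.assumption_violated())   # False

for _ in range(100):
    monitor.observe(0.5)       # performance has degraded in the field
print(monitor.assumption_violated())   # True
```

A violation would not itself make the vehicle safe; it is a trigger to revisit the assurance case, retrain, or restrict the operational domain, closing the loop between design-time argument and field evidence.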
With this approach, machine learning may yet live up to the superhero status the technology hype cycle is giving it.
Dr Simon Burton
Chief Expert — Safety, Reliability and Availability
Robert Bosch GmbH
Simon is also a Programme Fellow on the Assuring Autonomy International Programme, contributing to the strategic development of the Programme.