At the Cebit 2018 fair, I helped our HR department at Volkswagen Commercial Vehicles hire future IT top talent. One of the candidates asked me about Volkswagen’s blockchain cooperations (such as IOTA) and how the automotive industry wants to leverage the technology’s potential. So I thought it was time for another blog post in which I show the potential of blockchain for autonomous driving: protecting sensor data to guarantee data integrity, and securing the execution of in-car AI-based algorithms.
Introduction to Blockchain Technology
Blockchain became popular with the rise of bitcoin. But how exactly can the Automotive industry leverage the potential of blockchain? A blockchain contains an immutable fact store. So by utilizing this technology, you can easily build an irrevocable proof of occurred transactions for satisfying regulatory oversight. This is perfect, for example, when storing sensor data from cars for building high-quality real-time maps for autonomous driving in the future – as Volkswagen may do with IOTA.
To be honest, the basics for building such an irrevocable proof are already in place in every big enterprise. They have not, however, been linked together with the technical components required to yield the advantages of a blockchain. Three things are important to deliver these advantages:
- A shared distributed ledger
- Smart contracts
- Consensus
A shared distributed ledger is really as simple as it sounds. The most important aspect of the ledger is that you cannot revert or change a single item without having to rewrite the entire ledger. It’s like announcing a marriage on television and then later denying that you did it. There will always be somebody who can confirm that you did it. That is the idea behind a distributed ledger. The simplest implementation could be a decentralized source control system such as Git. This proves especially useful when dealing with regulatory bodies, since the amount of work required to falsify information is immense. Indeed, with substantial checks in place, it becomes nearly impossible. Since a distributed ledger offers high redundancy and availability due to its distributed nature, it is well suited for all kinds of data that you want to protect from manipulation. Imagine, for example, a two-party agreement when buying a house, and the notary service at the closing. Either party could try to falsify the contract and claim their copy is the right one. With the notary holding a third-party copy that cannot be corrupted, we have an irrevocable proof in place. And this is probably the most important feature of a blockchain.
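The "rewrite the entire ledger" property can be illustrated with a small hash chain. This is only a minimal sketch under my own assumptions: the function names (`block_hash`, `append_block`, `is_valid`) and the all-zero genesis value are made up for illustration, and a real blockchain adds distribution and consensus on top of this.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def block_hash(index, payload, prev_hash):
    """Hash an entry's content together with the hash of its predecessor."""
    record = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new entry that is linked to the current last entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash; an edit to any earlier entry breaks the links."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "contract signed by buyer")
append_block(ledger, "contract signed by seller")
append_block(ledger, "notary confirms closing")
print(is_valid(ledger))  # True

ledger[1]["payload"] = "contract signed by someone else"
print(is_valid(ledger))  # False
```

To hide the change, the forger would have to recompute the hash of the edited entry and of every entry after it, on every copy of the ledger that other parties hold.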
Such a distributed ledger could be used, for example, to solve the problem of odometer manipulation. The manipulation of odometers in cars is considered fraud under German law and is a major annoyance to the authorities trying to prevent it. The resulting economic damage is enormous. According to evaluations by TÜV Rheinland, an estimated one third of all used cars sold in Germany have a manipulated odometer. For the buyer, this translates into an average loss of 3,000 Euros. The annual damages amount to approximately 6 billion Euros nationwide. One approach to preventing this kind of fraud would be to write the mileage of all passenger vehicles into a database that is protected from manipulation, such as the distributed ledger of a blockchain.
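A rough sketch of the rule such a mileage record would enforce is shown below. The class `OdometerLog`, its methods and the vehicle ID are hypothetical and live in memory only; in the scenario above, the readings would be written to a distributed ledger instead.

```python
class OdometerLog:
    """Append-only mileage history per vehicle (hypothetical helper)."""

    def __init__(self):
        self._readings = {}  # vehicle id -> list of (date, km)

    def record(self, vin, date, km):
        """Accept a reading only if the mileage did not go down."""
        history = self._readings.setdefault(vin, [])
        if history and km < history[-1][1]:
            raise ValueError(
                f"mileage for {vin} dropped from {history[-1][1]} to {km} km"
            )
        history.append((date, km))

    def history(self, vin):
        """Return the recorded readings as a read-only tuple."""
        return tuple(self._readings.get(vin, ()))


log = OdometerLog()
log.record("VIN123", "2018-01-10", 42000)
log.record("VIN123", "2018-06-12", 51000)
try:
    log.record("VIN123", "2018-09-01", 30000)  # rolled-back odometer
except ValueError as err:
    print("rejected:", err)
```

A buyer could then compare the value shown in the car with the last recorded reading before signing the purchase contract.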
Now let’s look at the concept of smart contracts. The first blockchain implementation, Bitcoin, was not designed as a smart contract platform. But what the heck is a smart contract? A smart contract is a mechanism for making sure that a software program can be executed and audited, and for proving what it did. Suppose Volkswagen offers a car-sharing service and you rent a car from them using cryptocurrency. The system would be set up so that you receive a receipt, which is stored in a virtual contract. Volkswagen then has to deliver a digital entry key to you by a specified date. If the key is not delivered to you on time, the blockchain automatically releases a refund. If Volkswagen sends you the key before the rental date, a function would automatically block the release of both the fee and the key until the rental time arrives. The system acts according to the if-then paradigm. Because the contract is stored in a distributed open ledger, it is witnessed by hundreds or thousands of people. If Volkswagen sends you the key, Volkswagen can be sure to be paid. If you pay, for example, by sending the money in Bitcoin, you will receive the key. The contract is automatically canceled at the expiration time, and neither party can interfere with the code without the other knowing, as all participants would be automatically informed. There are many more use cases in the automotive industry in addition to smart car rental.
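The if-then logic of the rental example can be sketched as a small state check. This is an illustration under my own assumptions, not code for any real smart contract platform: the class `RentalContract`, its fields and the use of plain integers as time stamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RentalContract:
    fee: float
    key_deadline: int   # latest time by which the key must be delivered
    rental_start: int   # time at which the rental begins
    paid: bool = False
    key_delivered_at: Optional[int] = None

    def pay(self):
        """The customer pays the fee into the contract."""
        self.paid = True

    def deliver_key(self, now):
        """The provider hands the digital key to the contract."""
        self.key_delivered_at = now

    def settle(self, now):
        """Apply the if-then rules and return the outcome at time `now`."""
        if not self.paid:
            return "waiting for payment"
        key_on_time = (
            self.key_delivered_at is not None
            and self.key_delivered_at <= self.key_deadline
        )
        if not key_on_time:
            # No key by the deadline: the fee goes back to the customer.
            return "refund customer" if now > self.key_deadline else "waiting for key"
        if now < self.rental_start:
            # Key arrived early: hold both sides until the rental begins.
            return "hold fee and key until rental start"
        return "release fee to provider and key to customer"


contract = RentalContract(fee=50.0, key_deadline=10, rental_start=12)
contract.pay()
contract.deliver_key(now=8)
print(contract.settle(now=9))   # hold fee and key until rental start
print(contract.settle(now=12))  # release fee to provider and key to customer
```

On a real platform this logic would run on the ledger itself, so that neither side could change the rules after the contract has been agreed.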
Consensus is the third important ingredient for a blockchain. The term consensus refers to who, specifically, can write to the blockchain. In Bitcoin, for example, this can only be done in a distributed manner, so no single person is able to own the entire blockchain. The simplest consensus algorithm may look very similar to a group voting process in which a quorum is needed to proceed.
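The group-voting idea can be sketched in a few lines. This is the simplest possible form and not how Bitcoin reaches consensus; the function name, the node names and the two-thirds threshold are assumptions chosen for illustration.

```python
def quorum_reached(votes, quorum=2 / 3):
    """Accept a proposed entry only if more than `quorum` of the nodes approve."""
    if not votes:
        return False
    approvals = sum(1 for approved in votes.values() if approved)
    return approvals / len(votes) > quorum


votes = {"node-a": True, "node-b": True, "node-c": True, "node-d": False}
print(quorum_reached(votes))  # True: 3 of 4 nodes approve

votes["node-c"] = False
print(quorum_reached(votes))  # False: only 2 of 4 nodes approve
```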
How to Protect Sensor Data (or Blockchain for Solving the Data Integrity Challenge in Autonomous Driving)?
By 2020, industry experts expect more than 250 million connected vehicles on the road. The need to perform frequent remote software updates of car components increases with the level of automation – especially with autonomous cars.
Now let’s look at how we can leverage blockchain concepts to protect sensor data.
Data is the foundation from which information, knowledge and wisdom are derived in any sensor-based application such as autonomous driving. The data (images, sound, radar or lidar data) hits the sensors, and the algorithms process it. The algorithms extract information from the data, and from this information knowledge is established that instructs the actuators to act, for example by changing direction, accelerating or decelerating. In brief: data is tremendously important for autonomous driving. What happens, however, if the data is tampered with in some way? Since data is the base of our pyramid, the pyramid collapses if the data is compromised. Simply put, since all information is derived from the data taken in, the knowledge will be incorrect if the data is incorrect. In the case of autonomous driving, such corruption can put lives at risk.
To clarify the requirements of our use case, let’s look at another famous use case in the internet-of-things domain that demonstrates the importance of data: index-based rainfall insurance. The livelihoods of farmers in Kenya rely heavily on the amount of rain over a season. In a good season, there will be enough rain to grow enough crops to eat and sell. Conversely, a low-rain season can destroy a year of hard work and possibly a life’s savings. This is why insurance providers and farmers have started working together to create an innovative insurance model. Farmers get paid if there is not sufficient rain in a season. This model, however, relies heavily on data stemming from local IoT weather stations. The data and the model decide whether a payout is triggered or not. By using blockchain technology, efficiency is improved because the costly human element, the people who would check the farmers’ claims, is no longer needed. Nevertheless, since data is now the only driver for the decision whether or not a payout is triggered, it has to be protected against manipulation. This use case therefore clearly shows why data integrity is vitally important.
In terms of autonomous driving, let’s assume we are a car manufacturer and our objective is to create real-time map updates based on the sensor data of our autonomous car fleet. Based on the velocity of the autonomous cars, traffic jams are predicted and fed back to the in-car maps of our fleet. By manipulating this data, an attacker could cause autonomous cars to perform panic braking where they shouldn’t, or to drive into the rear end of stopped traffic.
But what does data integrity mean? Briefly, data integrity means that you can trust that the data you rely on is correct and has not been altered. Nevertheless, there are a number of use cases where it’s very tempting to tamper with data. And to be honest, it happens all the time. You can manipulate data in two ways: by hacking the hardware or by hacking the software. Hardware here means the sensor. It can be manipulated by preventing the sensor from sampling real data from its environment or by feeding in false input. In the index-based rainfall insurance example, this can be achieved by covering the local IoT weather station with an umbrella during rainfall. One can prevent this scenario by either keeping the location of the sensor secret or by keeping it monitored all the time. In order to inhibit the manipulation of single sensors, you can use consensus algorithms such as Byzantine fault tolerant redundancy. In this case, an attacker would have to falsify a number of sensors in order to manipulate the data collection. Additionally, the attacker would have to know the location of all sensors. For Volkswagen, as one of the largest car makers in the world, having so many cars on the road is a big advantage. There are so many cars on the road at such a high density that the sensor data of all the cars cannot realistically be manipulated. So if you wanted to detect traffic jams on a motorway, such an algorithm would rely on the data of a large number of cars, which protects data integrity for large car manufacturers.
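The effect of redundancy can be shown with a deliberately simple stand-in for a Byzantine fault tolerant protocol: taking the median of many independent speed reports. The function name, the minimum number of reports and the sample values are assumptions for illustration; a full BFT protocol involves considerably more than this.

```python
from statistics import median


def robust_speed(readings_kmh, min_reports=5):
    """Estimate the speed on a road segment from many independent cars.

    The median stays within the range of the honest values as long as
    fewer than half of the reports are forged.
    """
    if len(readings_kmh) < min_reports:
        raise ValueError("not enough independent reports to trust the estimate")
    return median(readings_kmh)


honest = [12, 15, 11, 14, 13, 12, 16]  # cars crawling through a traffic jam
forged = [130, 130]                    # two manipulated reports

print(robust_speed(honest + forged))   # 14
```

With only two forged reports among nine, the estimate still reflects the traffic jam. An attacker would need to control a majority of the reporting cars on that segment to shift it.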
Another, easier method of manipulating data is to change it after it has been stored in a central database. The sensor can work correctly, perfectly sampling and logging the amount of rainfall or, in the case of autonomous driving, its environment, but this is pointless if the data is manipulated afterwards in the database. In our index-based rainfall example, it would be pretty easy for the insurance company, with unlimited access to the database, to change the data afterwards according to its needs. It wouldn’t take much more than the click of a button. For the other party, the insured farmer, it would be more difficult to change the data. The insured farmer would have to get past the firewall and into the database, or would need a disloyal employee on the inside to do the job. Insurance is one of many industries where this applies. It also holds true for autonomous driving. If a car manufacturer produces autonomous cars and stores their driving decisions in a central internal database, there would always be the question of whether the data has been altered to suit the car manufacturer, for example after an accident. Or, as in our traffic jam example, hackers could break into a centralized database and change the data before the real-time map updates are generated and sent back to the car fleet. These are the reasons why data integrity can only be guaranteed by using a distributed ledger.
How to Secure the Execution of In-car AI-based Algorithms (or Blockchain as a Solution for Bringing Trust to AI)
Autonomous driving is full of machine learning and AI-based algorithms. A week does not go by without a big announcement on progress in the field of artificial intelligence (have a look at my previous blog post). We live in times of exponential, quarterly AI growth. Nevertheless, this rapid pace of growth has raised concerns that AI will take jobs away from humans or that AI may become too powerful and/or evil, like Skynet in the film Terminator. Frankly speaking, I am more of an optimist about where AI will lead us. As an empathic person, though, I can understand these concerns. In the recent past, we’ve seen a number of examples where AI failed or developed in a way that we didn’t predict beforehand.
In order to secure the execution of in-car AI-based algorithms, we first have to solve three big but basic challenges:
- How can we identify an AI-algorithm (or if you are an AI, who are you)?
- How can we detect (compliant) failure?
- How can we detect (non-compliant) misbehavior?
If you are an AI, who are you? There are always people who try to leverage technology for scams. We’ve seen hackers spoofing websites, emails and, recently, bots. But how can we define who owns an AI, how it has been trained and what it is authorized to do? This is exactly what describes the identity of an AI algorithm. Additionally, why do AIs fail? AI algorithms differ a lot from traditional software programs since they are based on probabilistic models instead of rules. Software has usually been implemented as a collection of rules that say “IF A happens, THEN do B”. With AI (and deep learning in particular), however, this isn’t true any more. AI models answer with a likelihood, such as “IF A happened, then it’s 92% likely that B is the next step”. Since they are based on probabilities and not on strict rules, they are more flexible. But since there is no such thing as a free lunch, we have to accept that AI models can occasionally go wrong. Contrary to popular belief, we usually appreciate this kind of compliant failure, because tolerating it helps to avoid overfitting. In addition to this kind of failure, there is also the danger that AI algorithms actually learn the wrong things, developing what’s called non-compliant misbehavior. Reinforcement learning, for example, has failed several times in public in the past year (have a look at Microsoft’s Tay Bot). Imagine if these were AI processes you launched at work that went evil in similar ways. So, should we be afraid of evil AIs? How can we prevent AIs from becoming evil when some of the most brilliant people in our world, such as Elon Musk and Stephen Hawking, warn us that AI is out to get us?
Swedish philosopher Nick Bostrom’s book, Superintelligence, states: “If machine brains one day come to surpass human brains in general intelligence, then this new superintelligence could become very powerful… As the fate of the gorillas now depends more on us humans than on the gorillas themselves, so the fate of our species then would come to depend on the actions of the machine superintelligence.” So, how can we identify AIs and control them? This is where blockchain steps in.
There is a new blockchain-based protocol called botchain, which tries to solve the aforementioned problem. It is a network supported by several AI companies and comes with a decentralized identity for AIs that doesn’t depend on the underlying operating platform. Each and every AI algorithm is able to register there and can be identified worldwide. It’s pretty similar to the way website certificates work today to validate website ownership. This new protocol also enables every AI to regularly write hashes of its activity to a blockchain. Its actions are thereby stored in an immutable way and can be inspected by those holding the encryption keys. As a result, what the AI did and when it did it can be proven, making sure that malicious behavior can be discovered and corrected. Finally, the consensus mechanisms of blockchain can ensure that a possibly evil AI stays under observation and control. By having a public record of the tasks an AI is allowed to do, which then have to be verified by multiple blockchain nodes, we can make sure an AI doesn’t overstep its limits.
So in spite of having concerns that AIs may cause damage in the future, blockchain technologies can help to solve these big challenges. Technologies such as botchain could be used to assure that only the correct AI algorithms are uploaded to autonomous cars and that their behavior is logged, observed and controlled.
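The idea of identifying an AI and checking what it is allowed to do can be sketched as follows. This is not the botchain API, which I do not reproduce here; the class `AiRegistry`, its methods and all sample values are hypothetical, and the registry is a plain in-memory dictionary standing in for a shared ledger.

```python
import hashlib


def fingerprint(model_bytes):
    """Identify a model by the SHA-256 hash of its serialized form."""
    return hashlib.sha256(model_bytes).hexdigest()


class AiRegistry:
    """In-memory stand-in for a shared registry of AI identities (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def register(self, ai_id, owner, model_bytes, allowed_tasks):
        """Record who owns the AI, which model it is and what it may do."""
        if ai_id in self._entries:
            raise ValueError(f"{ai_id} is already registered")
        self._entries[ai_id] = {
            "owner": owner,
            "fingerprint": fingerprint(model_bytes),
            "allowed_tasks": frozenset(allowed_tasks),
        }

    def is_authorized(self, ai_id, model_bytes, task):
        """Check that the model is the registered one and the task is permitted."""
        entry = self._entries.get(ai_id)
        if entry is None:
            return False
        same_model = entry["fingerprint"] == fingerprint(model_bytes)
        return same_model and task in entry["allowed_tasks"]


registry = AiRegistry()
model = b"weights-of-lane-keeping-model-v1"
registry.register("lane-keeper", "example-oem", model, {"lane_keeping"})

print(registry.is_authorized("lane-keeper", model, "lane_keeping"))         # True
print(registry.is_authorized("lane-keeper", model, "emergency_braking"))    # False
print(registry.is_authorized("lane-keeper", b"swapped-model", "lane_keeping"))  # False
```

A car could run such a check before accepting a software update, rejecting any model whose fingerprint does not match the registered one.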
How to Scale Blockchain Concepts?
As a technology solution stack, the blockchain offers a number of benefits. Public blockchains, however, bring with them very slow transaction rates and transactions cost money. IOTA promises to solve these challenges.
IOTA ships with its own distributed ledger, called Tangle. Tangle is the permission-less distributed ledger of IOTA and guarantees data integrity by storing data in a distributed graph among all participating nodes. So it is very difficult for someone to manipulate the original data without the rest of the network seeing that it is now incompatible with their copy, which ensures data integrity. To quote the IOTA website: “Furthermore unlike old blockchains where it costs money to send/store data, it is entirely free in the IOTA network. Similarly in the old blockchain architecture data transfers bloats and slows down the network, in IOTA’s Tangle it actually strengthens the security of the network and makes it even more efficient. Finally due to the unique architecture of the Tangle ledger it allows for partitioning of the network, meaning that you can branch off clusters from the main-Tangle and establish local networks that still ensure data integrity without having to worry about continuous internet connection. Of course there is no reason to store the whole dataset in the Tangle ledger, all you need is to store the hashes. Hashes are the equivalent of biometrics for data, if you alter the content of the data — its DNA — you will get a different hash, revealing that it has been tampered with.”
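The point about storing only hashes can be made concrete with a short sketch. It uses SHA-256 from the Python standard library purely as an example of a hash function; the record layout and the helper name `data_hash` are assumptions, and nothing here is IOTA-specific.

```python
import hashlib
import json


def data_hash(record):
    """Fingerprint a record via a canonical JSON encoding and SHA-256."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


sensor_record = {"car": "car-001", "timestamp": 1528790400, "speed_kmh": 87.5}

# Only this short value would be written to the ledger; the raw data
# can stay in an ordinary database.
anchored = data_hash(sensor_record)

tampered = dict(sensor_record, speed_kmh=20.0)

print(data_hash(sensor_record) == anchored)  # True
print(data_hash(tampered) == anchored)       # False
```

Anyone holding the raw data can recompute the hash and compare it with the anchored value, without the ledger ever having to store the data itself.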
To be honest, I don’t fully comprehend the full extent of IOTA yet, even though I have read the white paper. It mentions an implementation of the algorithm that is not publicly available, which limits the value of the information in the white paper. Additionally, IOTA currently uses a central coordinator instead of a decentralized one. If they want to use decentralized coordination in the future, they will have to cope with inconsistencies between versions of their Tangle graph when updates come in. They will not be able to avoid these inconsistencies, which will create back doors for hackers. I think, from a conceptual perspective, IOTA valiantly tries to overcome several challenges of traditional blockchain implementations. At the moment, though, it’s not mature enough to say whether it will live up to its promises.
I have tried to give an introduction to blockchain and to show how its concepts can be used for autonomous driving. For autonomous driving it’s very important to have data integrity. Data sampled from the car sensors must not be manipulated, since this can put lives at risk, as seen in our example of predicting traffic jams for real-time map updates. Additionally, since autonomous driving depends heavily on the execution of algorithms that leverage machine learning and artificial intelligence, we had a look at how blockchain-based protocols such as botchain can be used to identify, observe and control such algorithms.
In general, blockchain could be the enabler in the automotive industry for solving the data integrity challenge, and with it the data quality challenge, when creating cross-brand ecosystems. These ecosystems will be created to increase safety on our roads. If the Mercedes car fleet, for example, detected a traffic jam on a motorway, this information should be propagated to the car fleets of other car manufacturers. By storing the underlying car data in a blockchain, the other car manufacturers would better understand the traffic jam prediction and could propagate this information to their own car fleets faster.