Encryption and Privacy IV: Zero-knowledge proofs
Zero-knowledge proofs (ZKPs) are a set of techniques that allow the implementation of two measures established in Article 25 of the GDPR: minimisation and limitation of data accessibility. The particularity of ZKPs is that they allow assess the certainty of an information without that information itself being exposed. Therefore, they are a tool for implementing the minimisation principle in distributed contexts, such as Internet services in general, cloud computing, blockchain, etc.
Suppose a 19-year-old user wants to prove that he or she is over 16 years old to an Internet service, through the network, without revealing his or her real age or identity. The following procedure could be followed to this end:
- The user provides proof of age to a trusted third party. Said user generates a random number, called N, which has nothing to do with his or her age and which is kept confidential.
- From this number, another one called PROOF is generated, which is obtained by applying a hash function to N as many times as the difference between the current age and the legal age, adding 1. That is to say:
PROOF = Hash 1+age-16 (N)
In this case PROOF = Hash 4 (N) = Hash(Hash(Hash(Hash(N))))
- The PROOF value obtained is stored on a server of the trusted third party on an address accessible from the Internet with a URL constructed with random values, e.g. URL (XXXX).
- The user maintains the N value and the URL (XXXX). Meanwhile, the PROOF value accessible from the Internet at the URL (XXXX) is stored on the server of the trusted third party, unlinked to the user's personal identity.
- Afterwards, this user wants to prove to an Internet service that he or she is really over 16 years old. For this purpose, he or she sends two pieces of information to the Internet service:
The URL (XXXX),
His or her age coded in an illegible way as follows: AGEhash = Hash 1+age(N).
In other words, he or she does not calculate the hash of his or her age, but performs the hash of the N value set at the beginning, but as many times as the user plus one is old. In our example, he or she would do the hash of the N value kept confidential 20 times:
AGEhash = Hash 20 (N).
- The Internet service, using the URL (XXXX) provided by the user, obtains the PROOF value stored. The hash function is then applied to this value as many times as the minimum age to be proved, in this case 16 years:
VERIFY = Hash 16 (PROOF) = Hash 16 (Hash 4 (N)) = Hash 20 (N) = AGEhash
Therefore, the Internet service (in this case called "verifier"), by checking that the VERIFY value matches the AGEhash value, can establish that the user (in this case called "tester") is over 16 years old. This has been achieved without the user revealing his or her specific age to the Internet service, nor his or her identity data, and without this information travelling over the network. This is an example of a zero-knowledge proof, or ZKP, the operation of which can be checked on this website.
The above protocol can be considered to be a ZKP because three requirements are met:
- Completeness (or integrity): if the statement is correct, the verifier (Internet service in this case) has reasonable assurance that what the tester (the user) says is true.
- Soundness (or robustness): if the statement is false, it is very unlikely that the tester can deceive the verifier.
- Zero knowledge: if the statement is correct, the verifier cannot infer any additional information about the tester, in this case the user.
ZKPs are based on cryptographic techniques, which in turn are based on the use of algorithms and functions that are very difficult to reverse, such as encryption, hash functions or the use of modular arithmetic. Therefore, these are proofs that make it possible to give a probabilistic, not absolute, certainty as to whether the information is correct or not. When applying a ZKP to the design of a processing operation, it is necessary to assess whether this uncertainty reaches sufficiently low values for the risk to be assumed in the framework of that specific processing.
The weaknesses of the ZKP systems, as shown in the example, derive from the difference that will always exist between the theoretical concept and the practical application. In other words, the risk to rights and freedoms arises from the distance existing between the algorithms and procedures and their actual implementation in a specific technical, organisational and legal context.
The specific implementation of a system such as the one shown could have, among others, the following weaknesses:
- The link made in the trusted third party between the URL (XXXX) and the identification of the subject.
- The profiling that could be carried out by the third party based on the URL (XXXX) and the set of Internet services that claim access to it.
- The choice of vulnerable values in the construction of the URL.
- The weakness in the robustness or randomness of the initial values, in this case the initial seed N (in other types of ZKP in search of prime numbers).
- The metadata associated to the information exchanges between the subject, the trusted third party and especially the Internet service.
- The exchange of metadata or device fingerprint between different Internet services.
- The analysis of traffic information in the access to the user, third party and Internet services.
- The compromise of the user's terminal or security breaches in the servers of the third party.
- The lack of application of the data minimisation principle in the servers of the third party or the accessed Internet services.
In short, these weaknesses arise from the lack of risk management for rights and freedoms beyond the concept that also takes into account the design and implementation of ZKPs. These weaknesses may in turn be aggravated by the lack of review and monitoring actions (Art. 24.1 of the GDPR), such as audits, which are essential in large-scale deployment and production and in specific contexts.
There are many types of ZKP and there is no systematic approach to the development of this type of solution. There are even ZKP methods that are not linked to digital processing and that are used to intuitively understand this concept.
ZKPs can be categorised in those that are interactive, or that require interaction with the subject/tester for each verification. There are also non-interactive ZKPs (NIZK, ZK-SNARK or ZKZK-STARK), in which the service/verifier can check the truthfulness without the need to interact with the tester. The latter are more interesting because they are more scalable. They can also be classified into those specifically oriented for comparisons, such as checking values in certain ranges (Zero Knowledge Range Proofs or ZKRP) or inclusion in sets (Zero Knowledge Set Membership or ZKSM).
ZKPs are a tool that allows the implementation of two measures established in Article 25 of the GDPR: minimisation and limitation of data accessibility. These measures must be implemented both by default and by design. The uses of this technique are very diverse: checking age limits, checking conditions such as nationality, electronic voting, purchases, auctions, confidentiality in transactions, analysis of fund provisions; demonstrating financial solvency, privacy in blockchain, etc. As in the example shown, ZKPs do not delete personal information –in fact, there is a unique identifier, the URL (XXXX) address, linked to a subject–, but they can be powerful pseudonymisation tools.
Finally, it is advisable to follow the recommendations drawn up by the AEPD to apply the minimisation principles that can be found on the Innovation and Technology page, in particular: