Building credit bureaus that have no personal information
In two blog posts (here and here), I have argued that in an era of widespread hacking, the credit bureau’s business model is unsustainable because it requires storing enormous amounts of confidential information on tens of millions of individuals who are not even its customers.
However, these bureaus serve a useful function: they aggregate information about an individual from multiple sources and condense it into a credit score that measures the individual's creditworthiness. An individual has credit relationships with many banks and other agencies. He might have a credit card from one bank, a car loan from another bank and a home loan from a third; he may have overdue payments on one or more of these loans. He might also have an unpaid utility bill. When he applies for a new loan from yet another bank, the new bank would like to have all this information before deciding whether to grant the loan, but it is obviously impractical to write to every bank in the country to seek this information. It is far easier for all banks to provide information about all their customers to a central credit bureau, which consolidates it into a composite credit score that any bank can access when granting a new loan.
The problem is that though this model is very efficient, it creates a single point of failure – a single entity that knows too much information about too many individuals. What is worse, these individuals are not customers of the bureau and cannot stop doing business with it if they do not like the privacy and security practices of the bureau.
We need to find ways to let the bureaus perform their credit scoring function without receiving or storing confidential information at all. The tool required to do this (homomorphic encryption) has been available for over a decade now, but has been underutilized in finance, as I discussed in a blog post two years ago.
Suppose there is only one bank
To explain how a secure credit bureau can be built, I begin with a simple example where the bureau obtains information only from one bank (or other agency) which has the individual as a customer. I will then extend this to multiple banks.
-
The credit score of an individual can be approximated by a linear function (weighted sum) of a bunch of attributes relating to the individual:
score = w1 x1 + w2 x2 + ... + wn xn
where wi is a weight (coefficient) and xi is an attribute (for example, x1 could indicate whether the individual is delinquent on a car loan and x2 could represent the credit card debt outstanding as a percentage of the credit limit). Since xi could be a nonlinear function (for example, the square or logarithm) of the underlying variable, the linear form is not really restrictive.
The attributes xi are known only to the bank. These are never revealed to the bureau which sees only the weighted sum above.
The weights wi are proprietary information that needs to be known only to the credit bureau. The bureau encrypts the weights and sends the encrypted weights to the bank.
-
Homomorphic encryption allows the bank to compute the weighted sum
score = w1 x1 + w2 x2 + ... + wn xn
without decrypting the weights. Actually, the bank does not see the weighted sum (the score). What it computes using homomorphic encryption is the encrypted weighted sum, which the credit bureau can decrypt to obtain the score. Since the xi are known to the bank, the computation of this scalar product requires only Additively or Partially Homomorphic Encryption (AHE or PHE), which is much more efficient than Fully Homomorphic Encryption (FHE). The GLLM method (Goethals et al. "On private scalar product computation for privacy-preserving data mining." ICISC. Vol. 3506. 2004.) based on the Paillier AHE can do the job.
-
At the end therefore:
The credit bureau knows the credit score of the individual.
The credit bureau has not revealed either its scoring rule or the credit score of the individual to the bank.
The bank has not revealed any confidential information about the customer to the credit bureau other than the credit score. (Note for the geeks: The privacy guarantee here is at the highest possible level – it is information-theoretic (Theorem 1 of Goethals et al.) and not merely cryptographic. Even in the implausible worst-case scenario where the cryptography is somehow broken, that would leak information from the credit bureau to the banks but not in the other direction.)
The above procedure is repeated for each individual. The wi would be the same for all individuals, but xi would of course vary from individual to individual. To be precise, we should write the i’th attribute of the k’th individual as xki.
If the credit bureau is hacked, confidential information belonging to the individuals is not exposed because the bureau does not have this at all. The credit scores and the scoring rule may be exposed, but this is a loss primarily to the credit bureau and there are no negative externalities involved.
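The one-bank procedure above can be sketched with a toy Paillier implementation. This is only an illustration under stated assumptions, not the GLLM protocol itself: the primes are tiny by cryptographic standards, the function names (keygen, encrypt, decrypt, encrypted_score) are hypothetical, and the weights and attributes are assumed to be integers (real-valued inputs would first be scaled to fixed point).

```python
import math
import secrets

# Toy primes (the Mersenne primes 2**61 - 1 and 2**89 - 1): illustration only,
# far too small and too structured for real use.
P, Q = 2**61 - 1, 2**89 - 1

def keygen(p, q):
    """Bureau side: return the public modulus n and the private key (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1)
    mu = pow(lam, -1, n)          # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Paillier encryption with generator g = n + 1; anyone can do this."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1      # random blinding factor
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Bureau side: recover the plaintext as a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def encrypted_score(n, enc_weights, attributes):
    """Bank side: compute Enc(sum of w_i * x_i) from Enc(w_i) and plaintext x_i."""
    n2 = n * n
    acc = encrypt(n, 0)           # starting from Enc(0) also re-randomizes the result
    for c, x in zip(enc_weights, attributes):
        # Raising a ciphertext to the power x multiplies its plaintext by x;
        # multiplying ciphertexts adds their plaintexts.
        acc = acc * pow(c, x % n, n2) % n2
    return acc

# Bureau: generate keys and send encrypted weights to the bank.
n, priv = keygen(P, Q)
weights = [-120, 35, 8]                       # hypothetical scoring rule
enc_weights = [encrypt(n, w) for w in weights]

# Bank: combine the encrypted weights with its own plaintext attributes.
attributes = [1, 62, 15]                      # hypothetical customer data
enc_score = encrypted_score(n, enc_weights, attributes)

# Bureau: decrypt the only thing it ever receives, the score.
score = decrypt(n, priv, enc_score)
print(score)
```

The bank handles only ciphertexts of the weights and the bureau sees only the decrypted sum, which matches the division of knowledge listed above.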
Extension to Multiple Banks
In general, the credit bureau will need information from many (say m) banks (or other agencies).
-
The credit score of an individual can be represented as a weighted sum of sub scores from various banks (the bureau may or may not use equal weights uj = 1 or uj = 1/m for this purpose):
Total Score = u1 subscore1 + u2 subscore2 + ... + um subscorem
where the uj is the weight of bank j and subscorej is the sub score computed using information only from bank j as follows:
subscorej = w1 xj1 + w2 xj2 + ... + wn xjn
where xji is the i’th attribute of the individual at bank j.
-
Bank j can use homomorphic encryption to compute uj subscorej. We first define a set of modified weights vji for attribute i for bank j as:
vji = uj wi
and then let the bank compute a weighted sum exactly as in the one bank case but using weights vji instead of wi:
uj subscorej = vj1 xj1 + vj2 xj2 + ... + vjn xjn
The credit bureau adds up all the uj subscorej that it receives from various banks to find the credit score of the individual.
-
We can, however, get one further level of privacy in this case: the credit bureau can compute the total score of an individual without learning any of the subscorej. If this extra privacy is desired, we modify the procedure as follows:
-
Bank j computes
disguised_subscorej = uj subscorej + rj
where rj is a random number chosen by bank j. The bank communicates disguised_subscorej to the credit bureau. (Note for the geeks: Actually, since the bank computes and communicates an encrypted form of this quantity homomorphically, it needs to encrypt rj also. This is possible since we are using public key cryptography – the public key of the credit bureau is publicly available and anybody can encrypt using this key, but only the bureau can perform decryption because only it has the private key.)
All the banks collectively compute the sum of all the rj using secure multi party computation based on secret sharing methods which ensure that no bank learns the rj of any other bank. The sum of all the rj (let us call it sum_r) is communicated to the credit bureau.
The credit bureau computes the sum of all the disguised_subscorej. From this result, it subtracts sum_r to get the correct total credit score.
-
-
At the end therefore:
The credit bureau knows the total credit score of the individual.
The credit bureau has not revealed either its scoring rule or the credit score of the individual to any bank.
The bank has not revealed any confidential information about the customer to the credit bureau: not even the sub score based on data in its possession.
The above procedure is repeated for each individual. The modified weights vji would be the same for all individuals at the same bank, but xji would of course vary from individual to individual. To be precise, we should write the i’th attribute of the k’th individual at the j’th bank as xjki. The rj (and therefore sum_r) should also ideally vary from individual to individual: strictly speaking, these are actually rkj and sum_rk for individual k. Similarly, disguised_subscorej should strictly speaking be disguised_subscorekj.
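The arithmetic of the multi-bank procedure, for a single individual, can be sketched as follows. To keep the sketch short it works on plaintext integers, whereas in the procedure described above the disguised sub scores would travel as ciphertexts under the bureau's public key. The modulus, the function names (share, secure_sum) and all the numbers are assumptions made for illustration, and the additive secret sharing shown is only one simple way to realise the secure multi-party sum.

```python
import secrets

MODULUS = 2**61 - 1   # assumed public prime; all masked arithmetic is modulo this

def share(value, parties):
    """Split value into additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Each party j deals one share of its value to every party i; party i
    publishes only the sum of the shares it received, so no individual
    value is revealed to any single other party."""
    m = len(private_values)
    dealt = [share(v, m) for v in private_values]          # dealt[j][i]: j -> i
    partials = [sum(dealt[j][i] for j in range(m)) % MODULUS for i in range(m)]
    return sum(partials) % MODULUS

w = [-120, 35, 8]                              # bureau's attribute weights w_i
u = [2, 1, 1]                                  # bureau's bank weights u_j
x = [[1, 62, 15], [0, 40, 3], [1, 90, 0]]      # x[j][i]: attribute i at bank j

# Modified weights v_ji = u_j * w_i (sent to bank j in encrypted form).
v = [[uj * wi for wi in w] for uj in u]

# Each bank picks its own random mask r_j and reports a disguised sub score.
r = [secrets.randbelow(MODULUS) for _ in u]
disguised = [
    (sum(vji * xji for vji, xji in zip(v[j], x[j])) + r[j]) % MODULUS
    for j in range(len(u))
]

# The banks jointly compute sum_r without revealing any individual r_j.
sum_r = secure_sum(r)

# The bureau removes the combined mask to recover the total score.
total_score = (sum(disguised) - sum_r) % MODULUS
if total_score > MODULUS // 2:                 # map back to a signed integer
    total_score -= MODULUS
print(total_score)
```

Each disguised sub score on its own is uniformly distributed modulo the prime, so the bureau learns nothing about an individual bank's sub score until the masks cancel in the total.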
Allowing the individual to verify all computations
How does an individual detect any errors in the credit score? How does an external auditor verify the computations for a sample of individuals?
The individual k would be entitled to receive a credit report from the credit bureau that includes (a) the unencrypted total credit score (total_scorek), (b) the encrypted disguised_subscorekj for all j, (c) the encrypted modified weights vji for all i and j and (d) sum_rk. Actually, (b), (c) and (d) should be publicly revealed by the credit bureau on its website because they do not leak any information.
The individual k would also be entitled to get two pieces of information from bank j: (a) the attributes xjki for all i and (b) the random number rkj.
With this information, the individual k can verify the computation of the encrypted disguised_subscorekj for all j (using the same homomorphic encryption method used by the banks). The individual can also verify sum_rk by adding up the rkj. Since the disguised sub scores add up to the total score plus the sum of the masks, the individual can also use the public key of the credit bureau to encrypt total_scorek + sum_rk and compare this with the encrypted sum obtained by adding up all the disguised_subscorekj homomorphically.
The same procedure would allow an auditor to verify the computation for any sample of individuals.
The careful reader might wonder how the individual can detect an attempt by a bank to falsify rkj. In that case, sum_rk will not match the sum obtained by adding up the rkj, but how can the individual determine which bank is at fault? To alleviate this problem, each bank j would be required to construct a Merkle tree of the rkj (for all k) and publicly reveal the root hash of this Merkle tree. Individual k would then also be entitled to receive a path of hashes in the Merkle tree leading up to rkj. It is then impossible to falsify any of the rkj without falsifying the entire Merkle tree. Any reasonable audit procedure would detect a falsification of the entire Merkle tree. Depending on the setup, the auditor might also be able to audit (a sample of) the secure multi-party computation of sum_rk directly by verifying a (sub) sample of the secret shares.
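A minimal sketch of the Merkle tree commitment that one bank might publish over its rkj is shown below. The hash function (SHA-256), the leaf encoding, the rule of duplicating the last node on odd-sized levels and all function names are assumptions chosen for illustration; the post does not prescribe any of them.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(k: int, r: int) -> bytes:
    """Leaf for individual k with mask r (prefix 0x00 separates leaves from nodes)."""
    return h(b"\x00" + f"{k}:{r}".encode())

def pad(level):
    """Duplicate the last node when a level has an odd number of nodes."""
    return level + [level[-1]] if len(level) % 2 else level

def build_tree(leaves):
    """Return all levels of the tree, from the leaf hashes up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = pad(levels[-1])
        levels.append([h(b"\x01" + cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Path of sibling hashes from leaf `index` up to (but excluding) the root."""
    path = []
    for level in levels[:-1]:
        level = pad(level)
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))   # (hash, sibling is left?)
        index //= 2
    return path

def verify(leaf, path, root):
    """Recompute the root from a leaf and its path; True if it matches."""
    node = leaf
    for sibling, sibling_is_left in path:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"\x01" + pair)
    return node == root

# Bank j commits to the masks of five hypothetical individuals k = 0..4.
r_values = [5021, 77, 9310, 468, 12345]
leaves = [leaf_hash(k, r) for k, r in enumerate(r_values)]
levels = build_tree(leaves)
root = levels[-1][0]                       # published by the bank

# Individual k = 2 receives r and the path, and checks them against the root.
path = prove(levels, 2)
print(verify(leaf_hash(2, 9310), path, root))   # honest value
print(verify(leaf_hash(2, 9999), path, root))   # falsified value
```

The individual needs only the published root, their own rkj and a path whose length grows with the logarithm of the number of customers, so the check stays cheap even for a large bank.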
Conclusion
At the end, we would have built a secure credit bureau. An Equifax-scale hack of such a bureau would be of no concern to the public; it would be a loss only for the bureau itself. Mathematics gives us the tools required to do this. The question is whether we have the good sense and the will to use these tools. The principal obstacle might be that the credit bureau would have to earn its entire income by selling credit scores; it would not be able to sell personal information about the individual because it does not have that information. But this is a feature and not a bug.
Posted at 4:20 pm IST on Fri, 20 Oct 2017
Categories: bond markets, fraud, risk management, technology