Building credit bureaus that have no personal information

In two blog posts (here and here), I have argued that in an era of widespread hacking, the credit bureau’s business model is unsustainable because it requires storing enormous amounts of confidential information on tens of millions of individuals who are not even its customers.

However, these bureaus serve a useful function of aggregating information about an individual from multiple sources and condensing all this information into a credit score that measures the credit worthiness of the individual, An individual has credit relationships with many banks and other agencies. He might have a credit card from one bank, a car loan from another bank and a home loan from a third; he may have overdue payments on one or more of these loans. He might also have an unpaid utility bill. When he applies for a new loan from a yet another bank, the new bank would like to have all this information before deciding on granting the loan, but it is obviously impractical to write to every bank in the country to seek this information. It is far easier for all banks to provide information about all their customers to a central credit bureau which consolidates all this information into a composite credit score which can be accessed by any bank while granting a new loan.

The problem is that though this model is very efficient, it creates a single point of failure – a single entity that knows too much information about too many individuals. What is worse, these individuals are not customers of the bureau and cannot stop doing business with it if they do not like the privacy and security practices of the bureau.

We need to find ways to let the bureaus perform their credit scoring function without receiving storing confidential information at all. The tool required to do this (homomorphic encryption) has been available for over a decade now, but has been under utilized in finance as I discussed in a blog post two years ago.

Suppose there is only one bank

To explain how a secure credit bureau can be built, I begin with a simple example where the bureau obtains information only from one bank (or other agency) which has the individual as a customer. I will then extend this to multiple banks.

The credit score of an individual can be approximated by a linear function (weighted sum) of a bunch of attributes relating to the individual:

score = w₁ x₁ + w₂ x₂ + ... + w_n x_n

where w_i is a weight (coefficient) and x_i is an attribute (for example, x_i could indicate whether the individual is delinquent on a car loan and x₂ could represent the credit card debt outstanding as a percentage of the credit limit). Since x_i could be a non linear function (for example, the square or logarithm) of the underlying variable, the linear form is not really restrictive.
The attributes x_i are known only to the bank. These are never revealed to the bureau which sees only the weighted sum above.
The weights w_i are proprietary information that needs to be known only to the credit bureau. The bureau encrypts the weights and sends the encrypted weights to the bank.
Homomorphic encryption allows the bank to compute the weighted sum

score = w₁ x₁ + w₂ x₂ + ... + w_n x_n

without decrypting the weights. Actually, the bank does not see the weighted sum (the score). What it computes using homomorphic encryption is the encrypted weighted sum, but the credit bureau can decrpyt this and obtain the score. Since the x_i are known to the bank, the computation of this scalar product requires only Additive or Partial Homomorphic Encryption (AHE or PHE) which is much more efficient than Full Homomorphic Encryption (FHE). The GLLM method (Goethals et al. "On private scalar product computation for privacy-preserving data mining." ICISC. Vol. 3506. 2004.) based on the Paillier AHE can do the job.
At the end therefore:
1. The credit bureau knows the credit score of the individual.
2. The credit bureau has not revealed either its scoring rule or the credit score of the individual.
3. The bank has not revealed any confidential information about the customer to the credit bureau other than the credit score. (Note for the geeks: The privacy guarantee here is at the highest possible level – it is information theoretical (Theorem 1 of Goethals et al.) and not merely cryptographic. Even in the implausible worst case scenario where the cryptography is somehow broken, that would leak information from the credit bureau to the banks but not in the other direction.)
The above procedure is repeated for each individual. The w_i would be the same for all individuals, but x_i would of course vary from individual to individual. To be precise, we should write the i’th attribute of the k’th individual as x^k_i.
If the credit bureau is hacked, confidential information belonging to the individuals is not exposed because the bureau does not have this at all. The credit scores and the scoring rule may be exposed, but this is a loss primarily to the credit bureau and there are no negative externalities involved.

Extension to Multiple Banks

In general, the credit bureau will need information from many (say m) banks (or other agencies).

The credit score of an individual can be represented as a weighted sum of sub scores from various banks (the bureau may or may not use equal weights u_i = 1 or u_i = 1/m for this purpose):

Total Score = u₁ subscore₁ + u₂ subscore₂ + ... + u_m subscore_m

where the u_j is the weight of bank j and subscore_j is the sub score computed using information only from bank j as follows:

subscore_j = w₁ x^j₁ + w₂ x^j₂ + ... + w_n x^j_n

where x^j_i is the i’th attribute of the individual at bank j.
Bank j can use homomorphic encryption to compute u_j subscore_j. We first define a set of modified weights v^j_i for attribute i for bank j as:

v^j_i = u_j w_i

and then let the bank compute a weighted sum exactly as in the one bank case but using weights v^j_i instead of w_i:

u_j subscore_j = v^j₁ x^j₁ + v^j₂ x^j₂ + ... + v^j_n x^j_n
The credit bureau adds up all the u_j subscore_j that it receives from various banks to find the credit score of the individual.
We can however get one further level of privacy in this case where the credit bureau is able to compute the total score of an individual without learning any of the subscore_j. If this extra privacy is desired, we modify the procedure as follows:
1. Bank j computes
  
  disguised_subscore_j = u_j subscore_j + r_j
  
  where r_j is a random number chosen by bank j. The bank communicates the disguised_subscore to the credit bureau. (Note for the geeks: Actually since the bank computes and communicates an encrypted form of this quantity homomorphically, it needs to encrypt r_j also. This is possible since we are using public key cryptography – the public key of the credit bureau is publicly available and anybody can encrypt using this key; but only the bureau can perform decrpytion because only it has the private key).
2. All the banks collectively compute the sum of all the r_j using secure multi party computation based on secret sharing methods which ensure that no bank learns the r_j of any other bank. The sum of all the r_j (let us call it sum_r) is communicated to the credit bureau.
3. The credit bureau computes the sum of all the disguised_subscore_j. From this result, it subtracts sum_r to get the correct total credit score.
At the end therefore:
1. The credit bureau knows the total credit score of the individual.
2. The credit bureau has not revealed either its scoring rule or the credit score of the individual.
3. The bank has not revealed any confidential information about the customer to the credit bureau: not even the sub score based on data in its possession.
The above procedure is repeated for each individual. The modified weights v^j_i would be the same for all individuals at the same bank, but x^j_i would of course vary from individual to individual. To be precise, we should write the i’th attribute of the k’th individual at the j’th bank as x^jk_i. The r_j (and therefore sum_r) should also ideally vary from individual to individual: strictly speaking, these are actually r^k_j and sum_r^k for individual k. Similarly, disguised_subscore_j should strictly speaking be disguised_subscore^k_j

Allowing the individual to verify all computations

How does an individual detect any errors in the credit score? How does an external auditor verify the computations for a sample of individuals?

The individual k would be entitled to receive a credit report from the credit bureau that includes (a) the unencrypted total credit score (total_score^k), (b) the encrypted disguised_subscore^k_j for all j, (c) the encrypted modified weights v^j_i for all i and j and (d) sum_r^k. Actually, (b), (c) and (d) should be publicly revealed by the credit bureau on its website because they do not leak any information.

The individual k would also be entitled to get two pieces of information from bank j: (a) the attributes x^jk_i for all i and (b) the random number r^k_j.

With this information, the individual k can verify the computation of the encrypted disguised_subscore^k_j for all j (using the same homomorphic encryption method used by the banks). The individual can also verify sum_r^k by adding up the r^k_j. Using the public key of the credit bureau, the individual can also encrypt total_score^k - sum_r^k and compare this with the encrypted sum obtained by adding up all the disguised_subscore^k_j homomorphically.

The same procedure would allow an auditor to verify the computation for any sample of individuals.

The careful reader might wonder how the individual can detect an attempt by a bank to falsify r^k_j. In that case, sum_r^k will not match the sum obtained by adding up the r^k_j, but how can the individual determine which bank is at fault? To alleviate this problem, each bank j would be required to construct a Merkle tree of the r^k_j (for all k) and publicly reveal the root hash of this Merkle tree. Individual k would then also be entitled to receive a path of hashes in the Merkle tree leading up to r^k_j. It is then impossible to falsify any of the r^k_j without falsifying the entire Merkle tree. Any reasonable audit procedure would detect a falsification of the entire Merkle tree. Depending on the setup, the auditor might also be able to audit (a sample of) the secure multi party computation of r^k_j directly by verifying a (sub) sample of the secret shares.

Conclusion

At the end, we would have built a secure credit bureau. A Equifax scale hacking of such a bureau would be of no concern to the public; it would be a loss only for the bureau itself. Mathematics gives us the tools required to do this. The question is whether we have the good sense and the will to use these tools. The principal obstacle might be that the credit bureau would have to earn its entire income by selling credit scores; it would not be able to sell personal information about the individual because it does not have that information. But this is a feature and not a bug.

Posted at 4:20 pm IST on Fri, 20 Oct 2017 permanent link

Categories: bond markets, fraud, risk management, technology

Comments