Protecting Privacy with Translucent Databases

… In Translucent Databases, Wayner extends this concept of hashing in new and important ways. For example, what if a police department needs to build a database of sexual-assault victims that lets them identify trends but hides personal information? You could use a translucent database where the first column is a hash of the victim’s name, the second column is a hash of their full address, and the third column is a hash of their block and street. You can now group incidents by location by grouping entries with identical block hashes, and you can see whether two incidents refer to the same person by checking whether their name hashes are identical.
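The three-column layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not name a hash function for these columns, so SHA-256 is assumed here, and the helper names (`h`, `make_row`, `group_by_block`, `same_person`) and the sample names and addresses are hypothetical.

```python
import hashlib
from collections import defaultdict


def h(value: str) -> str:
    """Hash a normalized string. SHA-256 is an assumption, not the book's choice."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def make_row(name: str, full_address: str, block_and_street: str) -> tuple:
    """Build one row: (name hash, full-address hash, block-and-street hash)."""
    return (h(name), h(full_address), h(block_and_street))


# Hypothetical sample data; only the hashes are ever stored.
rows = [
    make_row("J. Smith", "123 Elm St Apt 4", "100 block Elm St"),
    make_row("A. Jones", "150 Elm St", "100 block Elm St"),
    make_row("J. Smith", "123 Elm St Apt 4", "100 block Elm St"),
    make_row("B. Lee", "9 Oak Ave", "0 block Oak Ave"),
]


def group_by_block(table):
    """Group incidents whose block-and-street hashes are identical."""
    groups = defaultdict(list)
    for row in table:
        groups[row[2]].append(row)
    return groups


def same_person(row_a, row_b) -> bool:
    """Two incidents involve the same person if their name hashes match."""
    return row_a[0] == row_b[0]
```

Grouping works because equal inputs always produce equal hashes, so the analyst can count incidents per block without the table ever containing a readable name or address. Note that the sketch normalizes case and whitespace before hashing; without some agreed normalization, "J. Smith" and "j. smith" would land in different groups.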

Wayner’s approach makes it possible to let victims update their records without giving anybody else the ability to search by a person’s name. You do this by appending a password to the victim’s name before hashing it, a password known to the victim and nobody else.

For example, if you were to use the MD5 hash function, you could key a victim’s report with the value of MD5(“J. Smith/color4”), where “color4” is Smith’s password. If Smith remembers that her password is “color4”, then she will be able to update her database entry in the future, perhaps to tell the database administrators that her perpetrator has been caught. If there is a concern that victims might forget their passwords, the database can have additional columns that are protected with other passwords, known to other people: for example, a second column where the password is known only to the intake officer. By creating multiple keys using different combinations of data, it’s possible to protect a translucent database against browsing while simultaneously providing for people’s natural tendency to forget critical pieces of information…
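As a rough sketch of the password-keyed scheme, the Python below keys each report with MD5 of the name and password joined by a slash, following the "J. Smith/color4" example in the text. Everything else is an assumption made for illustration: the in-memory `rows` list, the column names, the functions `record_key`, `file_report`, and `update_status`, and the intake officer's password "intake-7".

```python
import hashlib

rows = []  # hypothetical in-memory stand-in for the database table


def record_key(name: str, password: str) -> str:
    """MD5 of "name/password", mirroring the MD5("J. Smith/color4") example."""
    return hashlib.md5(f"{name}/{password}".encode("utf-8")).hexdigest()


def file_report(name, victim_password, officer_password, details):
    """Store a report under two keys; the name itself is never stored."""
    rows.append({
        "victim_key": record_key(name, victim_password),
        "officer_key": record_key(name, officer_password),
        "details": details,
        "status": "open",
    })


def update_status(name, password, new_status) -> int:
    """Update rows whose victim or officer key matches; return how many changed."""
    key = record_key(name, password)
    updated = 0
    for row in rows:
        if key in (row["victim_key"], row["officer_key"]):
            row["status"] = new_status
            updated += 1
    return updated


# Smith files a report, then later updates it with her own password.
file_report("J. Smith", "color4", "intake-7", "report text")
update_status("J. Smith", "color4", "perpetrator caught")
```

Someone who knows only the name cannot compute either key, so the table cannot be searched by name alone, while the second column gives the intake officer a fallback path if Smith forgets "color4".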

http://www.oreillynet.com/pub/a/network/2002/08/02/simson.html

http://www.wayner.org/books/td/
