Encrypted Search

Encrypted search engine for text, JSON, and IPFS data. Use Lens to create a shared encryption key to share sensitive data with family, friends, and trusted foes.

The problem Encrypted Search solves

Sensitive data is often stored in unsafe ways to allow for server-side computation, search, and other operations.
The system we built lets a user upload data (or input an IPFS hash), encrypt it locally, and store the encrypted data in a centralized search engine. Although the data is stored centrally, the server never receives the plaintext, so the user's information stays private. The user can also share this data with a select group of people by giving them a shared symmetric key. As a proof of concept, this is done with asymmetric ciphers, following the public-key mechanism used in TLS, to share the secret key. Down the line, we can use NuCypher's Threshold Decryption, which allows the data owner to dynamically define access to a shared symmetric key. Lastly, we used the Lens social graph to connect users and use their public keys to create a shared symmetric key.

This proof-of-concept encrypted search shows how a censorship-resistant and privacy-preserving search index can be built. Importantly, this search engine can be entirely centralized as it is not trusted with any sensitive information. Additionally, we show that sensitive data can be shared successfully with friends using a public social graph. In the future, encrypted search engines and databases can help more applications adopt client-side encryption and still ensure powerful user experiences.

Challenges we ran into

Our first attempt to build AGI over the weekend failed. Unable to cope with this crushing defeat, we drank 10 Club-Mates in a row and proceeded to tackle encrypted search.

One challenge was figuring out how encrypted search was feasible in the first place: if we encrypted the entire document as is, it would be impossible to search for individual words or phrases. We learned how to encrypt files locally and implement a blind index by first tokenizing the input data and then encrypting each token individually. Next, we deployed an Elasticsearch instance where we could build search indexes over the encrypted data. To encrypt the data, we initially thought we could use the NuCypher implementation both to encrypt the tokens and to share that policy with others. Our vision was to create a community-based encryption key by using the Lens social graph, where we could share the public keys as a Lens publication. After many hours of banging our heads against the wall, we realized that there was no way to run deterministic encryption within NuCypher. Because we need to re-encrypt the query text before searching against Elasticsearch, and the result must match the stored tokens exactly, we could not use the randomness that is part of NuCypher's encryption.
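
The blind-index idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: it swaps in HMAC-SHA256 (a keyed hash) as a stand-in for deterministic per-token encryption, and the function names `tokenize`, `blind_token`, `blind_index`, and `matches` are hypothetical. The point it shows is that the same key and token always produce the same output, so a blinded query can be compared against blinded stored tokens.

```python
import hashlib
import hmac
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def blind_token(key: bytes, token: str) -> str:
    """Map a token to a deterministic keyed digest.

    Assumption: HMAC-SHA256 stands in for the deterministic
    encryption step; the same key and token always give the same output.
    """
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()


def blind_index(key: bytes, text: str) -> list[str]:
    """Blinded tokens of a document: what the search server would store."""
    return [blind_token(key, t) for t in tokenize(text)]


def matches(key: bytes, query: str, stored_index: list[str]) -> bool:
    """Blind the query with the same key and look up every query token."""
    query_tokens = tokenize(query)
    stored = set(stored_index)
    return bool(query_tokens) and all(
        blind_token(key, t) in stored for t in query_tokens
    )
```

For example, with a placeholder key such as `bytes(32)`, `matches(key, "cholesterol", blind_index(key, "Lab results: cholesterol normal"))` finds the term, while a query blinded under a different key does not. In a real deployment the blinded tokens would be sent to Elasticsearch as exact-match terms; that part is omitted here.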

We implemented a deterministic key (yet sufficiently random to everyone else) by signing a message through MetaMask. To share sensitive data, we used the security mechanism found in TLS (asymmetric, or public-key, encryption) to share the search key: Alice's private search key is encrypted with Bob's public key. Bob can then decrypt this message using his private key and retrieve the search key. The Lens integration is built, but for the demo we used another pair of keys for the TLS-style handshake.
Finally, we learned a LOT about encryption and had to understand the implications of our design choices on the security of the system. Attack vectors such as leakage, rainbow table attacks, and others were outside the scope of the POC.
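
As a rough sketch of the deterministic key mentioned above, a wallet signature can be turned into a fixed-length search key. This assumes the wallet returns the same signature every time the same account signs the same message, and that the signature was already obtained elsewhere (for example in the browser) as a hex string. The label `KEY_CONTEXT` and the function `derive_search_key` are hypothetical names, and HMAC-SHA256 is an assumed derivation step, not necessarily what the project used.

```python
import hashlib
import hmac

# Hypothetical domain-separation label; not taken from the project.
KEY_CONTEXT = b"encrypted-search/search-key/v1"


def derive_search_key(signature_hex: str) -> bytes:
    """Derive a 32-byte search key from a hex-encoded wallet signature.

    The same signature always yields the same key, so the key can be
    re-derived on any device that can produce the signature again.
    """
    has_prefix = signature_hex.startswith(("0x", "0X"))
    hex_body = signature_hex[2:] if has_prefix else signature_hex
    signature = bytes.fromhex(hex_body)
    return hmac.new(KEY_CONTEXT, signature, hashlib.sha256).digest()
```

Note that under this scheme anyone who obtains the signature can derive the key, so the signed message should be one the user never signs for any other purpose.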

Discussion