
Macrolytics: Data Anonymisation & Query Service

The framework integrates directly with Account Aggregators (AAs) to power seamless access to both single-user queries and user-anonymous bulk data requests. Integrated privacy models ensure that extracted sample data cannot be re-traced to individual users.

The problem Macrolytics: Data Anonymisation & Query Service solves

The AA framework does not currently define a method for bulk access to data. Access to the data is more critical than its form. For the AA to remain privacy-conscious while providing federated access to consented data, that data must be user-anonymised.

This will eventually open the door to better modelling and more insightful research, and pave the way for a more sustainable lending ecosystem.

We built the project on the assumption that a new ‘consent-type: Anonymous’ will be facilitated in the consent-artefact.
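
For illustration, here is a minimal sketch of what such an artefact could look like, written as a Python dict. The field names only loosely mirror the AA consent-artefact schema, and the ‘ANONYMOUS’ value is the hypothetical addition this project assumes:

```python
# Illustrative only: a consent artefact carrying the proposed anonymous
# consent-type. Field names are indicative, not normative.
consent_detail = {
    "consentMode": "QUERY",
    "fetchType": "PERIODIC",
    "consentTypes": ["ANONYMOUS"],   # proposed new consent-type
    "fiTypes": ["DEPOSIT"],
    "Purpose": {"text": "Aggregated, user-anonymous analytics"},
}
```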

The functionalities we built:

A Query Layer

  • This layer supports a subset of SQL operations and is powered by a custom query parser that validates all requests made to our APIs. The operations supported today should cover most of the functionality required in the early days of the AA framework; support for more operations can be added as the need arises.
  • The custom query parser validates incoming requests, their associated consent types, and the rules of comparison. The query framework returns an error if any of these parameters conflict with the underlying data policies (a minimal sketch follows this list).
  • The following types are supported:
    1. Query: Returns validated results for data requests made against a single user.
    2. Anonymous: Returns results for bulk data requests made by FIUs or other entities allowed by regulators to query user-anonymous data.
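
As a rough illustration, the sketch below (Python; names such as QueryRequest and ALLOWED_OPERATIONS are hypothetical and not our actual implementation) shows how such a validation step might reject requests whose consent type, operations, or columns conflict with the underlying data policies:

```python
# Illustrative sketch only: request validation in the query layer.
# QueryRequest, ALLOWED_OPERATIONS, etc. are hypothetical names.
from dataclasses import dataclass
from typing import Dict, List, Set

ALLOWED_OPERATIONS = {"SELECT", "WHERE", "GROUP BY", "ORDER BY", "LIMIT"}
ALLOWED_CONSENT_TYPES = {"Query", "Anonymous"}

@dataclass
class QueryRequest:
    consent_type: str        # "Query" (single user) or "Anonymous" (bulk)
    operations: List[str]    # SQL operations used by the request
    columns: List[str]       # columns the requester wants returned

def validate_request(request: QueryRequest, permitted_columns: Set[str]) -> Dict:
    """Return an error payload if the request conflicts with data policies."""
    if request.consent_type not in ALLOWED_CONSENT_TYPES:
        return {"error": f"unsupported consent type: {request.consent_type}"}
    unsupported = [op for op in request.operations if op not in ALLOWED_OPERATIONS]
    if unsupported:
        return {"error": f"unsupported SQL operations: {unsupported}"}
    uncovered = [col for col in request.columns if col not in permitted_columns]
    if uncovered:
        return {"error": f"columns not covered by consent: {uncovered}"}
    return {"status": "ok"}
```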

Integrated Privacy Models

  • Algorithms ensure that the numeric values returned are at least k-anonymous and partially l-diverse, so the data is not re-identifiable yet remains in a workable format (a minimal check is sketched after this list).
  • With more visibility into the actual data points available, we plan to work on principles like t-closeness to strengthen anonymisation further.
  • These mechanisms ensure that the data values returned are always representative of the general distribution.
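
To make this concrete, here is a minimal check (Python/pandas; the column names and thresholds are made up for illustration) of whether a candidate result set satisfies k-anonymity and l-diversity before it is released:

```python
# Illustrative sketch only: k-anonymity / l-diversity checks on a result set.
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """Every combination of quasi-identifier values occurs at least k times."""
    return bool(df.groupby(quasi_identifiers).size().min() >= k)

def is_l_diverse(df: pd.DataFrame, quasi_identifiers: list,
                 sensitive_column: str, l: int) -> bool:
    """Every quasi-identifier group has at least l distinct sensitive values."""
    return bool(df.groupby(quasi_identifiers)[sensitive_column].nunique().min() >= l)

# Hypothetical usage: release the data only if both properties hold.
# releasable = (is_k_anonymous(df, ["age_band", "pin_prefix"], k=5)
#               and is_l_diverse(df, ["age_band", "pin_prefix"], "income_band", l=3))
```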

Challenges we ran into

The primary areas of confusion that came up while building a privacy framework compatible with model building are outlined below:

  • Getting anonymised data into the system:
    We propose that, to get such data into the system, the AA collective must consider introducing a new type of consent, one specific to anonymisation. Permission to consume anonymised data could be a de facto consent secured from the user by FIUs/AAs.

  • Getting anonymisation models to work:
    The anonymisation models described in our problem statement are meant to work with the full dataset.

    Applying them directly to model-training data raises the following issue:
    for models built on the anonymised data to work in a production setting for predictive purposes, the anonymisation process itself has to be repeatable.

    Our solution: the system will require a method to save the anonymisation settings used for a data pull by a requesting entity, for a pre-defined interval. A subsequent request by the same entity for the same set of variables will reuse those settings, ensuring the data remains compatible with their existing models. This has been factored into the proposed architecture, which we will build going forward.
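
A minimal sketch of this idea (Python; the in-memory store, key scheme, and 30-day interval are all assumptions made for illustration):

```python
# Illustrative sketch only: reuse anonymisation settings for repeat pulls by
# the same entity so that downstream models see consistently transformed data.
import time
from typing import Dict, List

SETTINGS_TTL_SECONDS = 30 * 24 * 3600   # pre-defined interval (assumed: 30 days)
_settings_store: Dict[str, Dict] = {}   # stand-in for a durable settings store

def _settings_key(entity_id: str, variables: List[str]) -> str:
    return f"{entity_id}:{','.join(sorted(variables))}"

def get_or_create_settings(entity_id: str, variables: List[str],
                           fresh_settings: Dict) -> Dict:
    """Return saved settings if the same entity re-requests the same variables."""
    key = _settings_key(entity_id, variables)
    entry = _settings_store.get(key)
    if entry and time.time() - entry["saved_at"] < SETTINGS_TTL_SECONDS:
        return entry["settings"]          # repeatable anonymisation
    _settings_store[key] = {"settings": fresh_settings, "saved_at": time.time()}
    return fresh_settings
```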

Technologies used
