Created on 1st March 2025
Data vendors can use this service to upload their data to a private database while earning payments from customers who use their dataset.
Customers who need data to train their AI models, or who need a trained AI model outright, can use our service both to train a model and to obtain the kind of data ML engineers are always eager to get their hands on.
Customers can provide a test dataset and a desired accuracy threshold. The trained model is run against that test set so they can see whether it suits their use case BEFORE paying for it.
We provide ZK proofs that the accuracy threshold was really met and that we truly ran inference on the test dataset.
We then verify these proofs on zkVerify and submit them to the Hedera Consensus Service.
This effectively acts as a receipt customers can show to prove that their model really was trained on one of our datasets: a certificate of authenticity!
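The pay-only-if-it-works check described above comes down to comparing test-set accuracy with the customer's threshold. The sketch below is an assumption about how that check could look, not the project's actual RISC0 guest code; the basis-point threshold and the names `check_accuracy` and `AccuracyReport` are hypothetical. It uses integer cross-multiplication rather than floating point, a common choice for code meant to run inside a zkVM, so the outcome is exact.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AccuracyReport:
    correct: int        # number of test rows predicted correctly
    total: int          # number of test rows evaluated
    threshold_bps: int  # required accuracy in basis points (hypothetical unit: 10_000 = 100%)
    met: bool           # True if correct / total >= threshold_bps / 10_000


def check_accuracy(predictions: Sequence[int], labels: Sequence[int],
                   threshold_bps: int) -> AccuracyReport:
    """Compare predictions with labels and decide whether the threshold is met.

    Hypothetical sketch: integer arithmetic keeps the result exact and deterministic.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("test set must not be empty")
    if not 0 <= threshold_bps <= 10_000:
        raise ValueError("threshold_bps must be between 0 and 10_000")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    total = len(labels)
    # correct / total >= threshold_bps / 10_000  <=>  correct * 10_000 >= threshold_bps * total
    met = correct * 10_000 >= threshold_bps * total
    return AccuracyReport(correct, total, threshold_bps, met)
```

For example, 3 correct predictions out of 4 meets a 75% threshold (`threshold_bps=7500`) but not an 80% one, so in this sketch only the first request would proceed to payment.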
Like many P2P networks, we incentivize customers to give new peers in the network a chance by starting training costs at an EGREGIOUSLY LOW price. Afterwards, as models trained on a dataset meet customers' accuracy thresholds, that dataset's price increases according to the bonding curve formula in our factory smart contract. This leads to the following flow:
Data vendors upload their dataset to the chain of their choice.
They are the admin of their dataset's bonding curve and can withdraw their earnings whenever they wish.
Customers specify how they would like their model to be trained. Currently we support Decision Tree classifiers and the following parameters:
Customers also specify the accuracy they are willing to pay for on their uploaded test set.
They then receive the encoded model, a ZK proof of the inference, the attestation of the verified proof on zkVerify (along with the information needed to verify it on chain), and the transaction hash of the attestation on the Hedera Consensus Service.
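The flow above returns an "encoded model" whose inference is proven. One plausible encoding for a Decision Tree classifier, assumed here rather than taken from the project, is a flat list of nodes with fixed-point integer thresholds, which a zkVM guest program could walk without floating point. `Node`, `predict`, `to_fixed`, the `scale` of 1000 and the example tree are all hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Node(NamedTuple):
    feature: int    # index of the feature to test; -1 marks a leaf
    threshold: int  # go left if x[feature] <= threshold (fixed-point integer)
    left: int       # index of the left child in the node list
    right: int      # index of the right child in the node list
    label: int      # predicted class (only meaningful at a leaf)


def predict(tree: Sequence[Node], x: Sequence[int]) -> int:
    """Walk a decision tree stored as a flat list of nodes, starting at the root (index 0)."""
    i = 0
    while tree[i].feature != -1:
        node = tree[i]
        i = node.left if x[node.feature] <= node.threshold else node.right
    return tree[i].label


def to_fixed(values: Sequence[float], scale: int = 1000) -> List[int]:
    """Convert float features to fixed-point integers; thresholds must use the same scale."""
    return [round(v * scale) for v in values]


# Hypothetical example: 2 features, 3 classes, thresholds already scaled by 1000.
EXAMPLE_TREE = [
    Node(feature=0, threshold=2500, left=1, right=2, label=-1),  # is x0 <= 2.5 ?
    Node(feature=-1, threshold=0, left=0, right=0, label=0),     # leaf: class 0
    Node(feature=1, threshold=1000, left=3, right=4, label=-1),  # is x1 <= 1.0 ?
    Node(feature=-1, threshold=0, left=0, right=0, label=1),     # leaf: class 1
    Node(feature=-1, threshold=0, left=0, right=0, label=2),     # leaf: class 2
]
```

With this tree, `predict(EXAMPLE_TREE, to_fixed([3.0, 0.5]))` goes right at the root and left at node 2, returning class 1. A flat array like this is easy to serialize, hash, and commit to, which is why it is a natural shape for a model whose inference must be proven.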
We wanted to build something that encourages price discovery in ML. Decentralized compute services already exist, and so do AI model services. After reading a paper by Anthropic, we originally wanted to implement influence functions to decide how to split a vault given to us by a customer across the multiple datasets used for training. However, we did not have the GPUs to do this, nor a fair way to decide what should be rewarded: a dataset that changes a parameter more than another dataset does not necessarily help more towards the goal.
We settled on handling one dataset at a time, and therefore decided to give consumers a way to rank datasets by usefulness. Because ratings or votes are open to sybil attacks, we settled on payment as the signal, which naturally led to price discovery.
With the interests of data vendors in mind, we keep the datasets completely private and instead offer a ZK proof that the trained model really does meet or exceed an accuracy threshold chosen by the user.
The key implementation pieces are RISC0's proof of ML inference and the verification of these proofs on zkVerify.
Returning to the DeFi side, we chose a rather simple bonding curve, but it can be fine-tuned as much as a protocol wants. Feel free to experiment!
One thing we did think hard about was the initial fee. From studying BitTorrent (a P2P file-sharing protocol), we learned that a new peer needs an advantage over existing nodes to boost its chance of being "picked", so we settled on a very low starting price.
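The curve is described only as "rather simple", with a very low starting price that rises as a dataset's models meet accuracy thresholds. The sketch below models one such curve, a linear one, in Python purely for illustration; it is not the project's factory contract, and `DatasetCurve`, `base_price`, `slope` and the charge-only-on-success rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DatasetCurve:
    """Toy model of a per-dataset linear bonding curve (hypothetical parameters)."""
    base_price: int          # deliberately low starting price, in the token's smallest unit
    slope: int               # price increase per successful training job
    successes: int = 0       # jobs whose model met the customer's accuracy threshold
    vendor_balance: int = 0  # earnings the vendor (curve admin) can withdraw

    def current_price(self) -> int:
        # Linear curve: price grows with the number of successful jobs.
        return self.base_price + self.slope * self.successes

    def settle_job(self, threshold_met: bool) -> int:
        """Charge the customer only if the threshold was met; return the amount charged."""
        if not threshold_met:
            return 0
        price = self.current_price()
        self.vendor_balance += price
        self.successes += 1  # a proven success moves the dataset up the curve
        return price

    def withdraw(self) -> int:
        """Pay out everything the vendor has earned so far and reset the balance."""
        amount, self.vendor_balance = self.vendor_balance, 0
        return amount
```

Starting from `DatasetCurve(base_price=1, slope=10)`, a successful job costs 1, a failed job costs nothing, the next successful job costs 11, and the vendor can then withdraw 12. A linear curve keeps early jobs cheap, which gives new datasets the "new peer" advantage, while letting price track demonstrated usefulness; steeper shapes would reward proven datasets faster.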
We used many RPC endpoints to handle communication between our backend and the smart contracts.
We used MetaMask as the frontend wallet so users can easily access the datasets deployed to the connected chain.
We used ZKVerifyJS to submit our ZK proofs for verification on zkVerify.
We used the Hedera Consensus Service to post our proofs as immutable, timestamped logs.
We used RISC0 to generate the ZK proofs.
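The Hedera receipt is described only as the proof posted as an immutable, timestamped log. The sketch below assumes one possible message format: a canonical JSON document that commits to the model, proof and test set by their SHA-256 digests alongside the accuracy result and the zkVerify attestation. Every field name, `build_receipt`, `matches` and the `attestation_id` value are hypothetical, and the actual submission to an HCS topic is deliberately left out.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_receipt(model_bytes: bytes, proof_bytes: bytes, test_set_bytes: bytes,
                  threshold_bps: int, correct: int, total: int,
                  attestation_id: str) -> bytes:
    """Build a compact, canonical JSON receipt to post as a topic message.

    Large artifacts are represented only by their SHA-256 digests, so the message
    stays small and the artifacts themselves stay off-chain.
    """
    receipt = {
        "model_sha256": sha256_hex(model_bytes),
        "proof_sha256": sha256_hex(proof_bytes),
        "test_set_sha256": sha256_hex(test_set_bytes),
        "threshold_bps": threshold_bps,
        "correct": correct,
        "total": total,
        "zkverify_attestation": attestation_id,
    }
    # sort_keys and compact separators produce identical bytes for identical receipts.
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


def matches(receipt_bytes: bytes, model_bytes: bytes) -> bool:
    """Check that a model file is the one referenced by a posted receipt."""
    return json.loads(receipt_bytes)["model_sha256"] == sha256_hex(model_bytes)
```

Because the encoding is canonical, anyone holding the model file can recompute its digest and compare it with the timestamped message, which is what lets the log serve as a certificate of authenticity without revealing the vendor's dataset.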
Our project's key features are price discovery, compute, and ZK proofs that keep our data vendors' data private.
We cannot produce a ZK proof of model training because ZK proofs are still too costly in time and compute. We look forward to new developments in cryptography, both in their application to ML and in their verifiability on chain.
We only support Decision Trees for now, but plan to support more models in the future.
One other thing we wanted to do was to support the use of multiple datasets, and then use influence functions to determine how much we should pay each dataset (as reward for their contribution to model weights).
We could not spend much time configuring the bonding curve, so we settled on a simple one. We plan to tune it to make it fairer to both vendors and customers.
We also currently sponsor all transactions, which is not a sustainable business model; we plan to have users pay us directly for our services.
Tracks Applied (5)
zkSync
Hedera
Hedera
zkVerify Foundation