In a new video, Chainlink Advisor and former Google AI Lead Laurence Moroney and Chainlink Chief Scientist Ari Juels examine how privacy-preserving oracle systems designed for blockchain applications can also be used to build protected pipelines, or “props,” that permit the secure use of deep web data in AI training.
Juels, a professor of computer science at Cornell Tech, co-authored Props for Machine-Learning Security with Farinaz Koushanfar. In it, he proposes props as a solution to one of the biggest obstacles to effectively training machine learning models: a lack of quality data. He argues that by providing secure access to the deep web’s vast sources of private data, props can accelerate the training and beneficial use of AI systems.
“If we could tap this resource, it would be a real boon for machine learning AI,” he said.
Privacy-preserving oracles also help ensure that data is trustworthy and authenticated, which reduces opportunities for manipulation and fraud. Ultimately, the technology could be used to create a system in which people are incentivized and rewarded for contributing private data, such as health or legal records, to AI models, with cryptographic assurance that their data will not be leaked or traced back to them.
“One of the benefits of using deep web data is that you can get very high-quality reservoirs or repositories of the stuff that would otherwise be unavailable,” Juels explained.
For example, a key advantage of training a medical diagnostic machine learning model on deep web data rather than synthetic data is that rare conditions are less likely to be overlooked.
“By the very fact that the statistical curve is reinforced when you make synthetic data, the outliers tend to be even more excluded,” Moroney explained.
“There are potentially some diseases that a very small percentage of the population has. They would be a statistical aberration. And they may not show up in synthetic data for a machine learning model. But if somebody has that disease and wants to share their data and they opt in to do it, they’re not only helping themselves, but they’re also potentially helping other people like them who lie outside the bell curve.”
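Moroney's point about outliers can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a description of any real synthetic data tool: it invents a population of 10,000 lab readings in which 10 people have a rare condition, then generates "synthetic" data by fitting a single bell curve to the real data and sampling from it. The population, the cutoff, and the single-Gaussian generator are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical "real" population of one lab measurement: most readings
# cluster around 0, while 10 of 10,000 people have a rare condition
# that produces readings near 8.
common = [random.gauss(0.0, 1.0) for _ in range(9_990)]
rare = [random.gauss(8.0, 0.2) for _ in range(10)]
real = common + rare

# Assumed synthetic-data generator: fit one bell curve to the real data
# and sample from it. Real generators are more sophisticated; this is
# the simplest case of "reinforcing the statistical curve".
mu = statistics.fmean(real)
sigma = statistics.stdev(real)
synthetic = [random.gauss(mu, sigma) for _ in range(len(real))]


def count_outliers(values, cutoff=6.0):
    """Count readings above the (hypothetical) rare-condition cutoff."""
    return sum(1 for v in values if v > cutoff)


real_outliers = count_outliers(real)
synthetic_outliers = count_outliers(synthetic)
print("rare cases in real data:     ", real_outliers)
print("rare cases in synthetic data:", synthetic_outliers)
```

Under these assumptions, the fitted curve treats the rare readings as a slight widening of the bell rather than as a distinct group, so the sampled data contains far fewer of them than the real data does. Opt-in contributions of real records would keep those cases in the training set.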
Juels explained two types of privacy-preserving oracle systems. The first is a trusted execution environment (TEE)-based oracle called Town Crier. The second is a cryptographic privacy-preserving oracle called DECO.
Through the use of zero-knowledge proofs and an oracle, DECO allows users to authenticate personal information from an API or website without revealing sensitive information onchain or to the oracle. By verifying that a certain value exceeds an established threshold via cryptographic proof, DECO enables use cases such as confirming borrowers’ creditworthiness while protecting their personal information.
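The threshold check described above can be sketched as a toy model of who sees what. This is not the DECO protocol and contains no zero-knowledge proof: the cryptographic proof is replaced by a plain boolean, and a salted hash stands in as a simple commitment to the private value. The names `ThresholdStatement`, `make_statement`, and `open_commitment`, and the credit score of 742, are hypothetical and chosen only for illustration.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdStatement:
    """What the verifier sees: no field holds the private value itself."""
    commitment: str  # salted SHA-256 hash of the private value
    threshold: int   # public threshold the value is compared against
    exceeds: bool    # the claim; in DECO this would be backed by a ZK proof


def commit(value: int, salt: bytes) -> str:
    """Salted hash commitment: hides the value, binds the user to it."""
    return hashlib.sha256(salt + str(value).encode()).hexdigest()


def make_statement(private_value: int, threshold: int):
    """User side: build the public statement and keep the salt private."""
    salt = secrets.token_bytes(16)
    statement = ThresholdStatement(
        commitment=commit(private_value, salt),
        threshold=threshold,
        exceeds=private_value > threshold,
    )
    return statement, salt


def open_commitment(statement: ThresholdStatement, value: int, salt: bytes) -> bool:
    """Optional later audit: check that a revealed value matches the commitment."""
    return commit(value, salt) == statement.commitment


# Hypothetical example: a borrower with a credit score of 742 shows a
# lender that the score is above 700 without sending the score itself.
statement, salt = make_statement(private_value=742, threshold=700)
print(statement.exceeds)
```

In this toy version the verifier has to take `exceeds` on trust. The role of the zero-knowledge proof in a system like DECO is to remove that trust, letting the verifier check the claim against authenticated data without the value being revealed.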
The Chainlink DECO Sandbox enables financial institutions and enterprises to quickly explore how to streamline user onboarding while maintaining data privacy and ensuring data provenance. Web3 developers can also use the sandbox to experiment with novel privacy-preserving use cases, such as how institutional investors could participate in the onchain economy via their offchain identities and credit histories.
“I would love to see what people start to think about and what people start doing,” Moroney said.
“I really think there are so many other data troves out there where people have great domain knowledge, and the possibility for you to open them up in a secure and trusted way.”
Listen to the full conversation.

