Baselight: Building the knowledge layer for humans and AI

Thanks for having us. I'm Henrique, co-founder of Finisterra Labs. We’re building Baselight, a universal data hub that provides AI with a high-resolution view of the world.

Language models have achieved incredible reasoning capabilities, but this reasoning happens through a blurry knowledge of the world: the one that’s encoded into text. Baselight upgrades LLMs and agents from lossy text to information-dense structured data, and provides answers that are explainable, reproducible, and auditable.

Let me show you why Baselight is the missing layer in the AI stack—and why now is the time for it.

There are a few problems with the current state of AI.

The first one is a training data wall. We have largely tapped out high-quality human-written text, and forecasts point to complete exhaustion by 2028 if current trends continue. Simply adding more text won’t carry us forward anymore.

The second problem is that training costs have been growing exponentially. Since 2010, the amount of computation needed has been doubling roughly every 6 months, pushing the cost of frontier models to the billion-dollar range. At the same time, gains have begun to plateau. Future lift won’t come from simply scaling training data or parameters.

The final problem has to do with the input itself: text. Text is low-resolution. It’s a compressed, lossy description of reality. Agents can do more if, in addition to text, they have access to information-dense structured data with a source you can track and verify. It’s the difference between providing the model with a news article about inflation and providing the actual underlying CPI time series. The latter is a much denser source of information. Baselight delivers this data at inference time.

Let me illustrate this with a very practical example. We want to know how many wallets hold the Ripple USD stablecoin. First, we ask that question to a frontier LLM:

"How many wallets hold Ripple USD on the XRP Ledger?"

And what we get is an answer that sounds confident (~5,000 wallets) but is factually wrong and filled with fluff.

On the other hand, when we rely on Baselight, we obtain an authoritative and correct answer (~35,000 wallets) backed by facts and sources we can verify, such as the dataset from which the data was drawn.

The answer is:

Explainable and reproducible: We have access to the SQL queries used in the answer and can re-run them at will.
Verifiable: We can inspect the datasets used in the answer.
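
To make "re-run them at will" concrete, here is a minimal sketch of what a reproducible answer could look like. It is not Baselight's actual API or schema: the `token_balances` table, its columns, the toy rows, and the `count_holders` helper are all assumptions made for illustration, using an in-memory SQLite database.

```python
import sqlite3

# Hypothetical schema and toy rows, for illustration only.
# This is not Baselight's real dataset, schema, or API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE token_balances (wallet TEXT, token TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO token_balances VALUES (?, ?, ?)",
    [
        ("rAlice", "RLUSD", 120.0),
        ("rBob", "RLUSD", 0.0),     # zero balance: not counted as a holder
        ("rCarol", "RLUSD", 15.5),
        ("rAlice", "XRP", 900.0),   # different token: ignored by the query
    ],
)

# The query text is stored alongside the answer, so anyone can re-run it.
HOLDER_COUNT_SQL = """
SELECT COUNT(DISTINCT wallet)
FROM token_balances
WHERE token = ? AND balance > 0
"""


def count_holders(connection, token):
    """Re-run the stored query and return the number of holding wallets."""
    (count,) = connection.execute(HOLDER_COUNT_SQL, (token,)).fetchone()
    return count


print(count_holders(conn, "RLUSD"))
```

The point of the sketch is that the answer is a number plus the exact query and table that produced it, so a skeptical reader can inspect both rather than trusting a sentence.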

And that’s the gap—giving agents the data itself that they can find and access at inference-time.

The AI landscape is shifting. The race is no longer about building the biggest model. It’s about connecting models and agents to high-value, timely, structured data - the kind that lives in databases, behind APIs, and proprietary systems.

The giants are spending billions to lock this data into their own walled gardens. This creates industry-wide fragmentation, forcing customers into a single silo and limiting AI’s potential.

Baselight was built to break this lock-in. We are not another silo; we want to be the universal data layer. We want to provide every agent with trusted data from any source, positioning ourselves at the center of this huge market.

The vision is for Baselight to be the universal structured data layer for humans and AI.

Near term, it’s a product: you can think of it as the copilot for structured data and the native database for agents. You ask a question; Baselight returns a trusted, instant answer you can compute on and cite.

But where we want to go long-term is an open, decentralized network that organizes the world’s structured data.

Suppliers publish high-quality datasets and reap rewards.
Processors execute queries.
Users and agents query a single high-resolution layer and get answers with receipts—sources, vintages, transforms—built in.

Long-term, any of these roles can be fulfilled by anyone. An open network where everyone can participate.

The goal is to create a flywheel with powerful network effects:

It starts with Discovery. As more data joins our network, it becomes the best place for users and AI agents to find valuable information. Better discovery leads to…
Queries. Users ask questions and get trusted answers. Crucially, every answer comes with a receipt that traces data to its source. These receipts enable…
Payouts. We can precisely attribute every answer to the data that powered it and automatically reward the suppliers. This direct financial incentive drives…
Supply. The best data earns the most, attracting more high-quality, fresh datasets to our network, which brings us right back to making Discovery even more powerful.
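
The step from receipts to payouts can be sketched in a few lines. This is only an illustration under stated assumptions, not how Baselight attributes or pays out: the `Receipt` class, the `split_payout` function, the dataset and supplier names, and the choice to weight by rows used are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    """Hypothetical receipt: which datasets fed an answer, and how much."""
    query: str
    rows_used: dict  # dataset name -> number of rows that fed the answer


def split_payout(receipt, fee_cents, owners):
    """Split a query fee among suppliers, proportional to rows used.

    Assumption: row count is the attribution weight. Amounts are whole
    cents; leftover cents go to the largest fractional remainders so the
    payouts always sum to the fee.
    """
    total_rows = sum(receipt.rows_used.values())
    if total_rows == 0:
        return {}
    payouts = {}
    remainders = []
    for dataset, rows in receipt.rows_used.items():
        supplier = owners[dataset]
        whole, remainder = divmod(fee_cents * rows, total_rows)
        payouts[supplier] = payouts.get(supplier, 0) + whole
        remainders.append((remainder, supplier))
    leftover = fee_cents - sum(payouts.values())
    ranked = sorted(remainders, key=lambda item: item[0], reverse=True)
    for _, supplier in ranked[:leftover]:
        payouts[supplier] += 1
    return payouts


# Hypothetical datasets and suppliers.
receipt = Receipt(
    query="SELECT COUNT(DISTINCT wallet) ...",
    rows_used={"xrpl_trustlines": 300, "token_metadata": 100},
)
owners = {"xrpl_trustlines": "supplier_a", "token_metadata": "supplier_b"}
print(split_payout(receipt, 100, owners))
```

Because the receipt records exactly which datasets an answer touched, the payout needs no extra bookkeeping: the same record that makes the answer auditable also decides who gets rewarded.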

Every turn of this flywheel compounds the network effects, making Baselight the default utility for provable answers. We’re already seeding this flywheel with a vast range of public data and are establishing pilots with design partners to accelerate it.

In short: a path from a great product to a public utility for facts.