Engineering · 2026-05-12 · Laavod team

How we built a workspace for every AI model

A look at the architecture decisions that let Laavod connect to 247+ models without locking into any of them: Mastra, the Vercel AI Gateway, and the abstraction layer that ties them together.

Most AI platforms are locked to one model provider. ChatGPT is OpenAI. Claude is Anthropic. Copilot is Microsoft. Notion AI uses whatever provider Notion picked for you. The user does not get to choose.

We took a different bet. Laavod connects to 247+ models across every major provider, plus a long tail of niche models, with zero markup if you bring your own keys. This post is about how we did it without ending up with an unmaintainable abstraction layer that ships nothing on time.

The wrong way to do this

When we started, the obvious move was to write our own integration for every model provider. OpenAI has its API. Anthropic has its API. Google has Vertex AI. Mistral has its own. Each one has different request formats, different streaming protocols, different rate limits, and different ways of handling tool calls.

Writing integrations one by one would have been fine for the first three or four providers. By provider seven, the maintenance burden compounds. By provider twenty, you spend more time keeping integrations current than building new features. By the time you have 247+ models, you have rewritten the integration layer twice and given up.

The alternative is to pick a model abstraction library and let someone else handle the per-provider work. The risk: you couple yourself to a library that might disappear or change direction.

We picked the AI SDK from Vercel for the model abstraction. We use the Vercel AI Gateway for the actual provider routing.

What the Vercel AI Gateway does

The Gateway is a thin proxy in front of every model provider. You send a request to one endpoint. The Gateway figures out which provider to call, formats the request correctly, handles streaming, retries, fallbacks, and gives you back a unified response.
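To make "one endpoint, many providers" concrete, here is a toy router with model fallback. It is a sketch, not the Gateway's implementation: it assumes model ids written as provider/model (the form the Gateway uses), the adapters are fakes that echo the prompt, the model ids are illustrative, and every name in it is hypothetical. Streaming and retries are left out.

```typescript
// Hypothetical unified request/response shapes; not the Gateway's actual wire format.
interface ChatRequest {
  model: string; // assumed "provider/model" form; ids used here are illustrative
  prompt: string;
}

interface ChatResponse {
  provider: string;
  model: string;
  text: string;
}

// Each adapter hides one provider's request format behind the same signature.
type ProviderAdapter = (model: string, prompt: string) => ChatResponse;

// Fake adapters that echo the prompt, so the sketch runs without network access.
const adapters: Record<string, ProviderAdapter> = {
  anthropic: (model, prompt) => ({ provider: "anthropic", model, text: `[anthropic] ${prompt}` }),
  openai: (model, prompt) => ({ provider: "openai", model, text: `[openai] ${prompt}` }),
};

function splitModelId(id: string): [string, string] {
  const slash = id.indexOf("/");
  if (slash <= 0) throw new Error(`expected "provider/model", got "${id}"`);
  return [id.slice(0, slash), id.slice(slash + 1)];
}

// Try the requested model first, then each fallback in order.
function routeRequest(req: ChatRequest, fallbacks: string[] = []): ChatResponse {
  const errors: string[] = [];
  for (const id of [req.model, ...fallbacks]) {
    const [provider, model] = splitModelId(id);
    const adapter = adapters[provider];
    if (!adapter) {
      errors.push(`no adapter for ${provider}`);
      continue;
    }
    try {
      return adapter(model, req.prompt);
    } catch (err) {
      errors.push(String(err));
    }
  }
  throw new Error(`all candidates failed: ${errors.join("; ")}`);
}
```

With no Mistral adapter registered, routeRequest({ model: "mistral/mistral-large", prompt: "hi" }, ["openai/gpt-5"]) falls through to the OpenAI adapter. The caller never changes shape; only the model string does.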

Three things made this the right call for us.

First, it is provider-agnostic at the protocol level. Switching from Claude to GPT-5 for a specific task is a one-line change. The agent code does not care which provider runs the inference.

Second, it supports BYOK natively. The Gateway can route requests through Laavod's account (we mark up the credits) or through a customer's own provider account (we add zero markup). The same code path works for both.
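The BYOK split comes down to who pays the provider. Here is a minimal sketch of the settlement step. The flat per-token price, the 20% markup, and the 1 credit = 1 USD conversion are made-up numbers for illustration, and the types are hypothetical rather than our billing code.

```typescript
// Hypothetical types; not Laavod's actual billing code.
type CredentialSource =
  | { kind: "platform" } // Laavod's own provider account, billed in credits
  | { kind: "byok"; apiKey: string }; // the customer's own provider key

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

interface BillingResult {
  apiKeyOwner: "laavod" | "customer";
  creditsCharged: number; // assumes 1 credit = 1 USD for simplicity
}

// Assumed illustrative values: provider price in USD per 1M tokens, and a platform
// markup that applies only when Laavod's account pays the provider.
const PRICE_PER_MILLION = { input: 5, output: 25 };
const PLATFORM_MARKUP = 0.2;

function providerCostUsd(usage: Usage): number {
  return (
    (usage.inputTokens * PRICE_PER_MILLION.input + usage.outputTokens * PRICE_PER_MILLION.output) /
    1000000
  );
}

function settle(source: CredentialSource, usage: Usage): BillingResult {
  if (source.kind === "byok") {
    // The provider bills the customer directly; nothing is added on top.
    return { apiKeyOwner: "customer", creditsCharged: 0 };
  }
  return {
    apiKeyOwner: "laavod",
    creditsCharged: providerCostUsd(usage) * (1 + PLATFORM_MARKUP),
  };
}
```

The request path is identical in both branches; only the credential and the charge differ, which is what lets one code path serve both kinds of customer.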

Third, it handles the long tail. We do not have to integrate every model by hand. When DeepSeek ships a new version or xAI releases Grok 4 Reasoning, the Gateway adds it and we get access without writing any integration code.

The Gateway costs us nothing per request beyond what we would pay the providers anyway. Vercel makes its money on hosting, not on token markup. Our incentives align.

What Mastra does

Above the Gateway, we needed something that handles the parts of AI work that are not just model calls. Memory. Retrieval-augmented generation. Workflows. Evaluations. Tools. MCP (the Model Context Protocol).

We picked Mastra for this layer. Mastra is a TypeScript-native agent framework. It is open source, production-proven (Replit, PayPal, and Adobe use it internally), and built around the AI SDK.

Mastra gives us four things our agents need.

Memory. Conversations persist across sessions. Agents remember context from previous interactions. The memory model is pluggable: short-term in-memory for fast iteration, vector-backed for long-running agents that need to recall events from weeks ago.
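"Pluggable" here means both kinds of memory sit behind one interface, so an agent can move from one to the other without code changes. The sketch below is hypothetical and is not Mastra's memory API; a bag-of-words vector stands in for a real embedding model, and a plain array stands in for a vector database.

```typescript
// Hypothetical interface; Mastra's own memory API differs and is not reproduced here.
interface MemoryStore {
  save(text: string): void;
  recall(query: string, limit: number): string[];
}

// Short-term: keep a bounded window of recent messages and return the newest ones.
class RecentMessagesMemory implements MemoryStore {
  private messages: string[] = [];
  constructor(private readonly capacity: number) {}
  save(text: string): void {
    this.messages.push(text);
    if (this.messages.length > this.capacity) this.messages.shift();
  }
  recall(_query: string, limit: number): string[] {
    return this.messages.slice(-limit);
  }
}

// Toy "embedding": word counts. A real system would call an embedding model.
function embed(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const w of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) v.set(w, (v.get(w) ?? 0) + 1);
  return v;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [w, x] of a) dot += x * (b.get(w) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Long-term: rank every stored message by similarity to the query, however old it is.
class VectorMemory implements MemoryStore {
  private entries: { text: string; vec: Map<string, number> }[] = [];
  save(text: string): void {
    this.entries.push({ text, vec: embed(text) });
  }
  recall(query: string, limit: number): string[] {
    const q = embed(query);
    return this.entries
      .map((e) => ({ text: e.text, score: cosine(q, e.vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((e) => e.text);
  }
}
```

The short-term store forgets by position; the vector store recalls by meaning, which is why it suits agents that need events from weeks ago.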

RAG. Knowledge bases get chunked, embedded, stored, and retrieved with one API. We did not have to write our own vector search layer.
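The RAG stages are easy to see in miniature. This toy version uses overlapping word windows for chunking, term sets in place of embeddings, and a plain array in place of a vector store. The function names are ours for illustration and do not match Mastra's RAG API.

```typescript
// Toy RAG pipeline: chunk -> "embed" -> store -> retrieve. All names are hypothetical.
interface Chunk {
  docId: string;
  text: string;
  terms: Set<string>; // stand-in for an embedding vector
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Split a document into windows of `size` words that overlap by `overlap` words,
// so text near a chunk boundary also appears in the neighbouring chunk.
function chunkDocument(docId: string, text: string, size: number, overlap: number): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    const slice = words.slice(start, start + size).join(" ");
    chunks.push({ docId, text: slice, terms: new Set(tokenize(slice)) });
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

// Score = fraction of query terms found in the chunk; return the top k with any match.
function retrieve(store: Chunk[], query: string, k: number): Chunk[] {
  const q = new Set(tokenize(query));
  const score = (c: Chunk) => [...q].filter((t) => c.terms.has(t)).length / Math.max(q.size, 1);
  return store
    .map((c) => ({ c, s: score(c) }))
    .filter((x) => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((x) => x.c);
}
```

The retrieved chunks are what gets pasted into the model's context; swapping the term sets for real embeddings changes the scoring, not the shape of the pipeline.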

Workflows. Multi-step agent runs are first-class: branching, parallel execution, error handling, retries. Workflows are written in plain TypeScript rather than a configuration DSL, so engineers can use the language features they already know.
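"Plain TypeScript" means the control flow is ordinary language features. The sketch below shows a retried step, a branch, and a parallel fan-out using only async/await and Promise.all. The names withRetry and triageWorkflow and the injected step functions are hypothetical; this is not Mastra's workflow API, which is not reproduced here.

```typescript
// Retry a step with linear backoff; rethrow the last error if every attempt fails.
async function withRetry<T>(step: () => Promise<T>, attempts: number, delayMs: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, delayMs * (i + 1)));
    }
  }
  throw lastError;
}

interface Ticket {
  id: string;
  text: string;
}

// Hypothetical support-triage workflow; the steps are injected so it runs without models.
async function triageWorkflow(
  ticket: Ticket,
  classify: (t: Ticket) => Promise<"bug" | "question">,
  searchDocs: (t: Ticket) => Promise<string[]>,
  searchIssues: (t: Ticket) => Promise<string[]>,
): Promise<string> {
  // Step 1: retried, because model calls can fail transiently.
  const kind = await withRetry(() => classify(ticket), 3, 10);

  // Step 2: branch on the classification.
  if (kind === "question") {
    const docs = await searchDocs(ticket);
    return `answer using ${docs.length} doc(s)`;
  }

  // Step 3: independent lookups run in parallel.
  const [docs, issues] = await Promise.all([searchDocs(ticket), searchIssues(ticket)]);
  return `bug report with ${docs.length} doc(s) and ${issues.length} related issue(s)`;
}
```

Nothing here needs a special syntax: an if statement is the branch, Promise.all is the parallel step, and try/catch is the error handling.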

Evaluations. Built-in evals for catching regressions when we change a model or update a prompt. Critical when you support 247+ models and need to know if a change in Claude Opus broke an existing agent.
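A regression eval can be as simple as running a fixed set of cases against the old and new model and comparing pass rates. This is a minimal sketch with a substring scorer and an arbitrary tolerance of 0.05 (five percentage points); the names are hypothetical, and Mastra's built-in evals are not shown.

```typescript
// Hypothetical eval types; not Mastra's eval API.
interface EvalCase {
  input: string;
  mustContain: string; // phrase a correct answer should include
}

// Runs one agent turn on a given model; injected so the harness stays model-agnostic.
type RunAgent = (model: string, input: string) => string;

// Pass rate: share of cases whose output contains the expected phrase (case-insensitive).
function passRate(run: RunAgent, model: string, cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  const passed = cases.filter((c) =>
    run(model, c.input).toLowerCase().includes(c.mustContain.toLowerCase()),
  );
  return passed.length / cases.length;
}

// Flags a regression when the candidate's pass rate drops more than `tolerance` below the baseline's.
function compareModels(
  run: RunAgent,
  cases: EvalCase[],
  baseline: string,
  candidate: string,
  tolerance = 0.05,
): { before: number; after: number; regressed: boolean } {
  const before = passRate(run, baseline, cases);
  const after = passRate(run, candidate, cases);
  return { before, after, regressed: after < before - tolerance };
}
```

Because the model is just a string argument, the same case set can gate a prompt change, a model upgrade, or a switch to a different provider.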

The reason we did not build any of this ourselves: it would have taken eighteen months and we would have ended up with a worse version of what Mastra already shipped.

The abstraction we wrote ourselves

We did write one abstraction layer of our own: the LaavodAgent interface. Every agent in the product, regardless of which model it runs on, exposes the same set of methods: run, stream, getMemory, getTools, getKnowledge, and evaluate.

The implementation under the interface is mostly a thin wrapper around Mastra. But the interface lets us swap implementations later. If Mastra's roadmap diverges from ours, or if a better framework appears, we can replace the implementation without changing any consumer code.
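Here is roughly what such an interface can look like in TypeScript. The six method names come from the list above; every signature and type is an assumption, and EchoAgent is a dependency-free stand-in for the Mastra-backed implementation so the example runs on its own.

```typescript
// Method names come from the post; all signatures and types here are assumptions.
interface AgentResult {
  text: string;
  model: string;
}

interface LaavodAgent {
  run(input: string): Promise<AgentResult>;
  stream(input: string): AsyncIterable<string>;
  getMemory(): string[];
  getTools(): string[];
  getKnowledge(): string[];
  evaluate(cases: { input: string; expected: string }[]): Promise<number>;
}

// Stand-in implementation. A real one would delegate to a Mastra agent;
// this fake echoes its input so the example needs no dependencies.
class EchoAgent implements LaavodAgent {
  private history: string[] = [];
  constructor(private readonly model: string) {}

  async run(input: string): Promise<AgentResult> {
    this.history.push(input);
    return { text: `echo: ${input}`, model: this.model };
  }

  async *stream(input: string): AsyncIterable<string> {
    const { text } = await this.run(input);
    for (const word of text.split(" ")) yield word; // word-by-word "streaming"
  }

  getMemory(): string[] {
    return [...this.history];
  }
  getTools(): string[] {
    return [];
  }
  getKnowledge(): string[] {
    return [];
  }

  // Fraction of cases whose output exactly matches the expected text.
  async evaluate(cases: { input: string; expected: string }[]): Promise<number> {
    if (cases.length === 0) return 0;
    let hits = 0;
    for (const c of cases) if ((await this.run(c.input)).text === c.expected) hits++;
    return hits / cases.length;
  }
}

// Consumer code depends only on the interface, so the implementation can be swapped.
async function summarize(agent: LaavodAgent, doc: string): Promise<string> {
  return (await agent.run(`Summarize: ${doc}`)).text;
}
```

Replacing EchoAgent with a different class that implements LaavodAgent leaves summarize, and every other consumer, untouched.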

This is the same pattern Stripe uses for payments. Stripe's API abstracts away the underlying card networks and processors, so Stripe can change which processor handles a transaction without you knowing or caring. The same logic applies to AI: the agent abstraction should stay stable even when the model layer underneath it churns.

The result

Today, when a Laavod user picks a model in the workspace dropdown, here is what actually happens:

1. The user picks (say) Claude Opus 4.6 from a list of 247+ options.
2. Our workspace code sends a unified request through the LaavodAgent interface to Mastra.
3. Mastra adds memory context, retrieves relevant knowledge from the user's knowledge bases, attaches the available tools, and formats the request.
4. The request goes to the Vercel AI Gateway.
5. The Gateway routes the request to Anthropic, using either Laavod's account (credits) or the user's BYOK key (zero markup).
6. The response streams back through the same chain, with token usage logged, credits debited (if applicable), and the conversation state updated.
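Compressed into one function, the round trip looks something like this. Every name is hypothetical, the model call is injected so the sketch runs without a provider, and streaming is omitted. The stages follow the steps above: enrich the request with knowledge and tools, send the full history plus the new message to the chosen model, then log usage, debit credits unless a BYOK key was used, and append the turn to the conversation.

```typescript
// Hypothetical shapes for one chat turn; not Laavod's actual request types.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Conversation {
  messages: Message[]; // provider-neutral history shared by every model
  credits: number;
  byokKey?: string; // set when the user brought their own provider key
  usageLog: { model: string; tokens: number }[];
}

interface ModelReply {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Stand-in for the Gateway call: apiKey is undefined when Laavod's account pays.
type CallModel = (model: string, messages: Message[], apiKey: string | undefined) => ModelReply;

interface TurnDeps {
  recallKnowledge: (query: string) => string[];
  tools: string[];
  callModel: CallModel;
  creditsPerToken: number;
}

function handleTurn(conv: Conversation, model: string, userInput: string, deps: TurnDeps): string {
  // Enrich: relevant knowledge and the available tools travel as a system message.
  const context: Message = {
    role: "system",
    content: `Knowledge:\n${deps.recallKnowledge(userInput).join("\n")}\nTools: ${deps.tools.join(", ")}`,
  };
  const userMsg: Message = { role: "user", content: userInput };

  // Route: the full history goes to whichever model the user picked for this turn.
  const reply = deps.callModel(model, [context, ...conv.messages, userMsg], conv.byokKey);

  // Bookkeeping: log usage, debit credits only without a BYOK key, then update state.
  const tokens = reply.inputTokens + reply.outputTokens;
  conv.usageLog.push({ model, tokens });
  if (conv.byokKey === undefined) conv.credits -= tokens * deps.creditsPerToken;
  conv.messages.push(userMsg, { role: "assistant", content: reply.text });
  return reply.text;
}
```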

The user sees a response in their chat. They do not see four layers of abstraction. But the abstractions are why they can switch from Claude to GPT-5 to Gemini to Mistral and back, all in the same conversation, without losing context.
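Context survives a model switch because the conversation lives in one provider-neutral transcript and is only translated into a provider's request shape at call time. In our stack that translation happens below the workspace layer, in Mastra and the Gateway; the sketch below does it by hand so the idea is visible. The payload shapes are simplified and only loosely modeled on OpenAI-style chat messages (system prompt inside the message list) and Anthropic's Messages API (system prompt as a separate field, max_tokens required). The function names are hypothetical.

```typescript
// Provider-neutral transcript, stored once per conversation.
type Role = "system" | "user" | "assistant";
interface Turn {
  role: Role;
  content: string;
}

// Simplified payload shapes, not complete API schemas.
interface OpenAIStylePayload {
  model: string;
  messages: Turn[];
}
interface AnthropicStylePayload {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
  max_tokens: number;
}

// OpenAI-style: system prompts stay in the message list.
function toOpenAIStyle(model: string, turns: Turn[]): OpenAIStylePayload {
  return { model, messages: turns.map((t) => ({ ...t })) };
}

// Anthropic-style: system prompts move to a top-level field.
function toAnthropicStyle(model: string, turns: Turn[], maxTokens = 1024): AnthropicStylePayload {
  const system = turns
    .filter((t) => t.role === "system")
    .map((t) => t.content)
    .join("\n");
  const messages = turns
    .filter((t): t is Turn & { role: "user" | "assistant" } => t.role !== "system")
    .map((t) => ({ role: t.role, content: t.content }));
  return { model, system: system || undefined, messages, max_tokens: maxTokens };
}
```

The transcript itself never changes when the user switches models; only the adapter applied to it on the next turn does.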

The structural moat

Here is what other AI platforms cannot copy.

ChatGPT cannot honestly recommend Claude, because Anthropic is a competitor. Claude cannot honestly recommend GPT-5, because OpenAI is a competitor. Microsoft Copilot is bound to OpenAI by commercial agreement. Notion AI does not let you choose because Notion has its own preferred provider deal.

Our economics are different. We do not sell models. We do not have a foundation model. We have no reason to prefer one provider over another. When a new model is better for a specific task, we route to it. When a user has a negotiated rate with one provider, they bring those keys and use them.

This is not idealism. It is architecture. The Vercel AI Gateway is the routing layer. Mastra is the agent layer. Our code is the workspace layer. Each layer is independently swappable. No layer creates lock-in to a specific provider.

An incumbent who wanted to copy this would have to sell their competitors' models. Their existing business model prevents it. Ours requires it.

What this means for what we ship next

The architecture above is the foundation for everything we are building.

The marketplace will work because every published asset (agent, workflow, prompt pack, knowledge base) carries its model preferences but can be re-routed at run time to whatever provider the subscriber prefers.
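One way that run-time re-routing could be resolved is sketched below. The types are hypothetical, and the precedence order (subscriber override first, then the publisher's preferences, then anything the subscriber can call) is an assumption for illustration, not a description of the shipped marketplace.

```typescript
// Hypothetical types for resolving a published asset's model at run time.
interface PublishedAsset {
  name: string;
  preferredModels: string[]; // publisher's preferences, best first
}

interface SubscriberSettings {
  overrideModel?: string; // subscriber explicitly pins a model
  availableModels: Set<string>; // models the subscriber can call (plan or BYOK keys)
}

function resolveModel(asset: PublishedAsset, sub: SubscriberSettings): string {
  // 1. An explicit subscriber choice wins, if they can actually call it.
  if (sub.overrideModel && sub.availableModels.has(sub.overrideModel)) return sub.overrideModel;

  // 2. Otherwise, the publisher's first preference the subscriber can call.
  const match = asset.preferredModels.find((m) => sub.availableModels.has(m));
  if (match) return match;

  // 3. Otherwise, any model the subscriber can call (Set iteration order is insertion order).
  const [first] = sub.availableModels;
  if (first === undefined) throw new Error(`no callable model for ${asset.name}`);
  return first;
}
```

The asset never hard-codes a provider; it only states preferences, and the subscriber's keys and settings decide where it actually runs.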

The agent framework will scale because Mastra handles the heavy lifting of memory and RAG. We can ship new agent types in days, not months.

The BYOK story will keep working because the Gateway already supports it natively. Adding a new provider does not require us to redesign anything.

We expect the technology choices we made in the first six months to compound over the next five years. Betting on model-agnostic infrastructure is the most important architectural decision we have made so far, and we feel good about it.
