Why We Built Our Own AI Infrastructure (And Why It Matters for Your Data)
Your Financial Data Probably Leaves Your Country Every Time You Use “AI”
When your accounting software says “AI-powered,” here’s what usually happens behind the scenes: your invoice data, bank statements, and employee salary information get serialized into a JSON payload, sent over HTTPS to a third-party API (typically OpenAI, Google, or AWS Bedrock), and processed on shared GPU infrastructure in a data center you don’t control before the result comes back.
Your data might cross the Atlantic. It might be processed on the same GPU cluster handling a competitor’s data. It might be retained for model training (unless you’ve negotiated an enterprise agreement that specifically excludes this). You probably don’t know, because you probably didn’t ask.
For a consumer app, this is fine. For a business that processes invoices with tax IDs, bank account numbers, and payroll data under GDPR, the EU AI Act, and country-specific data residency requirements, this architecture has a shelf life.
The API Wrapper Problem
The vast majority of “AI-powered” business tools in 2026 are API wrappers. The product runs on standard cloud infrastructure (AWS, GCP, Azure), and when it needs intelligence — understanding a PDF, classifying a transaction, answering a question — it makes an API call to a language model provider.
This works. It’s fast to build. It lets small teams ship AI features without hiring ML engineers or buying GPUs. And for many use cases, it’s perfectly adequate.
But it creates three structural problems for business software:
1. Data Transit
Every API call sends your data to a third party. For a single invoice OCR operation, that means the vendor’s API receives the full document: supplier name, your company details, line items, tax IDs, bank accounts, amounts. Multiply this by hundreds of invoices per month, plus bank reconciliation data, payroll processing, and tax calculations. Over a year, the third-party API has processed a remarkably complete picture of your business finances.
2. Inference Economics
API pricing works on tokens. Every request costs money proportional to its length. When you process a 3-page invoice, you’re paying for the tokens to send the image, process it, and receive the structured output. At scale — hundreds of invoices, daily bank reconciliation, continuous tax monitoring — the per-token cost becomes a significant portion of the product’s operating expenses.
Companies using API wrappers face a margin squeeze: as customers use more AI features, costs go up linearly. This is why most AI-powered tools limit AI interactions, charge extra for “AI credits,” or throttle usage. The pricing reflects the architecture.
3. Latency and Availability
Your product’s reliability is chained to the API provider’s uptime. If OpenAI has a bad day (and they have them — check their status page history), your “AI-powered” accounting software stops being AI-powered. You’re also subject to rate limits, which means your busy season is capped by someone else’s capacity planning.
The Alternative: Own Your Inference
A different architecture is emerging, particularly among companies in NVIDIA’s Inception program: run your own inference on dedicated GPU infrastructure.
Here’s what this looks like in practice:
Hardware: Dedicated GPU servers (NVIDIA B200 or similar) hosted in the jurisdiction where your customers operate. For a European business tool, this means servers physically located in the EU.
Inference Engine: Open-source inference servers like vLLM that run language models at production scale, with proper batching, memory management, and concurrent request handling.
Models: A combination of commercial APIs (for the best frontier reasoning models) and self-hosted models (for high-volume, latency-sensitive tasks like OCR classification, intent routing, and document extraction).
Routing: A hybrid approach where each task is directed to the right model: complex multi-step reasoning goes to the frontier model, while routine classification and extraction tasks run on the self-hosted infrastructure.
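The routing layer described above can be sketched in a few lines. This is a minimal illustration, not a production router: the task names, the `Task` and `Backend` types, and the rule that anything carrying raw financial records stays on the self-hosted side are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    SELF_HOSTED = "self_hosted"    # e.g. an inference server on dedicated GPUs
    FRONTIER_API = "frontier_api"  # e.g. a commercial model behind an API


# Hypothetical task names; a real product would define its own taxonomy.
LOCAL_TASKS = {"ocr_classification", "intent_routing", "document_extraction"}


@dataclass(frozen=True)
class Task:
    kind: str
    contains_financial_records: bool


def route(task: Task) -> Backend:
    """Pick a backend for a task.

    Rule 1 (assumed): raw financial records never leave own hardware.
    Rule 2: known high-volume task kinds run on own hardware.
    Rule 3: everything else (open-ended reasoning) may use the frontier API.
    """
    if task.contains_financial_records:
        return Backend.SELF_HOSTED
    if task.kind in LOCAL_TASKS:
        return Backend.SELF_HOSTED
    return Backend.FRONTIER_API


if __name__ == "__main__":
    print(route(Task("document_extraction", True)))
    print(route(Task("multi_step_planning", False)))
```

Keeping the decision in one small function makes it easy to audit which categories of work are ever allowed to reach an external provider.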
The practical benefits:
Data stays put. Invoice OCR, bank reconciliation matching, payroll calculations — the data never leaves your infrastructure. For GDPR compliance and data residency requirements, this is the simplest possible architecture: the data doesn’t move.
Predictable costs. GPU servers have a fixed monthly cost. Whether you process 100 invoices or 10,000, the infrastructure cost is the same, as long as the volume fits within the hardware’s capacity. This means AI features don’t need usage limits, “AI credit” upsells, or throttling to protect margins. The AI just works, all the time.
No external dependencies for core operations. If a cloud API has an outage, your locally-hosted models keep running. The frontier model API might be unavailable for complex reasoning tasks, but document processing, classification, and routine operations continue uninterrupted.
Customization. Self-hosted models can be fine-tuned on domain-specific data. A model that has processed 100,000 Spanish invoices develops an understanding of Spanish tax IDs (NIF/CIF), VAT regimes (IVA/IGIC/IPSI), and common accounting patterns that a general-purpose model doesn’t have.
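To make the cost argument concrete, here is a rough comparison of per-token API pricing against a fixed monthly server cost. Every number in it (tokens per document, price per million tokens, monthly server cost) is a made-up placeholder, not a quote from any provider; plug in your own figures.

```python
import math


def api_cost_per_month(docs: int, tokens_per_doc: int,
                       price_per_million_tokens: int) -> float:
    """Per-token pricing: cost grows linearly with document volume."""
    return docs * tokens_per_doc * price_per_million_tokens / 1_000_000


def break_even_docs(fixed_monthly_cost: int, tokens_per_doc: int,
                    price_per_million_tokens: int) -> int:
    """Smallest monthly volume at which the API bill reaches the fixed cost."""
    return math.ceil(
        fixed_monthly_cost * 1_000_000
        / (tokens_per_doc * price_per_million_tokens)
    )


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 tokens per invoice, 10 currency units per
    # million tokens, 3,000 currency units per month for a dedicated server.
    for docs in (100, 10_000):
        print(docs, api_cost_per_month(docs, 5_000, 10))
    print("break-even:", break_even_docs(3_000, 5_000, 10))
```

Below the break-even volume the API is the cheaper option; above it, the fixed-cost server wins, up to the throughput the hardware can actually sustain.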
What Data Sovereignty Actually Means
“Data sovereignty” has become a marketing term. Every cloud vendor claims it. But in the context of AI for business, it has a specific, testable meaning:
Can you answer “where was this data processed?” for every AI operation?
If your system uses an external API, the honest answer is: “it was processed on shared infrastructure operated by [vendor], located in [region], under [vendor]’s data processing agreement.” You trust the vendor’s DPA, but you don’t control the infrastructure.
If your system uses its own infrastructure, the answer is: “it was processed on server [X] at IP [Y] in datacenter [Z] in [country].” You control the infrastructure, and you can prove it.
For businesses subject to regulatory audits — which includes anyone handling financial data in the EU — the second answer is much simpler to defend.
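One way to be able to answer the “where was this data processed?” question is to write a small provenance record for every AI operation. The sketch below is illustrative only: the field names, the `ProcessingRecord` type, and the example server and datacenter identifiers are assumptions, and a real system would persist these records rather than keep them in memory.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ProcessingRecord:
    operation: str      # e.g. "invoice_ocr"
    document_id: str
    server: str         # server [X]
    ip: str             # IP [Y]
    datacenter: str     # datacenter [Z]
    country: str        # ISO country code
    processed_at: str   # ISO 8601 timestamp in UTC


def record_operation(operation: str, document_id: str, server: str, ip: str,
                     datacenter: str, country: str,
                     now: Optional[datetime] = None) -> ProcessingRecord:
    """Build a record; `now` can be injected to make tests deterministic."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return ProcessingRecord(operation, document_id, server, ip,
                            datacenter, country, ts)


def where_processed(records: List[ProcessingRecord],
                    document_id: str) -> List[Tuple[str, str, str, str]]:
    """Answer the audit question for a single document."""
    return [(r.operation, r.server, r.datacenter, r.country)
            for r in records if r.document_id == document_id]


def to_json(record: ProcessingRecord) -> str:
    """Serialize a record, for example to append to an audit log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    rec = record_operation("invoice_ocr", "inv-001", "gpu-01",
                           "192.0.2.10", "dc-example-1", "DE")
    print(to_json(rec))
    print(where_processed([rec], "inv-001"))
```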
The GDPR Angle
GDPR requires that personal data processing has a lawful basis, that data transfers outside the EEA have appropriate safeguards, and that you can demonstrate compliance (the “accountability principle”).
Financial data — invoices, payroll, bank statements — contains personal data. Employee names, tax IDs, bank accounts, salaries. When an AI system processes this data, that processing must comply with GDPR.
Using a US-based API for this processing creates a data transfer to a third country. Yes, there are mechanisms for this (Data Privacy Framework, Standard Contractual Clauses). But each mechanism adds complexity, requires documentation, and creates a potential point of regulatory challenge.
Processing everything locally in the EU eliminates this entire category of compliance work. Not because it’s impossible to use external APIs legally, but because it’s simpler not to need to.
The AI Act Angle
The EU AI Act, which entered into force in August 2024 and is being implemented in phases through 2026-2027, introduces requirements for AI systems used in specific domains. Some financial use cases, such as assessing the creditworthiness of individuals, are classified as high-risk. High-risk systems face additional requirements for transparency, human oversight, documentation, and risk management.
Owning your inference infrastructure makes compliance with these requirements more straightforward: you control the models, you control the data pipeline, you can document exactly how decisions are made, and you can implement the required human oversight mechanisms (like confirmation gates on financial operations).
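A confirmation gate of the kind mentioned above can be very small. The following is a sketch under stated assumptions, not a description of any particular product: `PendingAction`, `confirm`, and `run` are hypothetical names, and the “payment” is just a function returning a string.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ConfirmationRequired(Exception):
    """Raised when a write operation is attempted without human sign-off."""


@dataclass
class PendingAction:
    description: str
    execute: Callable[[], str]
    confirmed_by: Optional[str] = None  # set only by a human reviewer


def confirm(action: PendingAction, reviewer: str) -> PendingAction:
    """Record which person approved the action."""
    action.confirmed_by = reviewer
    return action


def run(action: PendingAction) -> str:
    """Execute the action only if a human has confirmed it."""
    if action.confirmed_by is None:
        raise ConfirmationRequired(action.description)
    return action.execute()


if __name__ == "__main__":
    action = PendingAction("Pay invoice inv-001", lambda: "paid")
    try:
        run(action)
    except ConfirmationRequired as exc:
        print("blocked:", exc)
    confirm(action, "alice")
    print(run(action))
```

The AI can propose the action, but nothing is written until `confirmed_by` names a person, which also gives the audit trail something concrete to record.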
Questions to Ask Your AI Vendor
Whether you’re evaluating a new tool or auditing your current one, here are five questions that reveal the architecture:
1. “Where is my data processed when the AI runs?” If the answer involves a third-party API (OpenAI, Google Cloud AI, Azure OpenAI, AWS Bedrock), your data leaves the product’s infrastructure. This isn’t inherently bad, but you should know.
2. “What happens if the AI provider has an outage?” If the answer is “AI features stop working,” the product is an API wrapper. If the answer is “core operations continue on our infrastructure, and complex reasoning degrades gracefully,” the product owns its inference pipeline.
3. “Is there a per-use limit on AI features?” If the answer involves “AI credits” or usage tiers, the product’s costs scale with API usage, and they’re passing that cost to you. Products with their own infrastructure have fixed costs and don’t need usage limits.
4. “Can you tell me the physical location of the servers processing my data?” This should have a specific answer: a city, a datacenter provider, a region. “Our cloud provider’s European region” is less specific than “Hetzner datacenter in Falkenstein, Germany.”
5. “How do you handle model updates?” API-dependent products are at the mercy of provider model changes. If OpenAI deprecates a model version, the product must adapt. Products with their own infrastructure control when and how models are updated, and can maintain older versions if stability is critical.
The Hybrid Approach
In practice, the best architecture in 2026 is hybrid. Not everything needs to run locally. Complex reasoning tasks — multi-step planning, nuanced conversational AI, novel problem-solving — still benefit from frontier models accessed via API. These interactions tend to involve less sensitive data (the user’s question, not the raw financial data) and can be designed to minimize data exposure.
High-volume, data-sensitive tasks — invoice OCR, bank transaction classification, document extraction, routine compliance checks — are better candidates for local inference. These tasks process the most sensitive data, happen the most frequently, and benefit the most from custom fine-tuning.
The intelligent routing between these two layers — sending each task to the right model on the right infrastructure — is what separates a thoughtful architecture from a brute-force one.
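Minimizing data exposure on the frontier-model side can start with redacting obvious identifiers before a prompt leaves your infrastructure. The sketch below is an assumption-laden example: the two regular expressions are simplified patterns for unspaced IBANs and Spanish NIF/CIF numbers, they do not validate checksums, and they will miss other formats. Treat it as a starting point, not a compliance control.

```python
import re

# Simplified, illustrative patterns (no checksum validation).
IBAN_PATTERN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")
SPANISH_TAX_ID_PATTERN = re.compile(
    r"\b(?:\d{8}[A-Z]|[A-HJNP-SUVW]\d{7}[0-9A-J])\b"
)


def redact(text: str) -> str:
    """Replace account numbers and tax IDs with placeholders.

    IBANs are replaced first so that their digits cannot be
    mistaken for part of a tax ID.
    """
    text = IBAN_PATTERN.sub("[IBAN]", text)
    text = SPANISH_TAX_ID_PATTERN.sub("[TAX_ID]", text)
    return text


if __name__ == "__main__":
    question = ("Why did supplier B12345678 charge IBAN "
                "ES9121000418450200051332 twice? Contact 12345678Z.")
    print(redact(question))
```

The user’s question still reaches the frontier model with enough context to be answered, while the raw identifiers stay on the local side.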
What We’re Building
At Odiverse, we’re an NVIDIA Inception member building this hybrid architecture for business AI. Our agent, Odi, currently uses Anthropic’s Claude for conversational AI and complex reasoning, while we prepare our own GPU infrastructure (NVIDIA B200s serving models via vLLM) for high-volume inference tasks.
Our servers are in Europe. Our data processing happens within the EU. Every write operation requires human confirmation. Every action has an immutable audit trail. And we’re building toward a future where the most data-sensitive operations — invoice OCR, bank reconciliation matching, payroll calculations — run entirely on our own GPUs, with zero external data transit.
This isn’t the cheapest architecture to build. But it’s the right one for a product that handles your business’s financial data. And as data sovereignty regulations tighten across Europe — and increasingly across the Americas and Asia — it’s the architecture that will age the best.
The question for any business evaluating AI tools in 2026 isn’t “does it have AI?” It’s “where does the AI run, who controls it, and where does my data go?”
If your vendor can’t answer those questions clearly, you might want to find one that can.