At a meeting with a production director, a question comes up that we see in almost every company thinking about AI today: "Do we buy an off-the-shelf tool, or do we have to build it ourselves?" The question sounds simple, but it actually binds together three distinct decisions — about differentiation, about data, and about costs over time. Anyone who answers it in the first fifteen minutes of the meeting usually gets it wrong.
This article offers a framework we have validated in practice across dozens of deployments. It is not a religious war between "buy everything" and "build everything" — both are extremes, both are mistakes. The point is knowing which layer of each specific solution to buy, which to build, and where the boundary lies beyond which one or the other stops paying off.
The foundation of the framework — three layers of every AI solution
Every production AI solution can be divided into three layers:
- 1.Model and inference — the LLM, embedding model, serving infrastructure. This is a commodity. OpenAI, Anthropic, Google, or open-weight models such as
Qwen 3,Mistral,Llama 4— all provide a solid foundation that cost hundreds of millions of dollars to develop, and you can get it for a fraction of that price. - 2.Orchestration and retrieval — the RAG pipeline, agent logic, memory, tools, guardrails. This is the layer where output quality is decided. It is partly commoditised (frameworks like
LangGraph,LlamaIndex, and vector databases likeQdrantare open-source and mature), but the specifics of your deployment — your data, your processes, your edge cases — demand custom work. - 3.Domain layer — prompts, datasets, fine-tuning, evaluations, UI, integration with your systems. This is where differentiation is created. Nobody else has your production data, your SOP documents, or your customer histories. You cannot buy this layer — you can only build it.
In short: buy layers 1 and part of 2, build layer 3 and the remainder of 2. The problem arises when a company buys layer 1 from a vendor who wraps layer 2 and layer 3 into a proprietary platform as well — and the company does not realise what it is signing up for.
When to buy
Buying makes sense when the use case is commodity — meaning that fifty other companies face the same problem and a mature market of solutions exists. Examples:
- Customer support with FAQ (Zendesk AI, Intercom, Freshdesk AI) — a standard task with ready-made integrations and fast onboarding.
- Meeting summarisation and transcription (Otter.ai, Fireflies, Microsoft Copilot) — no differentiating value, speed of delivery is the key factor.
- Coding assistant for the team (GitHub Copilot, Cursor, Codeium) — a general use case where individual fine-tuning would bring marginal improvement at disproportionate cost.
- First-round HR screening (several mature platforms exist) — a commodity problem, a regulated market, ready-made compliance.
Beyond commodity use cases: buy if speed to production matters more than performance. In some situations an 80% solution available tomorrow is better than a 95% solution available in eight months.
The final argument for buying: if your company does not have — and in the near future will not have — an AI team with the necessary competencies, maintaining your own solution will cost you more than a SaaS subscription, both in direct cost and in unreliability.
When to build
Building makes sense in three situations:
Differentiation through data. If you have data nobody else has — production records from machines, claims histories, internal technical standards, measurement results — and if that data can be the source of better performance than your competitors, you must build. An off-the-shelf solution will not integrate this data in a way that gives you an edge; when you buy it you are paying for generic quality. Fine-tuning or RAG over your own data turns a generic model into a domain specialist — but that requires your own work.
Security and regulatory requirements. If you operate in an industry where data may not leave the network (healthcare, energy, defence, financial institutions with NDA data), SaaS solutions are simply off the table. The answer here is an on-prem LLM — a locally deployed open-weight model served with vLLM or Ollama, where inference runs on your hardware and no token leaves the network. This is not merely a technical choice — it is a compliance requirement.
Process specifics that no off-the-shelf product will support. If your use case includes non-standard steps — for example, integration with a legacy SCADA system, processing a proprietary documentation format, or a multi-step agent workflow that mirrors your specific production processes — no ready-made platform will cover it without extensive customisation. And that customisation will lead you to the same volume of work as a custom solution, but with someone else's code underneath it.
The hybrid model — the reality of most production deployments
In practice we have rarely seen a pure build or a pure buy deployment. The typical architecture that works:
- Buy: LLM API (Claude Sonnet, GPT, Gemini Flash) or an open-weight model served via
vLLMon your own server; a vector database (Qdrantis the de-facto standard for the Slovak market — EU-hosted, Apache 2.0); an embedding model (the open-weightBGEfamily is production-proven). - Build: A RAG pipeline with retrieval strategies specific to your document types; a prompt layer that reflects your company's processes and terminology; fine-tuning on domain vocabulary where the quality difference is measurable; integration with ERP/SCADA/MES systems; evaluation and performance monitoring in production.
This hybrid approach gives you the speed of commercial layers (the model and the database are ready on day one) and the differentiation of your own layers (domain logic stays yours). It also reduces lock-in risk — if a better model ships tomorrow, you swap the API call without rewriting the entire solution.
Total cost of ownership — where the calculation shifts
The most common mistake in the build vs buy decision is comparing the price of a SaaS subscription with the one-off cost of development. The correct calculation must cover the full total cost of ownership (TCO) over 3–5 years:
SaaS / buy TCO: - Monthly subscription (scales with number of users or data volume — watch this) - Onboarding and integration (rarely free) - Disruption from API changes or terms-of-service changes (we have seen pricing raised 40–80% at contract renewal) - Hidden costs: data leaves the company → privacy risk → potential compliance costs
Build TCO: - Initial development (typically the dominant line item in year 1) - Hardware if on-prem (GPU server — roughly €15–60k for a production deployment depending on requirements, amortised typically over 3 years) - Personnel costs for maintenance and iteration (not zero — calculate realistically) - Dependency on internal know-how (a key engineer leaving = risk)
The key point: SaaS looks attractive in year 1; build pays off from year 2–3 onwards. If the use case is temporary (a pilot project, testing a hypothesis, a seasonal need), buy. If it is strategic and long-term, build typically has a lower TCO and higher control.
Lock-in — the risk that gets underestimated
Vendor lock-in has three forms in the AI context that are worse than classic software lock-in:
Data lock-in. When your company's data (documents, histories, annotations, feedback) lives exclusively on the vendor's platform, migration is painful to impossible. Before buying, always verify: can I export 100% of my data in a standard format? If not, you are in lock-in from day one.
Model lock-in. If you have built prompt logic, a fine-tuning dataset, or evaluations for one specific model (for example a GPT-4 class model), migrating to a different model requires rework even if the new model is better. The solution: an abstraction layer in your orchestration where the model is a configuration value, not a hardcoded dependency.
Integration lock-in. Some platforms offer connectors to your systems — ERP, CRM, SCADA. When those connectors are proprietary and undocumented, you can only replace the platform at the cost of rewriting all your integrations. Always prefer open APIs and standard protocols.
The good news: open-weight models (Llama 4, Qwen 3, Mistral — most under Apache 2.0 or a comparable commercial licence) have dramatically reduced model lock-in over the last two years. Frontier-level performance is achievable locally without being tied to any specific provider.
Decision map — 5 questions before you decide
Before making a final decision, work through these five questions as a team:
- 1.Is the use case commodity? If two dozen companies in the same industry solve the same problem in the same way, an off-the-shelf solution is likely more efficient.
- 2.Is our data a source of differentiation? If yes, you must build — an off-the-shelf product will not process it in a way that gives you an advantage.
- 3.Is the data allowed to leave the network? If not, build/on-prem is the only option.
- 4.Do we have (or can we secure) a team to build and maintain it? Building without a competent team is worse than buying — team composition for an AI project is a separate topic that needs to be addressed in parallel.
- 5.How long do we plan to run this solution? Under 12 months, or if the use case is uncertain → buy. 2+ years with clear results → build or hybrid will come out ahead.
When the answers to these questions do not produce a clear winner, it is usually a hybrid case — buy the foundational layers, build the domain specialisation.
Typical mistakes we see
Buying the entire stack from one vendor without verifying whether each layer is genuinely best-in-class. Platforms that do everything do nothing excellently. An increasing number of successful production deployments assemble components from different providers: model API from one side, open-source vector database, custom orchestration.
Building without a defined use case. "We want AI, so we'll build it" is an expedition without a map. The majority of projects we have seen fail had no success metric defined before the start — and therefore no way to know whether what they were building had value. ROI on AI projects must be measured from day one.
Underestimating the cost of data. Before you can build, you need data — cleaned, structured, available in a format an LLM can process. According to the Cisco AI Readiness Index, only ~34% of companies rate their data readiness as sufficient. If you are in the other 66%, the data pipeline is your first project, not AI itself.
Ignoring the EU AI Act. From August 2026, specific obligations apply to companies deploying AI systems in regulated contexts. If you are buying a platform, check whether the vendor provides compliance documentation. If you are building, the compliance documentation is your responsibility. Ignoring this today may mean reworking your solution later.
Frequently asked questions
Does it make sense to test an off-the-shelf solution as a proof of concept and then replace it with a custom build?
Yes, but with one condition: the pilot must test your specific use case, not a generic demo. If a pilot with a bought solution shows that the use case has value, you have evidence for the investment in a custom build. What matters is that the pilot architecture separates data logic from the platform — migration is then much easier.
What does "open-weight model" mean from a licensing and commercial-use perspective?
Models such as Qwen 3 or Mistral are released under the Apache 2.0 licence — you can use them commercially without licence fees. Llama 4 has its own licence (free commercial use below a certain monthly active user threshold). Always verify the current licence terms for the specific model and version before deployment.
Is on-prem deployment realistic for a company without a large IT team?
Yes, if the use case is well-scoped. A single GPU server running Ollama with a smaller open-weight model (for example Qwen 3 8B or Phi-4 14B) can be deployed by an experienced engineer in a day. A production deployment with high availability, monitoring, and CI/CD is more demanding. For a company without an AI team, the right choice is usually a managed on-prem solution with external support, not self-managed infrastructure.
When is fine-tuning part of a "build" strategy and when is it not?
Fine-tuning makes sense when you want to specialise a model on your language register, terminology, or output format — and when you have enough high-quality training data (roughly 5,000+ paired examples for basic SFT). If you do not have the data, or if the problem can be solved well with a properly designed RAG pipeline, fine-tuning is premature optimisation. More on this decision in the article RAG vs fine-tuning.
What is the most common reason build projects run over budget?
From our experience: underestimating evaluation and iteration. Building the first version takes a predictable amount of time — but measuring whether it works, identifying where it fails, and fixing those failures takes equally long again. Projects that account for this from the start deliver on time and on budget. Projects that assume the first version will be production-ready do not.
*MP Industrial Solutions helps companies navigate the build vs buy decision with hard numbers — from use-case mapping through TCO calculations to the first production deployment. If you are facing this decision, we are happy to assess your specific situation together.*
