Every second CTO is asking the same question today: "Where do we start with AI so it isn't just another wasted investment?" The answer isn't in the technology. It's in the discipline of the first three months — what you map, what you refuse, who you hold accountable, and how you measure results. This article is a decision framework for the first 90 days, not a technical installation guide.
Widely cited studies report that 75–95 % of AI projects fail to reach their planned business value. The methodology behind those studies may be imperfect, but the trend is consistent: pilots launch, die after 3–6 months, and value never arrives. The root cause is almost always the same: not bad technology, but missing preparation before the first import openai.
Days 1–30: Map the terrain before you break ground
Use-case inventory, not a shopping list
The first month is not about choosing technology. It is about understanding where your company loses time and money on repetitive cognitive tasks — reading documents, answering the same questions, extracting data from forms, writing standard reports.
A fast method: ask 5–10 people from different departments to write down on paper what they do every week that is repetitive and what frustrates them. Most of what they list will be categorically different from "we want a chatbot." It will include: manually re-entering customer emails into the CRM, hunting for answers in technical documentation, preparing materials for price quotations, and checking delivery notes.
Rate each use case on two axes: frequency (how often it occurs per month) and value (what one instance costs: time × hourly rate). Multiplying the two gives the monthly cost of the task. This is not an academic exercise; it is the foundation for the ROI conversation you will have with leadership in 60 days.
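The two-axis rating can be turned into a simple ranking. The following is a minimal sketch, not a prescribed tool: the `UseCase` class, the three example tasks, and the 35 EUR hourly rate are hypothetical placeholders that you would replace with the figures from your own inventory.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    occurrences_per_month: int   # frequency axis
    minutes_per_instance: float  # time one instance takes today
    hourly_rate_eur: float       # assumed hourly rate of the worker

    @property
    def cost_per_instance_eur(self) -> float:
        """Value axis: what a single instance costs."""
        return self.minutes_per_instance / 60 * self.hourly_rate_eur

    @property
    def monthly_cost_eur(self) -> float:
        """Frequency times value: what the task costs per month."""
        return self.occurrences_per_month * self.cost_per_instance_eur


def rank_use_cases(cases: list[UseCase]) -> list[UseCase]:
    """Sort use cases by monthly cost, most expensive first."""
    return sorted(cases, key=lambda c: c.monthly_cost_eur, reverse=True)


# Hypothetical inventory entries, for illustration only.
inventory = [
    UseCase("Re-entering customer emails into the CRM", 400, 6, 35.0),
    UseCase("Searching technical documentation", 600, 10, 35.0),
    UseCase("Checking delivery notes", 200, 5, 35.0),
]

for case in rank_use_cases(inventory):
    print(f"{case.name}: {case.monthly_cost_eur:,.0f} EUR/month")
```

A spreadsheet does the same job; the point is that every use case ends up with one comparable number before anyone discusses tools.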
Distinction: chatbot, copilot, or agent
Before choosing a tool, clarify what you actually need. Chatbots, copilots, and agents are different architectural patterns with different costs and complexity.
A chatbot answers questions — suitable for customer support or internal FAQ. The simplest to deploy, the lowest expectations.
A copilot assists a worker inside their own tool — inside a manufacturing MES system, an ERP, or documentation. The worker makes decisions; AI suggests and prepares. The typical first step for industrial companies.
An agent plans, calls tools, and executes multi-step actions autonomously. More powerful, but many times more complex — it requires error handling, observability, and thorough piloting. Not for a first project.
For 80 % of companies in the first 90 days, the right answer is: a copilot over internal documentation, or RAG over a company knowledge base.
Data: open Pandora's box right away
AI projects fail on data, not on models. In the first month, find out:
- Where are the documents, manuals, technical specifications, and service records — and in what format (PDF, Word, spreadsheets, databases)?
- Who has access? Is data sensitivity defined (GDPR, trade secrets)?
- How fresh is it — are documents current or from 2018?
The Cisco AI Readiness Index reports that only ~34 % of companies rate their data readiness as sufficient. In practice this means: scanned archival PDFs, manually coded Excel spreadsheets, and documentation that exists only in the heads of long-tenured employees. That is not a reason to hold back — it is a reason to start with a data audit, not a model.
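A first pass at the format and freshness questions above can be scripted. The sketch below is an assumption-laden starting point: `audit_documents` is a hypothetical helper, the cutoff year is arbitrary, and the file modification time is only a rough proxy for how current a document is (copying files between shares often resets it). It does not answer the access and sensitivity questions, which need people rather than a script.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def audit_documents(root: str, stale_before_year: int = 2020) -> dict:
    """Count files under `root` by extension and list those last modified
    before `stale_before_year` (a rough proxy for outdated documents)."""
    by_format = Counter()
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Normalise ".PDF" and ".pdf" to the same bucket.
        by_format[path.suffix.lower() or "(no extension)"] += 1
        modified_year = datetime.fromtimestamp(path.stat().st_mtime).year
        if modified_year < stale_before_year:
            stale.append(str(path))
    return {
        "total": sum(by_format.values()),
        "by_format": dict(by_format),
        "stale": stale,
    }
```

Calling `audit_documents` on a shared documentation folder returns the total file count, a per-format breakdown, and the list of files older than the cutoff, which is usually enough to start the conversation about what is worth indexing.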
Days 31–60: A pilot that actually decides something
Choosing the first use case: ROI-positive, not the most exciting
The most common mistake: a company picks the most ambitious use case — "we want AI for predicting production-line failures" — and the pilot stalls on insufficient historical data, an unclear success criterion, and no domain expert on the team.
The first use case must satisfy three conditions simultaneously:
1. Measurable baseline: you know how many hours the task currently takes. Without a baseline you cannot evaluate the result.
2. Bounded scope: the pilot fits into 4–6 weeks including evaluation. The longer the pilot, the lower the probability it reaches a conclusion.
3. Real data foundation: you have at least 50, and preferably 200, documents or transactions on which to test the pilot. Not a database from the future.
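The three conditions can be written down as an explicit gate that a candidate use case must pass. This is only an illustrative sketch: the function name and parameters are hypothetical, and the default thresholds (6 weeks, 50 documents) are taken from the conditions above.

```python
from typing import Optional


def first_use_case_gaps(
    baseline_hours_per_month: Optional[float],
    pilot_weeks: int,
    available_documents: int,
    max_pilot_weeks: int = 6,
    min_documents: int = 50,
) -> list[str]:
    """Return the unmet conditions; an empty list means the use case qualifies."""
    gaps = []
    if baseline_hours_per_month is None or baseline_hours_per_month <= 0:
        gaps.append("no measurable baseline")
    if pilot_weeks > max_pilot_weeks:
        gaps.append(f"scope exceeds {max_pilot_weeks} weeks")
    if available_documents < min_documents:
        gaps.append(f"fewer than {min_documents} documents or transactions")
    return gaps


# Hypothetical candidates, for illustration only.
print(first_use_case_gaps(120, 5, 300))   # Q&A over service manuals
print(first_use_case_gaps(None, 12, 10))  # predictive maintenance idea
```

Returning the list of gaps rather than a plain yes or no makes it clear what has to be fixed before a rejected use case can be reconsidered.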
A good first use case for a manufacturing company: Q&A over technical documentation (RAG over service manuals, certificates, and standards). Technicians search for answers to the same questions ten times a day, there are plenty of documents, and measurability is clear (time-to-answer before vs. after).
Pilot architecture: local or cloud?
For companies with sensitive technical documents or regulatory burdens (mechanical engineering, chemical industry, healthcare) the first question is: "Can our data leave the network?"
If not, a local model (Ollama with Qwen 3 or Mistral, running on a company server or workstation) with a local vector database is the right choice. Performance will be lower than a frontier cloud model, but the data never leaves your network, which removes the data-egress risk.
If yes, a cloud API (Claude Sonnet, GPT, Gemini Flash), accessed directly from the vendor or through a managed cloud platform such as Azure OpenAI for GPT models, delivers better out-of-the-box performance with lower IT overhead in the pilot phase.
A more detailed comparison is in the article Local LLMs vs cloud and also in the discussion RAG vs fine-tuning — when to use which.
Success criteria before the pilot begins
Define what success looks like before the pilot. Not after. Typical metrics:
- Time saved per task — target: −30 % or a specific number of minutes
- Answer accuracy — target: >85 % correct answers when evaluated by a domain expert (20–50 test questions)
- User adoption — target: >60 % of team members use the system at least 3× per week after 4 weeks
- Error rate — target: <5 % of responses containing factually incorrect information
Without these numbers, the pilot ends in a debate about "does it work?" instead of data that answers yes or no.
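One way to make the yes-or-no outcome explicit is to encode the four targets before the pilot starts and check the measured results against them at the end. The sketch below assumes the thresholds listed above; `PilotResults`, its field names, and the sample numbers are hypothetical. It also assumes that answers which are neither correct nor factually wrong (for example incomplete ones) count against accuracy but not toward the error rate.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    baseline_minutes_per_task: float
    pilot_minutes_per_task: float
    answers_correct: int       # judged correct by the domain expert
    answers_with_errors: int   # contained factually incorrect information
    answers_total: int         # size of the test set (20-50 questions)
    active_users: int          # used the system at least 3x per week
    team_size: int


def evaluate_pilot(r: PilotResults) -> dict[str, bool]:
    """Compare measured results with the targets agreed before the pilot."""
    time_saved = 1 - r.pilot_minutes_per_task / r.baseline_minutes_per_task
    return {
        "time_saved": time_saved >= 0.30,
        "accuracy": r.answers_correct / r.answers_total > 0.85,
        "adoption": r.active_users / r.team_size > 0.60,
        "error_rate": r.answers_with_errors / r.answers_total < 0.05,
    }


# Hypothetical measurements, for illustration only.
results = PilotResults(
    baseline_minutes_per_task=12, pilot_minutes_per_task=8,
    answers_correct=36, answers_with_errors=1, answers_total=40,
    active_users=5, team_size=8,
)
for metric, passed in evaluate_pilot(results).items():
    print(f"{metric}: {'pass' if passed else 'fail'}")
```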
Days 61–90: Results, decision, plan
Evaluating the pilot: three questions
After 4–6 weeks of piloting, answer three questions:
1. Does it work well enough? Compare against the pre-pilot baseline. If you hit the target — continue. If not — why not? Data, prompt, integration, or was the use case a poor choice?
2. Are people adopting it? A technically functional system that nobody uses has no business value. Adoption in the first weeks predicts long-term outcomes. If adoption is low, find out why — UI, trust, training, or does the system simply not save time?
3. Where is the greatest resistance? Every AI project hits a "this won't work" wall. Identify where the resistance comes from — IT security (data), middle management (control), frontline workers (fear of job loss). Each has a different answer.
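The adoption question is easiest to answer from a usage log rather than from impressions. The following is a minimal sketch under stated assumptions: the log is a list of hypothetical `(user, date)` events, one per query, and "adopting" means at least three uses within a given ISO calendar week.

```python
from collections import Counter
from datetime import date


def weekly_adoption(
    events: list[tuple[str, date]],
    team: set[str],
    year: int,
    week: int,
    min_uses: int = 3,
) -> float:
    """Share of team members with at least `min_uses` events in the ISO week."""
    if not team:
        raise ValueError("team must not be empty")
    uses = Counter(
        user
        for user, day in events
        if user in team and tuple(day.isocalendar())[:2] == (year, week)
    )
    active = sum(1 for user in team if uses[user] >= min_uses)
    return active / len(team)


# Hypothetical usage log, for illustration only.
team = {"anna", "ben", "carl", "dana"}
events = [
    ("anna", date(2025, 3, 3)), ("anna", date(2025, 3, 4)),
    ("anna", date(2025, 3, 5)), ("ben", date(2025, 3, 3)),
]
print(f"{weekly_adoption(events, team, 2025, 10):.0%}")  # prints 25%
```

Tracking this number week by week shows whether adoption is growing or fading, which is more informative than a single snapshot at the end of the pilot.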
ROI calculation: simple, but honest
For decision-makers who need to justify continuing the project, you need numbers. Measuring the ROI of AI projects is a topic in its own right, but the basic framework is:
- Time saved: (hours saved per month) × (worker hourly rate) × 12 = annual value
- Project cost: infrastructure + API costs + engineering time (pilot + production)
- Payback period: cost / monthly value = months to break even
For a typical RAG copilot serving 5–10 technical workers saving 30–60 minutes per day: the annual value is measurable in tens of thousands of euros at Central European rates. Pilot costs are roughly 10–30 times lower if you have existing infrastructure and a few days of engineering time.
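The three formulas above fit in a few lines. The sketch below is a worked example under assumed inputs: eight workers each saving 45 minutes on 20 working days per month, a 30 EUR hourly rate, and a 3,600 EUR pilot cost are hypothetical figures chosen to fall within the ranges mentioned in this article, not measured data.

```python
def roi_summary(
    hours_saved_per_month: float,
    hourly_rate: float,
    project_cost: float,
) -> dict[str, float]:
    """Apply the basic framework: annual value and payback period in months."""
    monthly_value = hours_saved_per_month * hourly_rate
    if monthly_value <= 0:
        raise ValueError("no time saved, so the project never pays back")
    return {
        "monthly_value": monthly_value,
        "annual_value": monthly_value * 12,
        "payback_months": project_cost / monthly_value,
    }


# Hypothetical inputs: 8 workers x 0.75 h per day x 20 working days.
hours_saved = 8 * 0.75 * 20  # 120 hours per month
summary = roi_summary(hours_saved, hourly_rate=30.0, project_cost=3_600.0)
print(summary)
```

With these assumptions the monthly value is 3,600 EUR, the annual value is 43,200 EUR, and the pilot pays for itself in one month. Rerun the calculation with pessimistic inputs (half the time saved, double the cost) before presenting it; an honest ROI survives that test.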
Build vs. buy: the day-90 decision
After the pilot you have enough data for the build vs. buy decision. A practical rule:
Buy (SaaS tool, integrated plugin) when: the use case is generic (customer support, email summarisation), competitive differentiation is not critical, and the team has no AI engineering capacity.
Build (custom RAG pipeline, custom agent) when: data is sensitive and cannot go to the cloud, the use case is domain-specific (engineering standards, internal processes), or you plan to scale to more use cases that can reuse the same components (embedding, vector DB, orchestration).
Combine — the most common correct answer in 2026: buy for commodity layers (model API, vector DB), build for the "last mile" (prompts, retrieval logic, integrations, eval harness).
What to avoid: five traps of the first 90 days
1. Hype use case without baselines
"We want predictive maintenance" or "we want an AI strategic analyst" — ambitious, visually attractive, and on closer inspection lacking a measurable baseline, historical data, and a domain expert on the team. These projects die in month three.
The rule: if you have no baseline, you have an aspiration, not a project.
2. Vendor lock-in during the pilot phase
Some vendors offer a "free pilot" in exchange for a commitment. Before you sign, test an alternative — an open-source stack (LangChain/LlamaIndex + Qdrant + local model) can deliver most pilots with no strings attached. Vendor lock-in is reasonable after use-case validation, not before.
3. An AI team without a domain expert
An AI engineer can build a pipeline. They cannot assess whether an answer about a hydraulic circuit is correct. Without a technician or process engineer who validates outputs during the pilot, you have no basis for measuring accuracy. A domain expert is not a "nice to have"; their absence is a blocker.
4. One pilot = production strategy
Pilots prove a principle, not production readiness. Getting from pilot to production requires: a security audit (prompt injection, data egress), MLOps (monitoring, prompt versioning, eval regression), and change management (user training, SLA, an escalation path for when AI answers incorrectly).
5. Ignoring the EU AI Act
If your use case enters a decision chain with an impact on people (recruitment, performance evaluation, risk classification), the EU AI Act requirements on transparency and risk classification apply from 2 August 2026. Compliance is not just a legal exercise — it affects system architecture (audit log, explainability, human oversight). The earlier you account for it in the design, the cheaper the change.
The team: who you need and who you don't
An AI project is not a solo discipline. A production system (not just a pilot) typically requires a combination of:
- AI/ML engineer — RAG pipeline, fine-tuning, agent orchestration, eval
- Data engineer — data pipeline, cleaning, vector DB
- Domain expert — output validation, eval dataset creation
- Product owner — success metrics, use-case prioritisation, stakeholder management
- IT/security — data egress, access control, compliance
A full crew is not required for the pilot phase. An AI engineer + domain expert + product owner can handle a pilot. MLOps and the data engineer join for production.
Warning: a team of fewer than three people responsible for an AI project is risky even for a pilot. One key person dropping out stops everything.
Whether to build an internal team or work with an external partner depends on strategy: if AI is a long-term core competency for the company, build internally. If you need to validate 2–3 use cases quickly and have no AI capacity in-house, an external partner with a fixed project framework is faster. A combination — external partner pilots, internal team takes over production — is a sensible hybrid model.
Measuring at 90 days: four retrospective questions
At the end of the first 90 days, answer honestly:
1. Did the pilot meet the success criteria we defined before it started? (If you didn't define them, that is itself a lesson.)
2. Can we name 2–3 additional use cases where the same infrastructure delivers further value? (If not, there is a risk that we are piloting without a strategy.)
3. Does the project have an executive sponsor who can articulate the business value? (Projects without a CEO/COO sponsor fail statistically far more often than those that have one.)
4. Do we know what the system will look like in production: monitoring, eval, update cycle? (If not, the pilot is a proof of concept, not a foundation for scale.)
These four questions do not evaluate the technology. They evaluate organisational readiness — and that determines the outcome more than any model does.
Frequently asked questions
What do the first 90 days cost?
It depends on scope, but as a rough guide: a pilot with a RAG copilot over internal documentation, running on a cloud API, with an external partner — engineering time in the low thousands of euros, plus low monthly API costs (for an SME typically tens to hundreds of euros at normal volumes). A local deployment (own server, open-weight model) has a higher one-off entry cost but lower running costs.
Do we need our own GPU?
For the pilot phase, generally not. A cloud API (Claude, Gemini Flash, GPT) is sufficient, and so is Ollama with a small quantized model on an existing company server with a standard CPU, as long as slower responses are acceptable. Owning a GPU makes sense when: data cannot go to the cloud, inference volume is high (thousands of requests per day), or you plan fine-tuning.
What if our data isn't ready?
Data unreadiness is not a blocker for a pilot — it is an input parameter. A pilot on 50 well-prepared documents is better than a pilot on 5,000 poor-quality ones. Start with a gold set of 50–100 documents that you review manually. Production scaling is addressed after use-case validation.
How do we explain to employees that we are introducing AI?
Directly and concretely — not "AI will take your job," but "AI will take over searching through manuals so you have more time to solve actual problems." Involve workers in the pilot as "AI evaluators" — their role is to test and report errors. Ownership of the pilot reduces resistance to the production rollout.
Is it better to start with one large project or several small pilots?
From experience: one well-chosen pilot with a clear ROI is better than three parallel experiments. Parallel pilots spread attention thin, reduce data quality for each one, and make evaluation harder. Scale to more use cases after the first success — not before.
*MP Industrial Solutions helps companies navigate the first 90 days in a structured way — from use-case inventory through architecture selection to defining measurable success criteria. If you are considering where to start and want to avoid pilots that lead nowhere, we would be happy to schedule a free initial consultation.*
