Most descriptions of AI implementation fall into one of two categories: the vendor pitch (six seamless steps to transformation) or the academic model (a flowchart with no time scale). Neither maps to reality. This article describes what a production AI implementation actually looks like in a mid-market regulated company: week by week, gate by gate, with the uncomfortable decision points that most methodologies gloss over.

The structure here is drawn from three years of production deployments across financial services, insurance, and government, combined with implementation frameworks from IBM Consulting, MIT's Project NANDA, and peer-reviewed stage-gate research published in December 2025.[1][2][3] The timelines are realistic. The decision points are real. And the most important feature of the process is the one most organisations skip: the willingness to stop.

DISCIPLINE

What makes the week-by-week breakdown in this article demanding isn't complicated technology. It's the unwavering discipline the implementation process requires: the discipline to qualify before building, to measure before scaling, and to stop when the evidence says stop.

I

Before Week One: The Question Nobody Asks

Every AI engagement should begin with a question that most vendors, consultancies, and internal champions have a financial incentive to avoid. Should you be doing this at all?

RAND Corporation's research identifies the primary cause of AI project failure as a fundamental misunderstanding of the business problem.[4] This isn't a technical assessment. It's a diagnostic: does your organisation have a specific, measurable business problem that AI is genuinely the right tool to solve?

The first decision point has three possible outcomes: GO, REDIRECT, or STOP. A proper qualification diagnostic assesses four dimensions.

Business problem clarity: can you articulate what you're solving and how you'll measure success?
Data readiness: do you have the data to support an AI solution?
Regulatory landscape: what compliance obligations apply?
Organisational commitment: does leadership understand what implementation requires?

If any dimension fails, the honest answer is to redirect or stop, not proceed and hope.

This is where most engagements go wrong. The pressure to move forward (from boards that have approved budgets, from technology teams that want to build, from vendors that want to sell) overwhelms the discipline to ask whether the foundation is sound. IBM Consulting's stage-gate framework makes this explicit. Each gate asks "should we spend more money?" not "is this good enough?"[1]
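One way to keep that first gate honest is to score the four dimensions explicitly and let the GO, REDIRECT, or STOP outcome follow from the scores rather than from whoever argues loudest in the room. The sketch below is illustrative only: the scoring scale, the thresholds, and the rule that a failed commitment dimension means STOP are assumptions to be set per organisation, not a standard instrument.

    # Illustrative qualification gate: score the four dimensions, force an explicit outcome.
    # Scale, thresholds, and the STOP rule are assumptions, not a standard instrument.
    from dataclasses import dataclass

    @dataclass
    class Dimension:
        name: str
        score: int       # 1 (absent) to 5 (strong), assessed with the business owner
        minimum: int = 3 # below this, the dimension fails

    def qualify(dimensions: list[Dimension]) -> str:
        failures = [d.name for d in dimensions if d.score < d.minimum]
        if not failures:
            return "GO"
        # Weak data or an unclear problem can usually be fixed first;
        # absent leadership commitment usually cannot.
        if "organisational commitment" in failures:
            return "STOP"
        return "REDIRECT: fix " + ", ".join(failures) + " before any build starts"

    print(qualify([
        Dimension("business problem clarity", 4),
        Dimension("data readiness", 2),
        Dimension("regulatory landscape", 4),
        Dimension("organisational commitment", 5),
    ]))
    # -> REDIRECT: fix data readiness before any build starts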

II

Weeks 1–2: The Diagnostic

Assuming the qualification gate is passed, the first two weeks are diagnostic. This isn't a strategy engagement. It's a focused assessment that produces three deliverables: a validated problem statement with agreed KPIs, a data readiness report, and a regulatory requirements map.

The critical discipline here is specificity. RSM's middle-market research found that 62% of executives said generative AI was harder to implement than expected.[5] The primary reason? Expectations were set at the wrong level of abstraction. "Use AI to improve customer service" isn't a problem statement. "Reduce average claims processing time from 14 days to 3 days by automating initial document review, while maintaining a 99.2% accuracy rate and full audit compliance" is.

The KPIs must be agreed before building starts. Don't retrofit them afterwards. This is a lesson from every failed pilot: if you don't define success before you build, you'll redefine it afterwards to match whatever you built.
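One lightweight way to make that commitment stick is to write the KPIs down as data rather than prose at the end of the diagnostic, so the later gate comparison is mechanical. A minimal sketch, using the claims-processing example above; the field names and structure are illustrative, not a prescribed format.

    # KPIs agreed at the diagnostic, recorded as data so they cannot be redefined later.
    # Values follow the claims-processing example above; the structure is illustrative.
    KPIS = {
        "avg_processing_days": {"baseline": 14.0, "target": 3.0,   "direction": "lower"},
        "accuracy_rate":       {"baseline": None, "target": 0.992, "direction": "higher"},
    }

    def meets_target(name: str, measured: float) -> bool:
        kpi = KPIS[name]
        if kpi["direction"] == "lower":
            return measured <= kpi["target"]
        return measured >= kpi["target"]

    print(meets_target("avg_processing_days", 4.5))  # False: 4.5 days still misses the 3-day target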

III

Weeks 3–8: Phase One, Prove

Phase One has a single objective: demonstrate measurable value with one use case, using real users, real data, and real workflows.

This is the point where the "controlled environment" problem that DAS Advanced Systems has documented becomes critical. The difference between a pilot that succeeds and one that fails? Whether it runs against production conditions. A system that works with clean sample data in a demo environment proves nothing about whether it will work with the messy, incomplete, legacy data that your organisation actually has.[6]

67% success rate when buying AI from specialised vendors, vs. 22% for internal builds (MIT Project NANDA, August 2025)

A properly run Phase One in a mid-market regulated company typically takes three to six weeks and includes five parallel workstreams:

1

Solution design and build

The technical work: model selection, data pipeline construction, integration with existing systems. For most mid-market deployments, this involves configuring and customising existing AI capabilities, not building from scratch. The build-versus-buy decision should already be resolved at this point.

2

Data preparation

Cleaning, formatting, and structuring the actual production data that the system will use. For RAG-based systems, this means building the vector store from real documents. For classification systems, this means labelling from real examples. The 41% of mid-market executives who cite data quality as their top challenge typically discover the full extent of that challenge right here.[5]
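For a RAG-based system, this workstream is mostly about getting real production documents into retrievable shape: clean, chunk, index. A deliberately minimal, library-free sketch of that path; a real build would swap the toy bag-of-words "embedding" for an embedding model and a proper vector store, but the shape of the work is the same.

    # Toy ingestion pipeline for a RAG system: chunk real documents, index the chunks.
    # The "embedding" here is a stand-in bag-of-words; a production build would use an
    # embedding model and a vector store, but the clean -> chunk -> index shape holds.
    from collections import Counter

    def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
        """Split a document into overlapping character windows."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    def embed(chunk_text: str) -> Counter:
        """Stand-in for a real embedding model: lower-cased word counts."""
        return Counter(chunk_text.lower().split())

    def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
        index = []
        for doc_id, text in documents.items():
            for piece in chunk(text):
                index.append((doc_id, piece, embed(piece)))
        return index

    index = build_index({"policy_1042.txt": "Claims must be acknowledged within two business days..."})
    print(len(index), "chunks indexed")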

3

Governance embedding

Audit trails, decision logging, bias monitoring, and compliance documentation are built into the system architecture from the start, not added afterwards. For organisations subject to the EU AI Act, this is where conformity assessment preparation begins.
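In practice, "built in from the start" means every automated decision writes its own audit record at the moment it is made, rather than a report being assembled before an inspection. A minimal sketch of such a record; the field names are an assumption about what an auditor typically asks for, not a statement of any specific regulation's requirements.

    # Minimal decision log: one append-only JSON line per automated decision.
    # Field names are illustrative; the point is that logging sits in the decision path,
    # not in a report assembled after the fact.
    import json, hashlib, datetime

    def log_decision(path: str, *, model_version: str, input_ref: str,
                     output: str, confidence: float, reviewer: str | None) -> None:
        record = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "model_version": model_version,
            "input_hash": hashlib.sha256(input_ref.encode()).hexdigest(),  # reference, not raw PII
            "output": output,
            "confidence": confidence,
            "human_reviewer": reviewer,  # None means fully automated; auditors will ask
        }
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    log_decision("decisions.jsonl", model_version="claims-triage-0.3",
                 input_ref="claim-2024-00871", output="route_to_fast_track",
                 confidence=0.91, reviewer=None)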

4

User involvement

The people who will actually use the system are involved from week three: testing, providing feedback, and identifying gaps between what the system does and what the workflow actually requires. This addresses the Prosci finding that 38% of AI failures stem from user proficiency issues.[7]

5

Measurement infrastructure

The KPIs agreed in the diagnostic phase need measurement mechanisms. If you can't measure the baseline and track improvement, you can't make an evidence-based decision at the next gate.
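Concretely, this can be as simple as capturing the baseline once and logging the same metric for every case the MVP handles during Phase One, so the gate comparison is a calculation rather than a debate. A minimal sketch; the numbers are invented purely to show the shape.

    # Capture the baseline once, then record the same metric for every case the MVP handles,
    # so Gate One compares measured numbers rather than impressions. Values are invented.
    from statistics import mean

    baseline_days = [15.2, 13.8, 14.5, 12.9, 14.1]   # measured before the MVP went live
    pilot_days    = [4.1, 3.2, 5.0, 2.8, 3.6]        # measured on real cases during Phase One

    improvement = 1 - mean(pilot_days) / mean(baseline_days)
    print(f"baseline {mean(baseline_days):.1f} days, pilot {mean(pilot_days):.1f} days, "
          f"{improvement:.0%} reduction")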

IV

Weeks 8–10: Gate One, Decide

Gate One is an investment decision, not a progress review. The question is straightforward. Based on measured results against agreed KPIs, should your organisation invest more money in this initiative?

Three outcomes are possible. SCALE means the MVP demonstrated sufficient value to justify production deployment and expansion. PIVOT means the results were mixed: the problem is worth solving, but the approach needs adjustment. STOP means the initiative didn't produce sufficient evidence of value to justify further investment.

"These companies do not have the luxury of prolonged pilot programmes or unclear returns." (CBS News, January 2026, on mid-market AI implementation[8])

The willingness to stop is the single most important differentiator between organisations that waste money on AI and those that invest it productively. Every methodology claims to include decision gates. In practice, the economic incentives of most vendor relationships push towards continuation regardless of results. The gate only works if the decision-maker has genuinely independent data and the discipline to act on it.
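One way to give the decision-maker that discipline is to agree the decision rule before any results exist, so SCALE, PIVOT, or STOP follows mechanically from the measured KPIs rather than from the mood of the review meeting. A deliberately simple sketch; the rule (all targets met means SCALE, at least half means PIVOT, otherwise STOP) is an assumption to be set per engagement.

    # Gate One as a pre-agreed mechanical rule: the outcome follows from the measured KPIs.
    # The thresholds are illustrative assumptions fixed before results exist.
    def gate_one(results: dict[str, bool], pivot_floor: float = 0.5) -> str:
        met = sum(results.values()) / len(results)
        if met == 1.0:
            return "SCALE"
        if met >= pivot_floor:
            return "PIVOT"
        return "STOP"

    print(gate_one({
        "avg_processing_days <= 3":       False,  # 3.7 days achieved against a 3-day target
        "accuracy_rate >= 99.2%":         True,
        "full audit trail on every case": True,
    }))
    # -> PIVOT: worth solving, but the approach needs adjustment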

V

Weeks 10–40: Phase Two, Embed

If Gate One produces a SCALE decision, Phase Two runs two parallel tracks that are equally important. The second one is what most implementations neglect.

Track A: Scale the solution

Moving from MVP to production deployment involves expanding to additional users and workflows, hardening the infrastructure for reliability and scale, implementing continuous governance monitoring aligned to EU AI Act and ISO 42001 requirements, and integrating deeply with existing operational systems.

The timeline for Track A varies significantly by complexity. For a mid-market company deploying a single AI system (say, an automated document review tool for an insurance underwriter), production deployment might take eight to twelve weeks. For more complex initiatives involving multiple interconnected systems, it can extend to six months.

Track B: Build internal capability

This is the track that separates implementations that last from those that become expensive shelf-ware. Prosci's research is unequivocal: organisations investing in developing their own people see consistently better results than those depending heavily on outside consultants.[7]

Track B includes structured, role-based training (not a single workshop, but an ongoing programme tailored to different user groups), fostering an experimentation culture (which Prosci identifies as the single most significant factor in AI adoption success), democratising AI expertise across the organisation rather than concentrating it in IT, and providing frameworks for identifying future AI use cases independently.

Dimension           | Typical Consultant       | Production-Focused Approach
Starting point      | "Let's do AI"            | "Should you even do AI?"
Gate decisions      | Push forward regardless  | Willing to stop and lose revenue
Success metric      | Project delivered        | Client outcome achieved
Training            | Add-on at the end        | Primary deliverable in Phase Two
Completion criteria | System deployed          | Client is self-sufficient
Recurring revenue   | Perpetual dependency     | Earned through results
Implementation Timeline: 40 Weeks
Diagnostic: Weeks 1–2
Phase One (Prove): Weeks 3–8
Gate One (Decide): Weeks 8–10
Phase Two (Embed): Weeks 10–40
Gate Two (Graduate): Week 40+
VI

Week 40+: Gate Two, Graduate

The final gate asks the most important question of all: can your team operate independently?

Graduation criteria are specific and measurable. Your team can operate the AI system without external support. They can identify when the system needs updating or retraining. Governance processes are in place for ongoing compliance. And, critically, the team has a framework for evaluating future AI opportunities without needing to hire external help to do it.

If these criteria are met, the engagement ends. Not because the partner has nothing more to sell, but because the purpose of the engagement was to build capability, not dependency. If the criteria aren't met, the engagement extends. But only the specific gaps are addressed, not a wholesale continuation of the full programme.

VII

What This Actually Costs

There's a persistent opacity around AI implementation costs that serves vendors far better than it serves clients. The reality for mid-market companies is that the total cost depends heavily on which approach they take.

DAS Advanced Systems has documented the typical Big 4 engagement cost: $500,000 to $1 million for strategy, $1 million to $2 million for a pilot, and $3 million to $10 million for implementation. For a mid-market company with $300 million in revenue, that combined range works out to somewhere between 1.5% and 4.3% of total revenue. It's an enormous commitment with no guaranteed outcome.[6]
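The arithmetic behind that range is worth running against your own revenue base. The figures below are simply the ones cited above; the $300 million revenue is the example in the text, not a benchmark.

    # Rough arithmetic on the engagement figures cited above, against the $300M example revenue.
    strategy       = (0.5e6, 1e6)
    pilot          = (1e6,   2e6)
    implementation = (3e6,   10e6)
    revenue        = 300e6

    low  = strategy[0] + pilot[0] + implementation[0]   # $4.5M
    high = strategy[1] + pilot[1] + implementation[1]   # $13M
    print(f"{low / revenue:.1%} to {high / revenue:.1%} of annual revenue")  # 1.5% to 4.3%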

Boutique firms specialising in regulated industries typically operate at 40–60% lower cost, with faster delivery timelines (weeks, not months) and more hands-on implementation support. But cost is only one dimension. The more important question is: what do you own at the end?

When the engagement ends, do you own the IP? Can you operate independently? Can you modify, extend, and maintain the system without returning to the same vendor? If the answer to any of these is no, you haven't bought a solution. You've rented one. And the total cost will be far higher than the initial engagement.

67%: success rate when buying AI from specialised vendors (MIT Project NANDA)
22%: success rate for internal builds (MIT Project NANDA)
38%: AI failures stemming from user proficiency issues (Prosci)
41%: mid-market executives citing data quality as their top implementation challenge (RSM)
62%: executives who found AI implementation harder than expected (RSM)
5%: companies generating real value from AI; these achieve five times the revenue increases and three times the cost reductions of their peers (BCG, "The Widening Gap", September 2025)
BEGIN

The week-by-week breakdown above isn't complicated. The technology isn't the hard part. What makes production AI implementation difficult is the discipline it requires: the discipline to qualify before building, to measure before scaling, to invest in people alongside technology, and to stop when the evidence says stop.

For mid-market companies in regulated industries, this discipline isn't optional. They can't absorb enterprise-scale failures, they can't wait years for returns, and they can't deploy AI systems that their regulators can't audit. The implementation approach has to be right from week zero. Because there may not be a second chance.

Sources

  1. IBM Consulting (Gidwani & Bhattarai), "Measuring AI outcomes: 7-step stage-gating framework." ibm.com
  2. MIT Media Lab / Fortune, "MIT report: 95% of generative AI pilots at companies failing," August 2025. fortune.com
  3. Boston Consulting Group, "Are You Generating Value from AI? The Widening Gap," September 2025. bcg.com
  4. RAND Corporation, "The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed," RRA2680-1, August 2024. rand.org
  5. RSM, "Analysing AI trends in the middle market," 2025. rsmus.com
  6. DAS Advanced Systems, "Why Big 4 Consulting Firms Are Failing Mid-Size Companies with AI." dasadvancedsystems.com
  7. Prosci, "Why AI Transformation Fails: Research Insights from 1,100+ Change Professionals," September 2025. prosci.com
  8. CBS News / JPMorgan, "Why the mid-market will determine AI's economic impact," January 2026. cbsnews.com
  9. McKinsey & Company, "5 steps for change management in the gen AI age," August 2025. mckinsey.com