Walmart deployed four specialized AI agents (Sparky, Marty, Associate, Developer) — announced mid-2025. Replit grew revenue from $2.8M to $240M over 2025 on an agentic architecture. Duolingo shipped 148 new courses in a single year — more than in the previous 12 years combined. McDonald's shut down its drive-thru AI system after viral failures in 2024. Air Canada lost a lawsuit over a chatbot error: the tribunal ordered it to pay $812.02 CAD. The difference between the winners and the losers is not budget and not technology. It is methodology and process readiness.
Winners: three cases of successful implementation
Case 1. Walmart — agentic architecture at scale
In mid-2025, Walmart announced the deployment of four specialized AI agents: Sparky serves shoppers, Marty works with partners and suppliers, Associate helps employees, and Developer supports developers. Each agent stays strictly within its own zone of responsibility, with no overlap between agents.
Walmart's fundamental decision was not to create a universal "AI for everything" but to give each audience its own specialized agent. Marty is trained on supplier data; Associate is trained on internal HR and operational processes. Specialization reduces the risk of hallucinations because each agent works within its own well-documented domain.
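The one-agent-per-audience split can be sketched as a simple routing table. This is a minimal illustration, not Walmart's implementation: the agent names come from the announcement, while the `AUDIENCE_TO_AGENT` mapping, the audience labels, and the `route` function are hypothetical.

```python
# Hypothetical routing table: each audience maps to exactly one agent.
AUDIENCE_TO_AGENT = {
    "shopper": "Sparky",
    "supplier": "Marty",
    "employee": "Associate",
    "developer": "Developer",
}


def route(audience: str) -> str:
    """Return the single agent responsible for this audience.

    Unknown audiences raise an error instead of falling back to a
    catch-all agent, mirroring the "no universal AI" decision.
    """
    try:
        return AUDIENCE_TO_AGENT[audience]
    except KeyError:
        raise ValueError(f"no agent owns audience {audience!r}") from None
```

Refusing unknown audiences outright is one possible design choice; it keeps every request inside a domain that some agent was actually prepared for.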
The second element is phased scaling. Walmart did not switch to agents overnight. The company accumulated structured data for years — and only then moved to an agentic architecture.
Takeaway. Walmart did not "automate everything" — it gave each user group its own agent and built the data structure to support it. Here AI is not an add-on but operational infrastructure.
For more on how AI-first differs from AI-enabled, see the article "AI-first vs AI-enabled: why Walmart won".
Case 2. Replit — from a tool to an AI-native product
In September 2025, Replit released Agent 3 — the most autonomous version of its agent, capable of working without user intervention for more than 200 minutes in a row, creating subtasks on its own and spinning up helper agents to handle individual blocks of a task. According to TechCrunch and analysts at Sacra, Replit's revenue over 2025 grew from $2.8M to roughly $240M. In January 2026 the company reached a $9B valuation; the ARR forecast for the end of 2026 is $1B.
The key point: Replit did not integrate AI as an additional feature layered on top of an existing editor. The agent became the central product — the entire user experience is built around it.
This is fundamentally different from the approach where companies "add an AI button" to an existing interface. Replit rebuilt the product logic: the user describes a task, the agent takes it on, decomposes it, and executes. The developer becomes an architect rather than an executor.
Takeaway. AI-native is not "we added ChatGPT to the interface." It is AI as an operational platform around which the product is built from day one. The result is revenue growth of many multiples, which an incremental approach would be unlikely to match.
Case 3. Duolingo — 12x growth in content productivity
On April 30, 2025, Duolingo announced the release of 148 new language courses in a single year. For comparison: the company took about 12 years to create its first 100 courses. This is the largest catalog expansion in the company's history.
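The pace change can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text; since "about 12 years" is approximate, the result is a rough estimate rather than a precise metric.

```python
# Figures as stated in the text; "about 12 years" is approximate.
courses_before, years_before = 100, 12
courses_after, years_after = 148, 1

rate_before = courses_before / years_before  # courses per year, before
rate_after = courses_after / years_after     # courses per year, after
speedup = rate_after / rate_before

print(f"{rate_before:.1f} -> {rate_after:.0f} courses/year, about {speedup:.0f}x")
```

On these figures the annualized pace rises from about 8 courses a year to 148, roughly 18 times, so a 12x headline figure is on the conservative side.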
An honest caveat: some users and tech outlets, in particular TechCrunch, raised questions about the quality of some of the courses. This confirms the general principle: AI removes the production bottleneck but does not eliminate the need for methodological expertise.
Duolingo's model works as follows: methodologists set the pedagogical standards and control quality, while AI generates content within the established frame. The human role shifts from production to designing standards and controlling the result.
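The "AI generates within the established frame" model can be illustrated as an automated check of generated content against a standard set by methodologists. This is a sketch under assumptions: `LessonStandard`, its two fields, and the `violations` function are hypothetical and do not describe Duolingo's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LessonStandard:
    """Hypothetical standard a methodologist might define for one level."""
    max_words_per_sentence: int
    allowed_vocabulary: frozenset[str]


def violations(sentences: list[str], standard: LessonStandard) -> list[str]:
    """Return readable reasons why generated sentences break the standard."""
    problems = []
    for sentence in sentences:
        words = [w.strip(".,!?").lower() for w in sentence.split()]
        if len(words) > standard.max_words_per_sentence:
            problems.append(f"too long: {sentence!r}")
        unknown = sorted(set(words) - standard.allowed_vocabulary)
        if unknown:
            problems.append(f"off-list words {unknown} in {sentence!r}")
    return problems
```

An empty result means the generated content stays inside the frame; anything else would go back to a methodologist, which is where the text places the new bottleneck.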
Before AI was implemented, the bottleneck was in production: creating a single course took months of work by linguists and content designers. Now the bottleneck is methodological control. This is a different task, requiring different competencies.
Takeaway. AI did not replace Duolingo's methodologists — it removed the production ceiling. The company grew not because it "automated everything," but because it correctly identified the point of application.
Losers: two cases of systemic mistakes
Case 4. McDonald's — deployment without a quality loop
Around 2021, McDonald's launched a test of a voice-ordering AI system in the drive-thru together with IBM. The test covered more than 100 restaurants. In June–July 2024 the test was discontinued and the systems were dismantled.
The reasons for the shutdown got wide coverage: the system confused accents, added items to orders the customer never named — including the well-known viral episode with bacon in ice cream. Videos of the errors spread across social media, creating a lasting reputational stain.
Important: the test ran for three years. McDonald's did not "launch and immediately shut down" — the company gave the system time and resources. The problem is not the speed of the decision but the nature of the environment: a live drive-thru queue is unstructured voice input under conditions of noise, pressure, and accent variation. Speech-recognition systems of 2021–2024 were not ready for this environment without a hard quality filter at the junction of the agent and order fulfillment.
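A "hard quality filter at the junction of the agent and order fulfillment" could look like the gate below. It is a minimal sketch, not a description of the McDonald's or IBM system: the menu, the `MIN_CONFIDENCE` threshold, and the `RecognizedItem` type are all assumptions made for illustration.

```python
from dataclasses import dataclass

MENU = {"cheeseburger", "fries", "vanilla cone", "cola"}  # illustrative menu
MIN_CONFIDENCE = 0.85  # assumed threshold; a real one would be tuned on data


@dataclass
class RecognizedItem:
    name: str
    confidence: float  # recognizer's score in [0, 1]


def gate_order(items: list[RecognizedItem]) -> tuple[list[str], list[str]]:
    """Split recognized items into accepted ones and ones needing confirmation.

    An item is accepted only if it is on the menu and the recognizer was
    confident; everything else goes back to the customer or a crew member.
    """
    accepted, needs_confirmation = [], []
    for item in items:
        if item.name in MENU and item.confidence >= MIN_CONFIDENCE:
            accepted.append(item.name)
        else:
            needs_confirmation.append(item.name)
    return accepted, needs_confirmation
```

The point of the gate is that nothing reaches the kitchen on the recognizer's word alone: an off-menu item or a low-confidence match is confirmed before it becomes part of the order.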
Takeaway. This is not a failure of AI as a technology. It is the deployment of AI at an unsuitable point of application — without sufficient data readiness and without a verification loop. At a different point — for example, in a structured order back office — the same technology could have produced a different result.
Case 5. Air Canada — legal liability for the actions of an AI agent
On February 14, 2024, the Civil Resolution Tribunal of British Columbia issued its decision in Moffatt v. Air Canada. A customer asked the airline's chatbot about the bereavement fare (a reduced fare available when a relative dies) and was told he could buy a regular ticket and apply for the discount retroactively. Air Canada's actual policy did not allow retroactive applications, and the airline refused to refund the difference, arguing in effect that the chatbot was a separate entity and that the company was not responsible for its words.
The tribunal rejected this argument. Its position: the chatbot is a tool of the airline, and the company is responsible for the information it provides to customers. Air Canada was ordered to pay a total of $812.02 CAD, covering damages for negligent misrepresentation plus interest and tribunal fees. This is not an administrative fine; it is compensation to a specific customer in a specific claim.
The case's significance as a precedent goes beyond the sum. The "the bot is not us" argument no longer works. Not in Canada, and — judging by the direction of regulatory practice — not in other jurisdictions either.
What went wrong: the chatbot answered customers on legally sensitive matters without verifying that its fare-policy information was up to date. There was no mechanism to check an answer before delivering it or to escalate complex cases to a human.
Takeaway. An AI agent is, legally, your employee. The agent's mistake is your liability. Zones where the cost of error is high (fares, legal rights, medical questions) require a verification loop or an explicit limitation on the agent's scope of competence.
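The "verification loop or explicit limitation of scope" rule can be sketched as a small dispatch function. The topic labels, the `verified_against_policy` flag, and the `handle` function are hypothetical; how a draft answer is actually verified against the policy source is outside this sketch.

```python
# Assumed labels for zones where an error is expensive.
HIGH_RISK_TOPICS = {"fares", "refunds", "legal_rights", "medical"}


def handle(topic: str, draft_answer: str, verified_against_policy: bool) -> dict:
    """Decide whether a drafted answer may be sent or must go to a human.

    High-risk topics are sent only when the draft was checked against the
    current policy source; otherwise the conversation is escalated.
    """
    if topic in HIGH_RISK_TOPICS and not verified_against_policy:
        return {"action": "escalate_to_human", "answer": None}
    return {"action": "send", "answer": draft_answer}
```

With a rule of this kind, an unverified answer about bereavement fares would have been routed to a person instead of being delivered as the company's statement.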
Where AI delivers, where it doesn't, where it adds work — lessons from the cases
Delivers results
Content scale when a methodological standard is in place. Duolingo — 148 courses in a year versus 100 in 12 years. The condition: methodologists set the standard, AI produces within the standard.
Agentic operations in structured zones. Walmart — four agents with clear domains and accumulated data. The condition: structured data, a limited area of responsibility, phased expansion.
Product innovation under an AI-native architecture. Replit — an AI agent as the core of the product, not an add-on. The condition: an architectural decision from day one, not a retrofit.
Does not deliver results
An unstructured environment without a quality gate. McDonald's — voice input in the drive-thru, accents, noise, queue pressure. The technology was not ready for this environment, and the quality loop was absent.
Legally sensitive dialogues without verification. Air Canada: the chatbot gave answers on fare policy without checking that the information was current and without the ability to escalate. The result was a legal precedent.
Adds work
Successful AI implementation does not reduce the management load to zero — it moves it to other points:
- Managing the reputation of AI errors. McDonald's viral failures required PR work that would not have existed without the AI deployment.
- Legal monitoring. The Air Canada precedent means that companies deploying chatbots in customer communications now need legal review as a standing operational function.
- Quality control of agent responses. Duolingo shifted the labor from production to methodological management. This is different work, but no less of it.
What separates the winners from the losers: three decisions
Decision 1. The right point of application (process fit)
Walmart chose structured zones: customer service, partner communications, HR support, development. Each zone has typed requests and a finite set of responses. McDonald's chose a point with the opposite characteristics: unstructured voice, variable accents, time pressure.
The question is not "does AI work" — the question is "is this AI suitable for this point." The check is simple: do you have structured data to train on and a limited set of permissible responses in this zone?
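The two-question check can be written down as a tiny function. The function name and the three result labels are assumptions for illustration; in particular, how to treat a zone that passes only one of the two questions is a judgment call, not something the cases settle.

```python
def process_fit(has_structured_data: bool, responses_are_bounded: bool) -> str:
    """Classify a candidate point of application using the two questions."""
    if has_structured_data and responses_are_bounded:
        return "good fit"
    # Assumption: passing only one question means the zone needs more
    # preparation before an agent is deployed there.
    if has_structured_data or responses_are_bounded:
        return "needs a quality gate or narrower scope first"
    return "poor fit"
```

For example, a drive-thru voice channel with neither structured input nor a bounded response set would come out as a poor fit under this reading.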
Decision 2. Data readiness
Duolingo had a multi-year corpus of pedagogical content and clear methodological standards. The agent was given a frame within which it worked correctly. Air Canada deployed a chatbot in a zone where the data (fare policy) changes regularly and requires up-to-date verification — and there was no update mechanism.
Before launching an agent, you need to answer the question: do we have a current, structured, verified database for this domain?
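The readiness question (current, structured, verified) can be made concrete as a pre-launch check. The `DomainData` fields and the `max_age` parameter are hypothetical; what counts as "current" depends on how often the domain changes.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DomainData:
    last_updated: date
    is_structured: bool
    is_verified: bool  # e.g. signed off by the policy owner


def is_ready(data: DomainData, today: date, max_age: timedelta) -> bool:
    """True only if the data is current, structured, and verified."""
    is_current = today - data.last_updated <= max_age
    return is_current and data.is_structured and data.is_verified
```

A fare policy that changes regularly would call for a short `max_age`, so stale data blocks the launch instead of surfacing later as a wrong answer.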
Decision 3. The verification loop
Replit's Agent 3 works autonomously — but not blindly. The agent builds a plan, decomposes the task, creates subtasks, and checks intermediate results before moving to the next step. Autonomy is not "does everything without checks," it is "builds and maintains a quality loop on its own."
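The idea of an agent that checks intermediate results before moving to the next step can be sketched as a generic loop. This is not Replit's implementation: `run_with_checks`, the `execute` and `check` callbacks, and the retry limit are assumptions chosen to show the pattern.

```python
from typing import Callable


def run_with_checks(
    steps: list[str],
    execute: Callable[[str], str],
    check: Callable[[str, str], bool],
    max_attempts: int = 2,
) -> list[str]:
    """Run steps in order, verifying each result before moving on.

    Raises RuntimeError if a step still fails its check after max_attempts,
    so a bad intermediate result never feeds the next step.
    """
    results = []
    for step in steps:
        for _ in range(max_attempts):
            result = execute(step)
            if check(step, result):
                results.append(result)
                break
        else:
            # The inner loop ended without a passing check.
            raise RuntimeError(f"step failed verification: {step!r}")
    return results
```

Stopping on a failed check is one conservative policy; a production agent might instead replan or ask for help, but the principle of verifying before proceeding stays the same.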
Air Canada let its chatbot answer legally significant questions without verification and without escalation. Neither of these mechanisms was implemented.
The verification loop is not distrust of AI. It is an architectural principle that turns an agent into a reliable tool rather than a source of unmanaged risk.
For how to choose an implementation strategy for your own context, see the article "Which strategy the winners chose".
What to study next
Context: The AI S-curve: why the winners entered first — on where the entry point is and why the window is narrowing.
Strategy: AI-first vs AI-enabled: why Walmart won — the difference between companies that rebuild operations around AI and those that bolt it on top.
Practice: Which strategy the winners chose — three implementation models and the criteria for choosing one for a specific context.
Next step: If you need an operational start, see "30 processes to automate": a checklist of concrete points of AI application that deliver results without rebuilding the architecture from scratch.
