In February 2026, Google Cloud published a 60-page report that turns the return on investment of AI-assisted software development into a financial model. In it, Google’s DORA team and the consulting unit “delta” use a sample organization to calculate the first-year costs and benefits of introducing agentic tools.
DORA, now part of Google, has been one of the most influential research programs for measuring software delivery since 2014. Its studies helped establish the four key delivery metrics—lead time, deployment frequency, change failure rate, and recovery time—that have since become industry standards. The report “The ROI of AI-assisted Software Development” was produced with the internal consulting unit “delta,” which supports Google Cloud customers in adopting AI.
The report is divided into seven chapters. It opens with an executive summary, develops a business case, describes the split in the market’s assessment of AI, walks readers through ROI calculation using value drivers, cost drivers, and sample scenarios, identifies five structural prerequisites for successful adoption, outlines an investment roadmap, and closes with guidance on securing long-term returns. The appendix includes a sample ROI calculator; an interactive version is available at dora.dev/ai/roi/calculator.
AI as an Amplifier, Code as a Burden
The report returns again and again to the idea that AI acts as an amplifier. It follows from the central observation of the 2025 DORA Report: Those with a well-functioning engineering system multiply its strengths through AI. Those with a dysfunctional system multiply the dysfunctions. “Without this foundation, AI creates localized pockets of productivity that are often lost in downstream chaos,” the authors write. The result is a remarkable reinterpretation of what counts as code output.
Referring to the seminal work “Software Engineering at Google,” the report cites the maxim that code is often “a liability, not an asset” over the lifetime of a system. The cost of operating a system exceeds the cost of creating it by orders of magnitude. More code, generated without oversight, increases the verification effort and leads to long-term technical debt. That reversal sets the tone. It shifts the usual AI promise from “more code, faster” to “fewer bottlenecks, consistently.”
A Familiar Thesis, Now Quantified
Anyone reading this report alongside the recent post on the Team Operating System for Agentic CLI will recognize the connection. The argument there was that tools without a method push errors into the output instead of making them visible. In a Stanford study, Christopher Potts and Moritz Sudhof estimated that 88 percent of errors became invisible in dialogues with untrained users. The DORA report adds the organizational and economic layer. These invisible errors are not merely a quality issue. They show up in delivery metrics as a higher change failure rate and longer recovery time, with measurable effects on revenue.
The report’s five systemic keys—trust, an internal platform, a data ecosystem, user orientation, and automated guardrails—largely align with the pillars of methodical training. A written AI policy corresponds to a repository constitution. An internal developer platform managed as a product corresponds to tool zones, where tools are classified by risk. User orientation corresponds to the discipline of aligning every release with a clear goal. Method is not the goal. ROI is. But without method, ROI remains scattered. What once looked like a question of style and review culture appears in the DORA report as a variable in the financial model.
What This Means for German Teams
The report’s sample figures come from a U.S. cloud-native scenario. The annual compensation of $176,000 used in the calculator is at the upper end of what a German software specialist costs, especially in regulated industries covered by collective bargaining agreements. The gap between greenfield and brownfield effects—which the report puts at 35 to 40 percent versus 10 percent—also weighs more heavily in German enterprise environments. More legacy knowledge is embedded in these systems than the Mountain View reference case suggests. A result from the sample calculator will not transfer one-to-one to Stuttgart or Walldorf.
The structural point still holds, and it can be summarized in one number: eight months. Under the model’s assumptions, that is the length of the phase in which more money flows into the system than comes back out. Organizations that interpret the J-curve as failure during this period and cut funding, in the authors’ view, give up the subsequent upswing. Those that invest in the foundation—platform, data, verification, trust, and user focus—are better positioned to capture it. The report offers little comfort to anyone simply waiting for the next model. The rest of this article reconstructs the calculation in detail, puts the contradictory market data in context, and names the reservations the report itself raises.
The J-Curve and Its Three Forces
At the heart of the financial model is a curve shaped like the letter J. The report calls it the “J-Curve of AI value realization” and describes it as an empirically observed trajectory for large-scale transformation projects. In the early phase of implementation, productivity declines, sometimes for months. Only then does it rise again, eventually growing exponentially. Three forces initially pull the curve downward.
The first force is the learning curve, that is, the time teams need to master new interfaces, new workflows, and new prompting strategies. The second force is the “verification tax,” the time developers spend checking AI output for hallucinations and verifying it against security and architectural standards. The third force is pipeline adaptation: code generated more quickly overwhelms testing and approval processes that were previously sufficient and turns them into bottlenecks. The report does not view these three forces as a failure of the technology, but rather as an adjustment cost inherent to the transformation. Anyone who draws up a budget without factoring in the J-curve risks cutting funds at the trough and thereby squandering the potential for later growth.
What the Implementation Costs
The sample calculator puts explicit figures on this investment block. With 500 technical employees, fully loaded annual compensation of $176,000, an assumed J-curve duration of three months, and a 15 percent drop in productivity during that period, the productivity dip alone costs $3.3 million. Direct tool and training costs come on top. The sample model estimates $250 in licensing fees per user account per year, $80 per user in additional usage costs (for example, tokens), $9,600 in training per person, and $100,000 in additional infrastructure costs. Across 500 employees, that adds up to $5.065 million. Together with the J-curve cost, the total first-year investment reaches $8.365 million.
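The arithmetic behind these figures can be retraced in a few lines. The following Python sketch is not the report's calculator. It assumes that the productivity dip is priced as payroll for the J-curve period multiplied by the size of the dip, and that the per-user items are simply multiplied by headcount. All function and variable names are made up for illustration; the inputs are the ones quoted above.

```python
# Sketch of the first-year cost side, using the inputs quoted in the article.
# Assumption: J-curve cost = headcount * compensation * (months / 12) * dip.
# All names are illustrative; this is not the report's own calculator code.

HEADCOUNT = 500
ANNUAL_COMPENSATION = 176_000   # fully loaded, USD per year
J_CURVE_MONTHS = 3
PRODUCTIVITY_DIP = 0.15
LICENSE_PER_USER = 250          # USD per year
USAGE_PER_USER = 80             # USD per year, for example tokens
TRAINING_PER_PERSON = 9_600     # USD
INFRASTRUCTURE = 100_000        # USD


def j_curve_cost(headcount, compensation, months, dip):
    """Payroll spent during the dip period, scaled by the productivity loss."""
    return headcount * compensation * (months / 12) * dip


def direct_cost(headcount, license_fee, usage, training, infrastructure):
    """Per-user costs multiplied by headcount, plus infrastructure."""
    return headcount * (license_fee + usage + training) + infrastructure


dip_cost = j_curve_cost(
    HEADCOUNT, ANNUAL_COMPENSATION, J_CURVE_MONTHS, PRODUCTIVITY_DIP
)
tool_cost = direct_cost(
    HEADCOUNT, LICENSE_PER_USER, USAGE_PER_USER, TRAINING_PER_PERSON, INFRASTRUCTURE
)
total_investment = dip_cost + tool_cost

print(f"J-curve cost:      ${dip_cost:,.0f}")
print(f"Direct costs:      ${tool_cost:,.0f}")
print(f"Total investment:  ${total_investment:,.0f}")
```

Training dominates the direct costs in this setup: at $9,600 per person it accounts for $4.8 million of the $5.065 million, far more than licenses and usage fees combined.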
What the Implementation Yields
On the value side, the model uses three items. The first is freed-up personnel capacity, which the report calls “Headcount Reinvestment Capacity.” It arises when developers gain time through AI and reinvest that time in higher-value work. With a 12.5 percent net time saving per person, this amounts to $11 million in the sample scenario. The second item is additional revenue from more delivered features. If a team that delivers 50 features per year delivers 56 in the future, and roughly one-third of the additional features generate revenue, each with a 0.5 percent impact on product portfolio revenue of $100 million, this adds nearly $990,000. The third item points in the opposite direction. The 2025 DORA Report found that AI adoption correlates with a higher change failure rate. In the model, failed releases rise from 5 to 6 percent, which, with a four-hour recovery time and downtime costs of $100,000 per hour, produces a loss of $344,000. Net of that loss, the first-year value comes to $11.646 million.
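The three value items can be recomputed in the same way. The sketch below rests on two assumptions that go beyond the text. First, the figure of nearly $990,000 is reproduced when "one-third" is rounded to 0.33 and applied to the six additional features. Second, the instability loss of $344,000 is taken as given, because the number of deployments behind it is not stated here. Variable names are illustrative.

```python
# Sketch of the first-year value side, using the inputs quoted in the article.
# Assumption: "one-third" is rounded to 0.33, which reproduces ~$990,000.
# Assumption: the $344,000 instability loss is used as a fixed input, since
# the deployment count behind it is not given in the article.

HEADCOUNT = 500
ANNUAL_COMPENSATION = 176_000      # fully loaded, USD per year
NET_TIME_SAVING = 0.125            # 12.5 percent per person
FEATURES_BEFORE = 50
FEATURES_AFTER = 56
REVENUE_SHARE = 0.33               # share of extra features that earn revenue
IMPACT_PER_FEATURE = 0.005         # 0.5 percent of portfolio revenue
PORTFOLIO_REVENUE = 100_000_000    # USD
INSTABILITY_LOSS = 344_000         # USD, taken as given

# Item 1: freed-up capacity, valued at fully loaded compensation.
capacity_value = HEADCOUNT * ANNUAL_COMPENSATION * NET_TIME_SAVING

# Item 2: revenue from the additional features only.
extra_features = FEATURES_AFTER - FEATURES_BEFORE
feature_revenue = (
    extra_features * REVENUE_SHARE * IMPACT_PER_FEATURE * PORTFOLIO_REVENUE
)

# Item 3 enters with a negative sign.
first_year_value = capacity_value + feature_revenue - INSTABILITY_LOSS

print(f"Capacity value:    ${capacity_value:,.0f}")
print(f"Feature revenue:   ${feature_revenue:,.0f}")
print(f"Instability loss:  ${INSTABILITY_LOSS:,.0f}")
print(f"First-year value:  ${first_year_value:,.0f}")
```

The breakdown shows how lopsided the model is: freed-up capacity contributes roughly eleven times as much as the additional feature revenue, so the time-saving assumption carries most of the result.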
Behind the model’s three-part structure lies a broader value framework, which the report describes through five pillars: cost efficiency, productivity, developer experience, user experience, and business growth. The further one moves to the right in the framework, the weaker the direct link to AI use in engineering becomes, and the stronger the financial leverage. In Figure 4, the report visualizes what the 2025 DORA Report measured empirically. AI adoption has the largest impact on individual effectiveness, followed by higher delivery instability with an inverse sign, then organizational performance, meaningful work, code quality, product performance, delivery throughput, and team performance. Burnout and friction barely move. The effects vary widely; the impact is driven by the system, not by the tool alone.
The report devotes a separate section to developer experience. It keeps this factor in the framework, but leaves it out of the sample calculation out of caution. The correlation between AI adoption and staff retention is real, but statistically too variable to capture in a single point estimate. As a qualitative argument, it remains strong: replacing a software specialist typically costs one and a half to two times the annual salary. Organizations that improve the working environment by delegating routine work to agents can reduce these replacement costs. Even though the item is deliberately omitted from the numbers, the report still describes it as a “powerful qualitative lever.”
The two sides of the equation produce a first-year ROI of 39 percent and a payback period of 0.7 years, or about eight months. The report puts these numbers into context. A payback period of six to nine months is treated as a benchmark for agile teams, while twelve to eighteen months is typical for large, regulation-driven organizations. For a longer-term perspective, the authors point to their own data. Google Cloud customers reportedly achieved an average return of 727 percent on their AI investment over three years. The figure comes from a Google Cloud publication and is labeled in the report as a customer report, not as an independently validated result.
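Both headline numbers follow from the two totals. The sketch below assumes the usual definitions: ROI as net gain divided by investment, and payback as investment divided by annual value, with value accruing evenly over the year. Whether the report's calculator uses exactly these formulas is an assumption; the function names are illustrative.

```python
# Sketch: first-year ROI and payback period from the two totals in the article.
# Assumption: ROI = (value - investment) / investment.
# Assumption: payback = investment / annual value, with value accruing evenly.

FIRST_YEAR_VALUE = 11_646_000       # USD
FIRST_YEAR_INVESTMENT = 8_365_000   # USD


def first_year_roi(value, investment):
    """Net gain relative to the money invested."""
    return (value - investment) / investment


def payback_years(value, investment):
    """Years until cumulative value equals the investment."""
    return investment / value


roi = first_year_roi(FIRST_YEAR_VALUE, FIRST_YEAR_INVESTMENT)
payback = payback_years(FIRST_YEAR_VALUE, FIRST_YEAR_INVESTMENT)

print(f"First-year ROI: {roi:.0%}")
print(f"Payback: {payback:.1f} years, about {payback * 12:.1f} months")
```

Under these assumptions the unrounded payback lies slightly above 0.7 years, so the figure of about eight months is best read as a rounded value.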
Methodological Notes in the Text
The report itself points out the limitations of the calculation in several places. The methodology box states that the calculations are “a highly uncertain estimate, intended to spark a conversation, not as a rigid mathematical formula.” The authors cite the statistician’s maxim that all models are wrong, but some are useful. Anyone who changes individual assumptions in the calculator will see the final result shift immediately. A conservative variant lowers the value to 80 percent of the baseline and raises the costs to 150 percent, while an optimistic one goes in the opposite direction. The range of variation is wide. The authors explicitly recommend running multiple scenarios side by side rather than relying on a single point estimate.
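How strongly the result reacts can be shown with a small scenario sketch. It applies the conservative factors named above (value at 80 percent, costs at 150 percent) to the baseline totals of $11.646 million in value and $8.365 million in investment. The optimistic factors are not spelled out here, so that scenario is left out rather than guessed. The function name and the dictionary layout are illustrative.

```python
# Sketch: baseline versus conservative scenario, side by side.
# Baseline totals and the conservative factors come from the article.
# The optimistic factors are not stated there, so no optimistic case is run.

BASE_VALUE = 11_646_000   # USD, first-year value
BASE_COST = 8_365_000     # USD, first-year investment


def scenario(value_factor, cost_factor, base_value=BASE_VALUE, base_cost=BASE_COST):
    """Scale value and cost, then recompute the first-year ROI."""
    value = base_value * value_factor
    cost = base_cost * cost_factor
    return {"value": value, "cost": cost, "roi": (value - cost) / cost}


scenarios = {
    "baseline": scenario(1.0, 1.0),
    "conservative": scenario(0.8, 1.5),
}

for name, result in scenarios.items():
    print(
        f"{name:<13} value ${result['value']:,.0f}  "
        f"cost ${result['cost']:,.0f}  ROI {result['roi']:.1%}"
    )
```

With these inputs the conservative case ends the first year with costs above value, that is, with a negative ROI. That is the practical reason to look at several scenarios instead of one point estimate.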
Market Fragmentation
A lengthy section of the report asks why financial results vary so widely despite near-universal adoption. It describes three camps. The optimistic view cites a Google Cloud survey in which 78 percent of surveyed executives reported a return on at least one generative AI use case, along with 88 percent positive feedback from early adopters of agent-based systems. The neutral view points to the Stanford AI Index 2025, which describes expectations for workforce productivity as “consistently mixed.” Adoption is high, but structural transformation remains rare in most industries, and productivity gains are mostly marginal. The pessimistic view cites research from the MIT-NANDA project. According to those findings, internal corporate implementations often fail, pushing employees into a “shadow AI economy” of unauthorized consumer applications. The main obstacle, MIT-NANDA argues, is neither budget nor technology, but organizational design.
A notable figure is somewhat buried in the chapter on ROI modeling. An analysis from Stanford’s software engineering productivity research showed productivity gains of 35 to 40 percent for simple greenfield tasks. In complex, aging legacy code, by contrast, the effect is often ten percent or less. The report’s authors use this range to argue for careful use-case selection. Organizations that deploy AI primarily in greenfield scenarios see results more quickly. Those that apply it to brownfield migrations need more extensive preparation of the engineering system.
Five Structural Prerequisites
In the chapter “Build the organizational foundation for AI adoption,” the report identifies five systemic keys designed to guide a company beyond scattered local productivity gains. The first prerequisite is trust, technically implemented as a “clear and communicated AI stance.” This refers to a written organizational policy on AI that defines expectations, boundaries, and review obligations. The second prerequisite is an internal developer platform (IDP), which is treated as a product and minimizes friction in the use of tools, pipelines, and architectural patterns. In the agentic era, the IDP functions as a “risk mitigator and context provider for agents.” The third prerequisite is an AI-accessible data ecosystem, because agents are only as good as the data they access. The fourth prerequisite is an uncompromising user focus that directs the speed gained through tools toward real-world problems, rather than counting pull requests. The fifth prerequisite is automated verification guardrails that act as brakes, allowing the engineering system to drive faster safely.
A succinct statement in the report sums up the thesis: “We measure AI not by the code it writes, but by the bottlenecks it clears.” The statement appears in the chapter on the business case. It shifts the usual promise—that AI will replace developer jobs—into a different logic. ROI is not a measure of how many jobs can be cut. ROI is a measure of how much latent human creative potential is unleashed by outsourcing routine systemic work to autonomous agents.
Recommendation Against Job Cuts
The report consistently argues against a headcount-reduction strategy. Organizations that turn productivity gains into layoffs damage morale, reduce the willingness to learn, and create incentives to resist process improvements. Instead, the authors recommend framing freed-up capacity as reinvestment in innovation. The savings are avoided costs from not having to hire additional staff, not cash released through layoffs. This interpretation is built into the model because replacement costs in software roles typically amount to one and a half to two times annual salary.
The term “verification tax” runs like a thread through the text. In the chapter on additional and indirect costs, the report describes how a low level of trust deepens the J-curve. If every block of code were reviewed twice before entering the pipeline, the productivity gains would evaporate. Trust, therefore, is not a soft category. It is a hard financial variable. Generating it requires a system that rewards verification rather than raw code volume.
Experiment Frequency as a Financial Metric
A separate section argues that experiment frequency is a financial indicator, not merely an engineering metric. The reasoning draws on the financial concept of optionality. An option is a low-risk investment that grants the right to make a larger investment later, without creating an obligation to do so. A prototype or an A/B test works in the same way. AI lowers the upfront cost of such options by drastically shortening the time required for code development. Organizations with more options do not have to commit to a single hypothesis too early.
This financial interpretation adds a dimension that pure speed metrics miss. The question is not whether a team delivers more code in the same amount of time. It is how many hypotheses the team can test with real users before investing resources in scaling. High experiment frequency, the report states, is an early indicator that the organization has absorbed AI and is less likely to invest in the wrong features.
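The optionality argument can be made tangible with a toy calculation. Every number in the following sketch is hypothetical and does not come from the report: a fixed experiment budget, a cost per prototype before and with AI, and a flat probability that any single experiment confirms a hypothesis worth scaling. Experiments are also assumed to be independent, which is a simplification.

```python
# Toy model of experiment frequency as optionality.
# ALL numbers are hypothetical illustrations, not figures from the report.
# Assumption: experiments are independent and share one hit probability.

BUDGET = 200_000          # hypothetical experiment budget, USD
HIT_PROBABILITY = 0.10    # hypothetical chance that one experiment pays off
COST_BEFORE_AI = 50_000   # hypothetical cost per prototype without AI, USD
COST_WITH_AI = 20_000     # hypothetical cost per prototype with AI, USD


def experiments_affordable(budget, cost_per_experiment):
    """Number of whole experiments the budget covers."""
    return int(budget // cost_per_experiment)


def chance_of_at_least_one_hit(n_experiments, hit_probability):
    """Probability that at least one of n independent experiments succeeds."""
    return 1 - (1 - hit_probability) ** n_experiments


for label, cost in (("before AI", COST_BEFORE_AI), ("with AI", COST_WITH_AI)):
    n = experiments_affordable(BUDGET, cost)
    p = chance_of_at_least_one_hit(n, HIT_PROBABILITY)
    print(f"{label:<10} {n:>2} experiments, chance of at least one hit: {p:.1%}")
```

The same budget buys more options when each one is cheaper, and the chance of finding at least one hypothesis worth scaling rises with it. Speed metrics alone do not capture this effect.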
Three Reservations
Three reservations deserve a separate note. First, the report comes from Google. It contains references to Google Cloud tools and consulting services, and its sample calculator ends with a contact option for the “delta” practice. This does not invalidate the methodology, but the marketing frame is visible.
Second, the calculator is limited to the first year. The high multi-year return of 727 percent comes from a separate source, which the report itself identifies as a customer report. Third, the key input variables—the share of time saved, the success rate of additional features, and the revenue impact per feature—remain estimates, and the report openly acknowledges their range. The authors do not hide this uncertainty. They repeatedly emphasize that the calculator is a conversation starter, not a financial instrument.
Maturity as a Financial Variable
Despite these caveats, the report offers something that has been missing from the debate over AI tools. It translates the claim that the foundation matters more than the model into a monetary figure. As long as the discussion about agent-based tools focused on model comparisons or the speed of individual tasks, organizational maturity sounded like a matter of style and review culture. In the DORA report, it becomes a variable in the financial model. A conservative assumption about the maturity of the internal platform lowers the total value from $11.6 million to around $9 million. An optimistic assumption raises it by a similar amount. Maturity is not sentiment here. It is a factor.
Another observation, treated almost casually in the report, deserves attention. Inference costs for the most advanced models are said to have fallen by a factor of 280 between November 2022 and October 2024. The cost of model queries moved toward zero. The real costs shifted to governance: verification, workflows, and qualification. Organizations prepared in these three areas—with a platform team, a review culture, and honest planning for learning costs—are the ones likely to see the increase in value in years two and three that the authors discuss. Those that are not prepared will stay stuck near the bottom of the curve.
This shift explains another finding in the report. The authors mention which model a team uses, but consistently treat it as secondary. The central variable is not the model. It is the ability to embed the model in a workflow that rewards verification and limits risk. What was still treated as a technical question in 2024 has become an organizational question in 2026. The report tries to capture precisely that shift in a single figure.
An interim assessment from the report itself summarizes the logic soberly: “The path to ROI is a sequence of building competencies, not a race for the latest model or the latest tool.” The text ends with this statement. After 60 pages of methodology, it almost slips by. In substance, it marks a turn. The race for the fastest model generation is losing the central role it was given in 2023 and 2024. It is being replaced by a competition for the most mature engineering organization. How well a company fares in that competition is decided not by the license agreement, but by the internal platform, the review routines, and the willingness to pay the price of learning for three months without cutting funding prematurely.