How to Scale Your AI Strategy Without Breaking the Bank

17 June, 2026|6 min

Matt Müller

VP, Capability Lead for Technology

Five takeaways from our webinar with Alibaba Cloud on the world's most downloaded open-source LLM:

A lot of organizations are hitting the same wall. The boards and CFOs who enthusiastically greenlit AI a year ago are now asking a harder question: where is the return? Costs keep climbing while business outcomes stay flat. Uber has talked publicly about burning through its annual AI budget in the first four months of the year. Microsoft has signalled it will pull back on external AI spend in favour of in-house tools. The comfortable idea that "AI will pay for itself" is getting hard to defend on an earnings call.

The honest diagnosis is that this isn't an AI problem. It's a procurement problem. Most enterprises defaulted to proprietary North American models with no real plan for what happens to cost at scale. Per-token pricing is forgiving for a four-person pilot. It becomes a runaway line item when thousands of people lean on it every day and you own none of the underlying asset.

Smart companies aren't writing off AI. They're changing how they buy and run it. Here are five shifts that came out of our session with Alibaba Cloud, for the people who defend the budget and the people who ship the product.

1. Treat AI procurement the way you treated the cloud

The "growth at all costs" mindset made sense when AI was an experiment. It doesn't survive enterprise scale.

Proprietary models are fast to start with, and that's their strength. But they're a black box: you don't own the engine, you can't see under the hood, and the price moves on someone else's schedule. Open-source models flip the curve. Setup costs more upfront, because you handle the deployment, but the cost trajectory bends down over time instead of compounding, and the asset is yours.

It's the same build-versus-buy decision IT leaders have made for thirty years, now applied to models. Not ideological, not cheap-over-capable. Just financial math, and at scale the math increasingly favours owning more of your stack.

2. Stop routing every prompt through your most expensive model

The build-versus-buy framing hides the real question. The answer is almost always both. What matters is matching each workload to the right way to run it.

Picture a ladder with three rungs:

Rung 1, most of your volume. Run high-volume, predictable work on an efficient open-weight Qwen model you host yourself. Models like Qwen3.6-27B and the 35B-A3B are built for exactly this.
Rung 2, when you need more. When a task is demanding but the data must stay inside your walls, step up to the largest open-weight model, self-hosted. A strong fit for regulated or air-gapped environments.
Rung 3, only the hard edge. For the genuinely hard, lower-volume tasks, call the closed frontier model through an API. Qwen 3.7 Plus and Max live here, and they're worth reaching for precisely because you aren't sending everything to them.

The discipline is simple to say and rare in practice: escalate only when the workload demands it. Most teams picked one model and route everything through it. A laddered approach can cut inference costs dramatically without touching quality on the work users actually notice.

3. Open-source performance has closed the gap that mattered

The geopolitical noise around AI has led many enterprises to quietly discount what's coming out of Asian labs. That's a mistake.

A year and a half ago DeepSeek R1 hit near-GPT-4 benchmarks at a fraction of the cost, and that was the wake-up call. Today open-weight models rival frontier models on the work enterprises actually run, and hold their own against proprietary US models. Alibaba's Qwen is the most downloaded open-source LLM in the world, and Alibaba's track record running cloud at scale gives it a credibility pure-research labs can't match.

One distinction worth keeping straight: the open-weight Qwen models carry the bulk of real workloads, while the frontier-rivalling 3.7 Plus and Max models are closed and API-only, the escalation tier rather than the everyday workhorse. Same family, same strategy, but not the same thing.

4. You can test all of this with almost no engineering risk

A common reason teams stay locked into a vendor is the perceived overhead of switching. That friction used to be real. It mostly isn't anymore.

Qwen's APIs through Alibaba Model Studio are compatible with both OpenAI and Anthropic structures. A developer can point an existing application at a new model by changing the API key and endpoint, leaving the rest of the code untouched. We showed it live in the webinar: the same travel-booking agent ran against one provider, then against Qwen 3.7 Max, then against the open-weight 35B-A3B, with only the configuration line changing.

That's what makes the ladder safe to adopt. Run your existing evaluations against a candidate model, compare it honestly to your current spend, and keep it only if it clears the bar. If it doesn't, you've lost an afternoon. Staying locked in on inertia alone gets hard to justify.

5. Get the data question right from the start

When the partner is international, data privacy is the elephant in the room, and the answer isn't to wave it away. It's to look at the architecture.

The commitments are solid. For B2B workloads on Alibaba Cloud, your data is never used to train models, and your stored data stays in the region you select, though depending on the deployment you choose, inference itself may run across global regions. Governance includes safety screening, human review, and PII filtering, and the platform is GDPR and SOC 2 compliant, with zero data retention available.

The cleanest answer of all is to self-host. When an open-weight Qwen model runs inside your own cloud or on-premise, the question of where your data goes has the simplest possible answer: nowhere. Run it air-gapped for maximum control, or inside your existing AWS, Azure, or GCP environment to balance control with scale. For the managed API, the practical step is to choose the regional deployment that meets your requirements up front and confirm it covers the model you want. Match each workload to the option its data actually needs, and the security question answers itself.

The bottom line

The era of funding variable AI costs without a strategy is over. The organizations that come out of this in good shape won't be the ones that spent the most. They'll be the ones that matched each workload to the right model, built a defensible procurement process, and stopped treating open source as a compromise.

Appnovation is deploying Qwen for enterprise clients today. If you're looking at your AI spend and want to understand what a more sustainable architecture looks like, talk to our team.

Interested in learning more watch our recent webinar with Alibaba Cloud here.