Setting Up Azure OpenAI for Enterprise: What Nobody Tells You
When a client asked us to add AI capabilities to their enterprise platform, I assumed the hard part would be the AI. Turns out, the hard part was everything around the AI — Azure networking, content filters, rate limits, and getting approval from the security team.
Here’s everything I wish I’d known before starting.
Why Azure OpenAI Instead of OpenAI Directly
For enterprise clients in Vietnam (and most of Asia-Pacific), the conversation goes like this:
Client: “We want AI features.”
Me: “Great, we’ll use the OpenAI API.”
Client: “Does our data stay in our Azure tenant?”
Me: “…let me look into Azure OpenAI.”
That’s basically it. Enterprise clients need data residency guarantees, VNet integration, and Azure AD authentication. The OpenAI API doesn’t offer any of that. Azure OpenAI does, but it comes with its own set of quirks.
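To make the Azure AD point concrete, here’s a minimal sketch of keyless authentication against Azure OpenAI in C#, assuming the Azure.Identity NuGet package and .NET 6+. The resource name, deployment name, and api-version are placeholders, not the client’s actual values:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using Azure.Core;
using Azure.Identity;

// Acquire an Azure AD token for the Cognitive Services scope instead of
// shipping an api-key. DefaultAzureCredential picks up managed identity
// in Azure and developer credentials locally.
var credential = new DefaultAzureCredential();
AccessToken token = await credential.GetTokenAsync(
    new TokenRequestContext(new[] { "https://cognitiveservices.azure.com/.default" }));

using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", token.Token);

// The URL is deployment-scoped: your resource plus your deployment name,
// not a model name. "my-resource", "gpt-4", and the api-version are placeholders.
var url = "https://my-resource.openai.azure.com/openai/deployments/gpt-4" +
          "/chat/completions?api-version=2024-02-01";

var response = await http.PostAsJsonAsync(url, new
{
    messages = new[] { new { role = "user", content = "ping" } },
    max_tokens = 16
});
Console.WriteLine(await response.Content.ReadAsStringAsync());
```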
The Setup Nobody Warns You About
Deployment regions matter. Not all models are available in all regions. GPT-4 wasn’t available in Southeast Asia when we started — we had to deploy to Australia East. This adds latency. Check the model availability matrix before you promise anything to a client.
Content filters are on by default. Azure OpenAI has content filtering that’s stricter than the OpenAI API. Our first integration kept getting blocked because the system was filtering legitimate product descriptions that happened to mention certain chemicals (it was a manufacturing client). You can request modifications to the content filter, but the approval process takes days.
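Two behaviors are worth handling explicitly (this matches our reading of the service at the time; verify against current docs). A filtered prompt fails the whole request with HTTP 400 and error code content_filter, while a filtered completion comes back as a 200 with the offending choice truncated and finish_reason set to content_filter. A sketch of detecting both, given the raw HTTP response from the chat endpoint:

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class ContentFilterCheck
{
    public static async Task InspectAsync(HttpResponseMessage response)
    {
        var body = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(body);

        if (!response.IsSuccessStatusCode)
        {
            // A filtered *prompt* fails the request outright.
            var code = doc.RootElement.GetProperty("error")
                          .GetProperty("code").GetString();
            if (code == "content_filter")
                Console.WriteLine("Input blocked; log the prompt for the exemption request.");
            return;
        }

        // A filtered *completion* still returns 200, but the choice is
        // truncated and finish_reason tells you why.
        var finishReason = doc.RootElement.GetProperty("choices")[0]
                              .GetProperty("finish_reason").GetString();
        if (finishReason == "content_filter")
            Console.WriteLine("Output blocked by the content filter.");
    }
}
```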
Rate limits are per-deployment, not per-account. We hit this on day one. Quota is allocated in tokens per minute (TPM) per model, per region, and each deployment takes a slice of that pool; the default GPT-4 allocation is surprisingly low. You need to request quota increases in advance, and the process isn’t instant. Plan for this in your project timeline.
Private endpoints are worth the complexity. If your client’s data can’t traverse the public internet (and for enterprise clients, it usually can’t), you need Azure Private Link. The networking setup is non-trivial: VNet peering, a private DNS zone (privatelink.openai.azure.com), NSG rules. Budget a full day for this if you haven’t done it before.
Cost Surprises
We estimated the AI features would cost about $200/month in API calls. The first month’s bill was $1,400. What happened:
- No prompt caching in Azure OpenAI (at the time we started). Every request sent the full system prompt. For a 2,000-token system prompt with 500 requests/day, that’s a lot of input tokens you’re paying for on every single call (see the quick math after this list).
- Embeddings at scale are expensive. Re-embedding the entire document corpus every time a page changed was costing $300/month alone. We switched to incremental updates, re-embedding only the documents that actually changed (sketched after this list).
- Development and testing. Developers were making hundreds of API calls a day during development. We added a spend alert and a dev-only deployment with a lower-cost model.
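Quick math on the system-prompt point: 2,000 tokens × 500 requests/day is a million input tokens a day just for the prompt, roughly 30 million a month. At GPT-4’s then-current input price of about $0.03 per 1K tokens (our assumption; check current rates), that’s on the order of $900/month before a single output token, which accounts for most of the gap between the estimate and the bill.

The incremental embedding update boils down to a content hash check. A minimal sketch, assuming .NET 6+; `Document`, `EmbeddingSync`, and the stored-hash dictionary are illustrative names, not any SDK’s API:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public record Document(string Id, string Text);

public static class EmbeddingSync
{
    // Hash the raw text; any change to the document changes the hash.
    public static string ContentHash(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

    // Yield only documents that are new or whose content changed, updating the
    // stored hash as we go. `storedHashes` stands in for whatever store you
    // persist hashes to between runs.
    public static IEnumerable<Document> NeedingReEmbedding(
        IEnumerable<Document> docs, IDictionary<string, string> storedHashes)
    {
        foreach (var doc in docs)
        {
            var hash = ContentHash(doc.Text);
            if (storedHashes.TryGetValue(doc.Id, out var previous) && previous == hash)
                continue; // unchanged: skip the embeddings call entirely

            storedHashes[doc.Id] = hash;
            yield return doc; // new or changed: re-embed this one
        }
    }
}
```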
After optimization, we got it down to about $350/month. Still more than the original estimate, but manageable.
What We Actually Built
The client wanted three AI features on their existing .NET platform:
1. Natural language product search. Instead of searching by part numbers, customers type things like “heat-resistant gasket for high-pressure steam applications.” We embed the query and search against pre-computed product embeddings (a minimal sketch of the query path follows this list). This replaced a frustrating faceted search that nobody liked.
2. Automated product descriptions. Technical specs go in, human-readable descriptions come out. The manufacturing team had 15,000 products with only SKU numbers and spec sheets. We generated descriptions for all of them in a batch job, then editors reviewed and published.
3. Document Q&A. A RAG system over the client’s technical documentation. Engineers can ask questions instead of searching through 500-page PDF manuals.
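A minimal sketch of the product-search query path, assuming the embeddings are precomputed and small enough to hold in memory (at 15,000 products a brute-force scan is fast; a vector index only pays off at much larger scales). `Product`, `ProductSearch`, and `embedQuery` are illustrative names; `embedQuery` stands in for a call to the embeddings deployment:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public record Product(string Sku, string Description, float[] Embedding);

public static class ProductSearch
{
    // Cosine similarity between two embedding vectors of equal length.
    public static float Cosine(float[] a, float[] b)
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
    }

    // Embed the user's query once, then rank products by similarity.
    public static async Task<IReadOnlyList<Product>> SearchAsync(
        string query,
        IReadOnlyList<Product> products,
        Func<string, Task<float[]>> embedQuery,
        int top = 10)
    {
        float[] queryVector = await embedQuery(query);
        return products
            .OrderByDescending(p => Cosine(queryVector, p.Embedding))
            .Take(top)
            .ToList();
    }
}
```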
Integration Pattern
We built a thin .NET wrapper around Azure OpenAI that the rest of the application uses:
- Retry logic with exponential backoff. Azure OpenAI returns 429s when you hit rate limits. You need to handle this gracefully (see the wrapper sketch after this list).
- Fallback model. If GPT-4 is slow or rate-limited, fall back to GPT-3.5 Turbo for non-critical features.
- Request queuing. For batch operations (like generating product descriptions), we queue requests and process them at a controlled rate.
- Logging everything. Every API call logs the prompt, response, token count, latency, and cost estimate. This is essential for debugging and cost management.
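Here’s a minimal sketch of the first bullet, assuming .NET 6+; `SendWithRetryAsync` and the attempt limit are our own, not an SDK API. The request is rebuilt on every attempt because an HttpRequestMessage can only be sent once:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class OpenAiHttp
{
    // Retry on 429 with exponential backoff, preferring the service's own
    // Retry-After hint when it sends one.
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        HttpClient http, Func<HttpRequestMessage> makeRequest, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            var response = await http.SendAsync(makeRequest());
            if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt == maxAttempts)
                return response;

            // Honor Retry-After if present; otherwise back off exponentially
            // with a little jitter so parallel callers don't retry in lockstep.
            var delay = response.Headers.RetryAfter?.Delta
                        ?? TimeSpan.FromSeconds(Math.Pow(2, attempt) + Random.Shared.NextDouble());
            await Task.Delay(delay);
        }
    }
}
```

The fallback bullet sits on top of this: if retries are exhausted, or the feature is latency-sensitive rather than quality-sensitive, the caller re-issues the same request against the GPT-3.5 Turbo deployment instead.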
Lessons
- Start with the infrastructure, not the AI. Get Azure OpenAI deployed, networking configured, and authentication working before you write your first prompt. This takes longer than you think.
- Budget 3-5x your initial cost estimate until you’ve optimized. Prompt engineering, caching, and batching all reduce costs, but they come after the first version.
- Content filters will surprise you. Test with real data early. Your manufacturing client’s legitimate product descriptions might trigger safety filters.
- The security review will take longer than the development. Enterprise security teams need to understand data flows, retention policies, and model access controls. Start this conversation early.
- GPT-4 isn’t always the answer. For embeddings, use the embedding model. For simple classification, GPT-3.5 Turbo is fine. Reserve GPT-4 for tasks that actually need it.
Azure OpenAI is the right choice for enterprise clients who need data governance. But “right choice” doesn’t mean “easy choice.” Plan accordingly.