Building enterprise chatbots in Canada comes with a unique requirement: seamless bilingual support. This isn't just translation—it's understanding context, maintaining conversation state across language switches, and respecting the nuances of Canadian French.
In this article, I'll walk through the architecture we use for production bilingual chatbots on Azure, including real cost numbers and security patterns.
Architecture Overview
Our reference architecture consists of five core components:
1. Azure Bot Service - Conversation orchestration and channel management
2. Azure OpenAI Service - GPT-4 for reasoning, GPT-3.5 Turbo for routine queries
3. Azure AI Search - Vector search for RAG (Retrieval-Augmented Generation)
4. Azure Cosmos DB - Conversation history and user context
5. Azure Functions - Custom logic and integrations
```
                      BILINGUAL CHATBOT ARCHITECTURE

Users (FR/EN) ⇄ Azure Bot Service ⇄ Azure Functions (Orchestration)
                                                   │
            ┌──────────────────────────────────────┼─────────────────┐
            ▼                                      ▼                 ▼
       Azure OpenAI                         Azure AI Search      Cosmos DB
       GPT-4 (complex)                   Vector index (EN | FR)  Chat history
       GPT-3.5 Turbo (simple)                      ▲
                                                   │
                                  Documents (FR/EN PDFs, FAQs, etc.)
```
The Bilingual Challenge
Canadian bilingual requirements go beyond simple translation. Users expect to:
- Start a conversation in English and switch to French mid-conversation
- Receive responses in their preferred language automatically
- See consistent terminology (your product names, service terms) in both languages
- Experience natural responses, not "translated-sounding" ones
Language Detection Strategy
We implement a multi-layer language detection approach:
Layer 1: Explicit signals - User language preferences from their profile or session.
Layer 2: Azure AI Language - Automatic language detection on each message.
Layer 3: Conversation context - If the last 3 messages were in French, continue in French.
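To make the layering concrete, here's a minimal sketch of one way to combine the three signals, giving per-message detection precedence so users can switch languages mid-conversation. The `detect_language_ai` helper is a stand-in for the Azure AI Language call (a naive keyword heuristic here); names and thresholds are illustrative, not production code.

```python
from collections import Counter

FRENCH_MARKERS = {"le", "la", "les", "est", "vous", "je", "bonjour", "merci"}


def detect_language_ai(text: str) -> str | None:
    """Stand-in for Azure AI Language detection (naive keyword heuristic)."""
    words = set(text.lower().split())
    if words & FRENCH_MARKERS:
        return "fr"
    return "en" if words else None


def resolve_language(profile_language: str | None,
                     message: str,
                     recent_languages: list[str]) -> str:
    """Combine the three layers; per-message detection wins so users can switch."""
    # Layer 2: automatic detection on the current message.
    detected = detect_language_ai(message)
    if detected in ("en", "fr"):
        return detected

    # Layer 3: conversation context - if the last 3 messages lean French, stay in French.
    if recent_languages:
        dominant, count = Counter(recent_languages[-3:]).most_common(1)[0]
        if count >= 2:
            return dominant

    # Layer 1: explicit preference from the user's profile or session.
    if profile_language in ("en", "fr"):
        return profile_language

    return "en"  # default when everything is ambiguous
```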
The system prompt includes explicit instructions:
```
You are a bilingual customer service assistant for [Company].
Respond in the same language the user writes in.
If the user switches languages, switch with them seamlessly.
Use Canadian French conventions, not European French.
```
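Here's a hedged sketch of how that prompt feeds the completion call, using the `openai` Python package (v1+) against Azure OpenAI. The deployment names, environment variables, and routing flag are placeholders rather than the exact values from our deployment.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

SYSTEM_PROMPT = (
    "You are a bilingual customer service assistant for [Company]. "
    "Respond in the same language the user writes in. "
    "If the user switches languages, switch with them seamlessly. "
    "Use Canadian French conventions, not European French."
)

def answer(user_message: str, history: list[dict], complex_query: bool) -> str:
    # Route complex queries to the GPT-4 deployment, routine ones to GPT-3.5 Turbo.
    deployment = "gpt-4-turbo" if complex_query else "gpt-35-turbo"  # placeholder names
    response = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  *history,
                  {"role": "user", "content": user_message}],
        temperature=0.3,
    )
    return response.choices[0].message.content
```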
RAG for Bilingual Knowledge
The RAG (Retrieval-Augmented Generation) pattern is essential for grounding responses in accurate company information. For bilingual systems, this requires special consideration.
Document Processing Pipeline
Step 1: Ingest documents in both languages
We maintain parallel document sets—French and English versions of policies, FAQs, and procedures.
Step 2: Generate embeddings per language
We use `text-embedding-ada-002` for both languages. While it works cross-lingually, we've found better results searching within the same language.
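A short sketch of the per-language embedding step, again assuming the v1 `openai` package and an Azure OpenAI deployment of `text-embedding-ada-002` (the deployment name below is a placeholder):

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def embed(texts: list[str]) -> list[list[float]]:
    """Generate ada-002 embeddings for a batch of text chunks."""
    result = client.embeddings.create(
        model="text-embedding-ada-002",  # Azure deployment name (placeholder)
        input=texts,
    )
    return [item.embedding for item in result.data]

# Embed French and English chunks separately so each vector stays monolingual.
embeddings_en = embed(["Refunds are processed within 5 business days."])
embeddings_fr = embed(["Les remboursements sont traités dans un délai de 5 jours ouvrables."])
```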
Step 3: Language-aware retrieval
When a user asks a question in French, we:
1. Detect the language
2. Search the French document index first
3. Fall back to the English index if there are no good matches, translating the retrieved content as needed
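Here's a sketch of that retrieval flow with the `azure-search-documents` SDK (11.4+). The index and field names match the structure described in the next section, `embed()` is the helper from the embedding sketch above, and the "no good matches" test is simplified to an empty result set.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="kb-bilingual",  # placeholder index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

def retrieve(question: str, language: str, top: int = 5) -> list[dict]:
    """Hybrid search in the user's language, falling back to English."""
    vector = embed([question])[0]  # embed() from the sketch above
    vector_field = "embedding_fr" if language == "fr" else "embedding_en"
    content_field = "content_fr" if language == "fr" else "content_en"

    results = list(search_client.search(
        search_text=question,
        vector_queries=[VectorizedQuery(vector=vector, k_nearest_neighbors=top, fields=vector_field)],
        select=[content_field, "metadata"],
        top=top,
    ))
    if results or language == "en":
        return results

    # No good French matches: search the English fields instead and let the
    # model translate the retrieved passages when it drafts the answer.
    return retrieve(question, "en", top)
```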
Index Structure in Azure AI Search
Our search index includes:
- `content_en` - English text chunks
- `content_fr` - French text chunks
- `embedding_en` - Vector embedding for English
- `embedding_fr` - Vector embedding for French
- `metadata` - Source document, last updated, category
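For reference, here's a hedged sketch of defining that index with the `azure-search-documents` SDK. Parameter names shift between SDK versions (this assumes 11.4+), the `metadata` field is flattened to a single string for brevity, and the analyzer and profile names are illustrative.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, SearchField, SearchFieldDataType, SearchIndex,
    SearchableField, SimpleField, VectorSearch, VectorSearchProfile,
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="content_en", type=SearchFieldDataType.String, analyzer_name="en.microsoft"),
    SearchableField(name="content_fr", type=SearchFieldDataType.String, analyzer_name="fr.microsoft"),
    SearchField(name="embedding_en", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536, vector_search_profile_name="hnsw-profile"),
    SearchField(name="embedding_fr", type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, vector_search_dimensions=1536, vector_search_profile_name="hnsw-profile"),
    SimpleField(name="metadata", type=SearchFieldDataType.String, filterable=True),
]

index = SearchIndex(
    name="kb-bilingual",
    fields=fields,
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
        profiles=[VectorSearchProfile(name="hnsw-profile", algorithm_configuration_name="hnsw")],
    ),
)

SearchIndexClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
).create_or_update_index(index)
```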
Conversation History with Cosmos DB
Maintaining conversation context is critical for natural dialogue. We store:
Conversation document structure:
```json
{
  "id": "conv_abc123",
  "userId": "user_xyz",
  "sessionId": "sess_456",
  "language": "fr",
  "messages": [
    {"role": "user", "content": "...", "timestamp": "..."},
    {"role": "assistant", "content": "...", "timestamp": "..."}
  ],
  "context": {
    "lastTopic": "billing",
    "accountId": "...",
    "escalated": false
  }
}
```
Partition strategy: We partition by `userId` for efficient retrieval and even distribution.
TTL policy: Conversations expire after 30 days for privacy compliance.
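A brief sketch of the corresponding container setup and write path with the `azure-cosmos` Python SDK; resource names are placeholders, the TTL is the 30-day policy expressed in seconds, and the partition key matches the strategy above.

```python
import os
from azure.cosmos import CosmosClient, PartitionKey

cosmos = CosmosClient(os.environ["COSMOS_ENDPOINT"], credential=os.environ["COSMOS_KEY"])
database = cosmos.create_database_if_not_exists("chatbot")

# Partition by userId for even distribution; expire documents after 30 days.
container = database.create_container_if_not_exists(
    id="conversations",
    partition_key=PartitionKey(path="/userId"),
    default_ttl=30 * 24 * 60 * 60,
)

def save_turn(conversation: dict, role: str, content: str, timestamp: str) -> None:
    """Append one message to the conversation document and write it back."""
    conversation["messages"].append({"role": role, "content": content, "timestamp": timestamp})
    container.upsert_item(conversation)
```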
Cost Breakdown (Real Numbers)
For a mid-sized deployment handling 1,000 conversations per day:
Monthly Costs (CAD)
Azure OpenAI:
- GPT-4 Turbo (complex queries, ~20%): $450-600
- GPT-3.5 Turbo (routine queries, ~80%): $150-200
- Embeddings (ada-002): $50-100
Azure AI Search:
- Standard tier (S1): ~$350
Azure Cosmos DB:
- 400 RU/s provisioned: ~$35
- Storage (50 GB): ~$15
Azure Bot Service:
- Standard channels: Free
- Premium channels (if needed): ~$700
Azure Functions:
- Consumption plan: $20-50
Total: $1,100 - $2,000 CAD/month
This scales roughly linearly with conversation volume up to about 10,000 daily conversations, after which you'd consider reserved capacity.
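For planning purposes, here's a back-of-the-envelope sketch of how we sanity-check the Azure OpenAI line items. The per-1K-token prices and per-conversation token counts below are placeholders only; always pull current pricing for your region before budgeting.

```python
# Rough monthly Azure OpenAI estimate. All rates below are PLACEHOLDERS (CAD),
# not published Azure prices - check the pricing calculator for your region.
CONVERSATIONS_PER_DAY = 1_000
TOKENS_PER_CONVERSATION = 3_500          # prompt + completion, averaged (assumption)
GPT4_SHARE, GPT35_SHARE = 0.20, 0.80     # complex vs. routine query split
PRICE_PER_1K = {"gpt-4-turbo": 0.025, "gpt-35-turbo": 0.002}  # blended placeholder rates

def monthly_openai_cost() -> float:
    monthly_tokens = CONVERSATIONS_PER_DAY * 30 * TOKENS_PER_CONVERSATION
    gpt4 = monthly_tokens * GPT4_SHARE / 1_000 * PRICE_PER_1K["gpt-4-turbo"]
    gpt35 = monthly_tokens * GPT35_SHARE / 1_000 * PRICE_PER_1K["gpt-35-turbo"]
    return gpt4 + gpt35

print(f"Estimated Azure OpenAI spend: ${monthly_openai_cost():,.0f} CAD/month")
```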
Security Architecture
Enterprise deployments require strict security controls:
Network Security
Private Endpoints - All Azure services communicate over a private network rather than the public internet.
VNet Integration - Bot Service and Functions deployed within VNet.
Application Gateway - WAF protection for public-facing endpoints.
Identity & Access
Managed Identities - No secrets in code; services authenticate using Azure AD.
RBAC - Principle of least privilege for all service accounts.
Customer data isolation - Cosmos DB row-level security by tenant.
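As one example of the "no secrets in code" pattern, here's a sketch of authenticating to Azure OpenAI with a managed identity through `azure-identity`. It assumes the identity has already been granted an appropriate Cognitive Services role on the resource, and the endpoint is a placeholder.

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# DefaultAzureCredential resolves to the managed identity when running in Azure.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    azure_ad_token_provider=token_provider,
    api_version="2024-02-01",
)
```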
Data Protection
Data residency - All resources in Canada Central or Canada East.
Encryption - At rest (platform-managed keys) and in transit (TLS 1.2+).
PII handling - Conversation logs anonymized; no PII in prompts.
Human Escalation Patterns
No AI chatbot should operate without human escalation paths. We implement:
Confidence thresholds - Low-confidence responses trigger human review.
Topic-based routing - Sensitive topics (complaints, legal) go straight to humans.
User-initiated escalation - "Talk to a human" always works.
Time-based escalation - Conversations over 10 minutes without resolution escalate.
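Here's a minimal sketch of how those four rules can fold into a single check; the topic list, confidence threshold, and trigger phrases are illustrative.

```python
from datetime import datetime, timedelta, timezone

SENSITIVE_TOPICS = {"complaint", "legal", "privacy"}    # illustrative list
CONFIDENCE_THRESHOLD = 0.6                              # illustrative threshold
MAX_DURATION = timedelta(minutes=10)
ESCALATION_PHRASES = ("talk to a human", "parler à un humain", "parler à un agent")

def should_escalate(message: str, topic: str, confidence: float,
                    started_at: datetime, resolved: bool) -> bool:
    text = message.lower()
    # User-initiated escalation always works.
    if any(phrase in text for phrase in ESCALATION_PHRASES):
        return True
    # Sensitive topics go straight to humans.
    if topic in SENSITIVE_TOPICS:
        return True
    # Low-confidence responses trigger human review.
    if confidence < CONFIDENCE_THRESHOLD:
        return True
    # Long conversations without resolution escalate.
    if not resolved and datetime.now(timezone.utc) - started_at > MAX_DURATION:
        return True
    return False
```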
Monitoring & Observability
Production chatbots need comprehensive monitoring:
Application Insights:
- Request latency (target: <3 seconds end-to-end)
- Error rates by conversation stage
- Token usage by model and query type
Custom metrics:
- Resolution rate (% of conversations resolved without escalation)
- Language distribution
- Top topics by volume
- User satisfaction (post-conversation survey)
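To give a flavour of the custom metrics side, here's a sketch using the `azure-monitor-opentelemetry` distro to push counters into Application Insights. The metric and attribute names mirror the list above and are our own conventions, not standard names.

```python
import os
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import metrics

# Wires OpenTelemetry traces, logs, and metrics to Application Insights.
configure_azure_monitor(connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"])

meter = metrics.get_meter("chatbot")
conversations_total = meter.create_counter("conversations_total")
conversations_resolved = meter.create_counter("conversations_resolved")
tokens_used = meter.create_counter("tokens_used")

def record_conversation(language: str, topic: str, resolved: bool, token_count: int) -> None:
    """Record one finished conversation; resolution rate = resolved / total."""
    attrs = {"language": language, "topic": topic}
    conversations_total.add(1, attrs)
    if resolved:
        conversations_resolved.add(1, attrs)
    tokens_used.add(token_count, attrs)
```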
Key Lessons Learned
1. Canadian French matters. European French models and translations often miss the mark. Test with Quebec users.
2. Start with RAG, not fine-tuning. RAG is more auditable and easier to update. Fine-tuning is a last resort.
3. Design for failure. The chatbot will get things wrong. Make escalation seamless.
4. Monitor token costs closely. Set up alerts for unusual usage patterns.
5. Plan for updates. Your knowledge base will change. Automate the re-indexing pipeline.
Building bilingual enterprise chatbots is complex, but Azure provides all the building blocks. The key is thoughtful architecture that respects both the technical requirements and the cultural context of Canadian bilingualism.
