Martin Rylko
Senior Cloud Architect & DevOps Engineer. Specializing in Microsoft Azure, IaC, Cloud Security and AI.


© 2026 Martin Rylko. All rights reserved.



AKS for Production: A Checklist for Cloud Architects

2/5/2026 · 3 min read
#Kubernetes #Azure #AKS #DevOps

Azure Kubernetes Service (AKS) is fantastic for running containerized workloads, but the transition from dev/test environments to full-scale production requires careful preparation. This checklist is based on my real-world experience deploying AKS clusters for FinTech and enterprise clients.

Networking – The Foundation of Everything

Azure CNI vs Kubenet

For production, always use Azure CNI (or Azure CNI Overlay for larger clusters). Kubenet is fine for dev, but in production you need:

  • Direct pod IP addressing within the Azure VNet
  • Network Policy support (Calico/Azure)
  • Integration with Azure Private Link
```bicep
// Abbreviated example – dnsPrefix, identity and agentPoolProfiles omitted for brevity
resource aksCluster 'Microsoft.ContainerService/managedClusters@2024-02-01' = {
  name: 'aks-prod-westeurope'
  location: 'westeurope'
  properties: {
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'calico'
      serviceCidr: '10.250.0.0/16'
      dnsServiceIP: '10.250.0.10'
      loadBalancerSku: 'standard'
      outboundType: 'userDefinedRouting'
    }
    apiServerAccessProfile: {
      enablePrivateCluster: true
      privateDNSZone: 'system'
    }
  }
}
```

Private Cluster is Mandatory

A publicly accessible API server in production? Absolutely not. Private cluster + Azure Private DNS Zone + VPN/ExpressRoute for on-premises access.

Identity & RBAC

Workload Identity (not Pod Identity!)

Pod Identity is deprecated. We're moving to Workload Identity Federation:

  1. Create a Managed Identity in Azure
  2. Set up federated credentials for a Kubernetes Service Account
  3. The application automatically obtains Azure tokens without stored secrets
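
The Kubernetes side of those steps can be sketched as follows, assuming the managed identity and federated credential from steps 1–2 already exist (the names, namespace and image are illustrative):

```yaml
# Service account federated to the managed identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: payments
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: payments
  labels:
    azure.workload.identity/use: "true"  # triggers injection of the federated token
spec:
  serviceAccountName: app-sa
  containers:
    - name: app
      image: myacr.azurecr.io/app:1.0
```

With the `azure.workload.identity/use` label set, the webhook projects a token into the pod, which the Azure SDK exchanges for an Entra ID token – no stored secrets.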

Kubernetes RBAC + Entra ID

```yaml
# ClusterRoleBinding – only the Entra ID group has admin access
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aks-cluster-admins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: Group
    name: "00000000-0000-0000-0000-000000000000"  # Entra ID Group Object ID
    apiGroup: rbac.authorization.k8s.io
```

Scaling & High Availability

Node Pool Strategy

  • System pool: Minimum 3 nodes, dedicated to system pods (CoreDNS, metrics-server); keep application workloads off it with the CriticalAddonsOnly taint
  • User pool(s): Separate pools for different workloads, with taints and tolerations
  • Spot pool: For batch jobs and non-critical workloads (up to 90% savings)
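
For illustration, a batch Job pinned to a spot pool – AKS automatically taints spot nodes with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so the workload must tolerate it (image name is a placeholder):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot  # schedule only on spot nodes
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      containers:
        - name: report
          image: myacr.azurecr.io/report:1.0
```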

Cluster Autoscaler + KEDA

Use the Cluster Autoscaler for horizontal node scaling and KEDA for event-driven pod scaling on external metrics (Azure Service Bus queue depth, HTTP request rate).
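
A sketch of a KEDA `ScaledObject` scaling a consumer Deployment on Service Bus queue depth – names and thresholds are illustrative, and the referenced `TriggerAuthentication` is assumed to exist:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-consumer
spec:
  scaleTargetRef:
    name: orders-consumer        # Deployment to scale
  minReplicaCount: 0             # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders
        messageCount: "50"       # target messages per replica
      authenticationRef:
        name: servicebus-auth    # TriggerAuthentication (e.g. workload identity)
```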

Monitoring & Observability

Required Stack

  1. Azure Monitor + Container Insights – node and pod metrics
  2. Prometheus + Grafana (managed via Azure Monitor) – custom dashboards
  3. Alerting – node pool CPU/memory above 80%, pod restart counts, OOMKilled events
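
As a starting point for the OOMKilled alert, a KQL sketch against the Container Insights `KubePodInventory` table (verify the column names against your workspace schema before wiring it into an alert rule):

```kusto
// Pods OOMKilled in the last hour, with their restart counts
KubePodInventory
| where TimeGenerated > ago(1h)
| where ContainerStatusReason == "OOMKilled"
| summarize Restarts = max(ContainerRestartCount) by Name, Namespace
| order by Restarts desc
```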

Log Aggregation

All application logs into Log Analytics Workspace. Set retention to at least 90 days for compliance (NIS2).
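
A minimal Bicep sketch for the retention setting – workspace name and location are placeholders:

```bicep
// Log Analytics workspace with 90-day retention for NIS2 compliance
resource logs 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
  name: 'log-aks-prod'
  location: 'westeurope'
  properties: {
    retentionInDays: 90
    sku: {
      name: 'PerGB2018'
    }
  }
}
```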

Backup & Disaster Recovery

  • Velero or Azure Backup for AKS – backing up PVs and cluster state
  • Multi-region deployment – Active/Passive with Azure Traffic Manager or Front Door
  • GitOps (ArgoCD/Flux) – entire cluster state versioned in Git, recovery = git push
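
For the GitOps piece, an illustrative Argo CD `Application` that keeps the cluster in sync with a Git repository (repo URL and paths are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: aks-platform
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/aks-gitops  # illustrative repo
    targetRevision: main
    path: clusters/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: platform
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```

With `selfHeal` enabled, recovery really does reduce to restoring the Git state – the controller reconciles the cluster back automatically.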

Production Checklist (Summary)

| Area | Requirement | Priority |
|------|-------------|----------|
| Networking | Azure CNI + Private Cluster | Critical |
| Identity | Workload Identity + Entra ID RBAC | Critical |
| Scaling | Cluster Autoscaler + min 3 system nodes | High |
| Monitoring | Container Insights + alerting | High |
| Security | Network Policy (Calico) + Pod Security Standards | High |
| Backup | Velero/Azure Backup + GitOps | Medium |
| Cost | Spot node pools + right-sizing | Medium |

Conclusion

AKS is a great platform, but it requires an architectural approach. Don't underestimate networking and identity – these two pillars determine whether your cluster will survive its first security audit.

Need help designing an AKS architecture for your project? I offer a free consultation.


About the author

Martin Rylko

Senior Cloud Architect & DevOps Engineer

14+ years in IT – from on-premises datacenters and Hyper-V clustering to cloud infrastructure on Microsoft Azure. I specialize in Landing Zones, IaC automation, Kubernetes and security compliance.


You might also like

5 Terraform Best Practices for Production Azure Projects

Common mistakes and proven practices when working with Terraform in Azure – from state management to modularization and drift detection.


NIS2 and Azure: A Practical Compliance Checklist for Architects

How to prepare your Azure environment for the NIS2 directive – concrete steps from Azure Policy through Defender for Cloud to logging and incident response.


Building an Azure Landing Zone with Bicep

A practical guide on how to effectively structure your Bicep code for deploying an enterprise-ready Azure Landing Zone (ALZ).
