Why migrate from Calico to Cilium on AKS?

Three main reasons. Performance: Cilium's eBPF dataplane has 2–3× faster conntrack and lower CPU overhead for network policy enforcement. Identity-aware policies: rules can be based on ServiceAccount identity, not just IP or label. Direction: since Build 2026 Microsoft has made Cilium the default CNI for new AKS clusters, so Calico support will gradually become legacy. In my tests at Creditas, worker node CPU dropped by 8–12% after migration.

Can I migrate an existing AKS cluster, or do I have to create a new one?

In-place migration is possible but requires recreating every node pool. There is no one-step switch: running "az aks update --network-policy cilium" on its own does not convert the existing nodes. You have to create new node pools with Cilium, drain the old ones, and remove them. For most production clusters it is cleaner to create a new cluster with Cilium and migrate workloads via GitOps. At Creditas we picked the new-cluster approach; the full cutover took 6 weeks of parallel running.

Do existing NetworkPolicy YAML manifests work with Cilium?

Yes, Cilium fully supports the Kubernetes NetworkPolicy API – existing manifests work unchanged. Cilium also adds the CiliumNetworkPolicy CRD with extended capabilities (L7 HTTP filtering, identity-based policies, FQDN matching). Migrating existing policies is therefore straightforward, and you can layer CiliumNetworkPolicy on top where standard NetworkPolicy is not enough.

What is the real performance difference between Cilium and Calico?

In our Creditas production load (8 nodes, ~200 pods, ~30 000 connections/s peak) Cilium showed 35% lower p99 latency on service-to-service calls and 10% lower CPU usage on worker nodes. Conntrack table efficiency was higher – where Calico started allocating extra memory for connections, Cilium kept it stable. Details and graphs are in our internal performance write-up linked below.

AKS Cilium NetworkPolicy: Migrating From Calico Without Production Downtime

When Microsoft announced Azure CNI Powered by Cilium as the default for new AKS clusters at Build 2026, it opened a question we had been deferring for a year at Creditas: when and how to migrate existing production clusters off Calico. This article is the distilled version of what we did, where we got stuck, and what I would do differently next time.

Why Cilium and Not Calico

Calico works fine on AKS, but architecturally it sits on top of iptables. That means:

CPU overhead grows linearly with the number of network policies – above 50 policies it starts to show
The conntrack table is a single bottleneck at high throughput
L7 filtering requires a separate sidecar (Envoy via Calico Enterprise)

Cilium uses eBPF programs directly in the kernel instead of iptables. The result:

Aspect	Calico (iptables)	Cilium (eBPF)
Network policy enforcement	iptables chain traversal	eBPF program in the kernel
L7 HTTP filtering	Requires Envoy sidecar	Built-in
Identity-based policies	No	Yes (via ServiceAccount)
FQDN-based policies	No	Yes
Conntrack	Single kernel table	Cilium-managed eBPF maps
Pod-to-pod overhead	~10–15% CPU at 1k pps	~3–5% CPU at 1k pps
Multi-cluster mesh	Calico Enterprise (paid)	Cilium Cluster Mesh (open source)

In my Creditas measurements (8 nodes, ~200 pods, ~30 000 connections/s peak) the result was clear:

p50 latency service-to-service:
  Calico:  1.8 ms
  Cilium:  1.2 ms  (-33%)
 
p99 latency:
  Calico:  18 ms
  Cilium:  11 ms   (-39%)
 
Worker node CPU (idle network policy load):
  Calico:  14% baseline
  Cilium:  8% baseline   (-43% relative)
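
The relative changes quoted above follow directly from the raw numbers. A minimal Python sketch of the arithmetic (the helper name relative_change is ours, for illustration only):

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# (Calico, Cilium) pairs taken from the measurements above.
measurements = {
    "p50 latency (ms)": (1.8, 1.2),
    "p99 latency (ms)": (18.0, 11.0),
    "baseline CPU (%)": (14.0, 8.0),
}

for name, (calico, cilium) in measurements.items():
    print(f"{name}: {relative_change(calico, cilium):.0f}%")
```

Note that the CPU figure is a relative change: in absolute terms the baseline dropped by 6 percentage points (14% to 8%).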

Migration Architecture: New Cluster vs In-Place

There is no one-step upgrade path from Calico to Cilium on AKS. You have two options:

Option A – in-place migration (officially supported since summer 2024):

1. az aks update --network-policy cilium – enables the Cilium control plane
2. Create new node pools with --enable-cilium-dataplane
3. Drain the old nodes
4. Remove the old node pools

The catch: every node pool must be recreated. For a cluster with 10 specialized pools that is three weeks of careful operations. And during migration Cilium and Calico run side by side, which adds unexpected complexity.

Option B – parallel new cluster (what we did at Creditas):

1. Provision a new AKS cluster with Cilium from the start
2. GitOps (Flux) copies workloads into the new cluster
3. Gradual traffic cutover (DNS + Front Door)
4. Delete the old cluster

At Creditas the cutover took 6 weeks with zero production downtime. I recommend option B for anyone running GitOps – it is cleaner and reversible.

New Cluster With Cilium: Bicep Template

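// Deployment region; defaults to the resource group's location
param location string = resourceGroup().location

resource aks 'Microsoft.ContainerService/managedClusters@2024-09-01' = {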
  name: 'aks-prod-cilium'
  location: location
  identity: { type: 'SystemAssigned' }
  sku: {
    name: 'Base'
    tier: 'Standard'
  }
  properties: {
    kubernetesVersion: '1.31.0'
    dnsPrefix: 'aksprodcil'
    networkProfile: {
      networkPlugin: 'azure'
      networkPluginMode: 'overlay'
      networkDataplane: 'cilium'      // KEY – Cilium dataplane
      networkPolicy: 'cilium'          // Cilium NetworkPolicy enforcement
      loadBalancerSku: 'standard'
      serviceCidr: '10.0.0.0/16'
      dnsServiceIP: '10.0.0.10'
      podCidr: '10.244.0.0/16'        // overlay pod CIDR
    }
    agentPoolProfiles: [
      {
        name: 'system'
        count: 3
        vmSize: 'Standard_D4s_v5'
        osSKU: 'AzureLinux'
        mode: 'System'
        availabilityZones: ['1', '2', '3']
      }
    ]
    addonProfiles: {
      azureKeyvaultSecretsProvider: {
        enabled: true
        config: { enableSecretRotation: 'true' }
      }
    }
    securityProfile: {
      workloadIdentity: { enabled: true }
    }
    oidcIssuerProfile: { enabled: true }
  }
}

Three critical properties:

networkDataplane: 'cilium' – activates the Cilium eBPF dataplane (vs 'azure' = iptables)
networkPolicy: 'cilium' – must be cilium (not calico, not empty)
networkPluginMode: 'overlay' – Cilium supports overlay or non-overlay; overlay is recommended for new clusters (decouples pod CIDR from VNet)
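
These properties are easy to get wrong in a copy-pasted template, so a small guard in CI can help. The sketch below is one possible check, assuming you have already loaded the networkProfile object as a Python dict (for example from the JSON that compiling the Bicep file produces). The function name and the messages are hypothetical, not part of any Azure tooling.

```python
def check_network_profile(profile: dict) -> list[str]:
    """Return a list of problems found in a networkProfile-like dict."""
    problems = []
    if profile.get("networkDataplane") != "cilium":
        problems.append("networkDataplane should be 'cilium'")
    if profile.get("networkPolicy") != "cilium":
        problems.append("networkPolicy should be 'cilium'")
    if profile.get("networkPluginMode") != "overlay":
        # Not an error in itself: the article only recommends overlay.
        problems.append("networkPluginMode is not 'overlay' (recommended for new clusters)")
    return problems


# Values copied from the template above.
profile = {
    "networkPlugin": "azure",
    "networkPluginMode": "overlay",
    "networkDataplane": "cilium",
    "networkPolicy": "cilium",
}
print(check_network_profile(profile))
```

An empty list means all three properties match the template; anything else is worth a look before deploying.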

Migrating Existing NetworkPolicy

Good news: existing kind: NetworkPolicy manifests work unchanged. Cilium fully implements the Kubernetes NetworkPolicy API.

# Existing Calico policy – works on Cilium unchanged
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: prod
spec:
  podSelector:
    matchLabels: { app: api }
  policyTypes: [Ingress]
  ingress:
  - from:
    - podSelector:
        matchLabels: { app: frontend }
    ports:
    - protocol: TCP
      port: 8080

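The migration check is simple: kubectl apply the manifests to the new cluster, then confirm they were loaded with cilium policy get (run from the Cilium agent CLI inside a cilium pod, not from cilium-cli).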
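
If you want to reason about what a migrated policy should allow before testing it on the cluster, a toy model helps. The sketch below is a simplified Python model of the ingress rule above, assuming same-namespace traffic and matchLabels selectors only; it is not how Cilium or Kubernetes evaluate policies internally, and all names in it are ours.

```python
def labels_match(selector: dict, labels: dict) -> bool:
    """True if every key/value in the selector is present in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policy: dict, src_labels: dict, dst_labels: dict,
                    port: int, protocol: str = "TCP") -> bool:
    """Evaluate one simplified ingress policy for a same-namespace connection."""
    if not labels_match(policy["podSelector"], dst_labels):
        # This policy does not select the destination pod, so it does not restrict it.
        return True
    for rule in policy["ingress"]:
        from_ok = any(labels_match(sel, src_labels) for sel in rule["from"])
        port_ok = any(p["port"] == port and p["protocol"] == protocol
                      for p in rule["ports"])
        if from_ok and port_ok:
            return True
    return False  # selected by the policy, but no rule matched


# The api-allow-frontend policy from above, reduced to plain dicts.
API_POLICY = {
    "podSelector": {"app": "api"},
    "ingress": [
        {"from": [{"app": "frontend"}], "ports": [{"protocol": "TCP", "port": 8080}]},
    ],
}

print(ingress_allowed(API_POLICY, {"app": "frontend"}, {"app": "api"}, 8080))
print(ingress_allowed(API_POLICY, {"app": "batch"}, {"app": "api"}, 8080))
```

The first call models the frontend reaching the API on port 8080; the second models an unrelated pod trying the same connection.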

Adding CiliumNetworkPolicy for Advanced Use Cases

This is where it gets fun. Standard NetworkPolicy is L3/L4 (IP + port). Cilium adds L7 (HTTP method, path, headers):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-restrictions
  namespace: prod
spec:
  endpointSelector:
    matchLabels: { app: api }
  ingress:
  - fromEndpoints:
    - matchLabels: { app: frontend }
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: "/api/v1/.*"
        - method: POST
          path: "/api/v1/items"
          headers:
          - "X-Tenant-ID"   # header name only: the request must carry this header, with any value

What this policy enforces:

The frontend can call the API over HTTP
Only GET /api/v1/* and POST /api/v1/items
POST must carry an X-Tenant-ID header (multi-tenancy)
Everything else (PUT, DELETE, other paths) is rejected by Cilium before it reaches the application

No application change, no per-pod sidecar, no API gateway. eBPF redirects the matching traffic to Cilium's node-local Envoy proxy, which makes the decision at the L7 layer.
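
To make the allow-list semantics concrete, here is a minimal Python model of the HTTP rules above. It assumes that method and path are treated as regular expressions that must match the whole value, and that the header rule means "the header must be present"; both are simplifications on our side, and the names are hypothetical.

```python
import re

# Rules mirroring the http section of the policy above.
HTTP_RULES = [
    {"method": "GET", "path": "/api/v1/.*"},
    {"method": "POST", "path": "/api/v1/items", "required_headers": ["X-Tenant-ID"]},
]


def request_allowed(method: str, path: str, headers: dict) -> bool:
    """Return True if any rule matches the method, the full path and the headers."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    for rule in HTTP_RULES:
        if not re.fullmatch(rule["method"], method):
            continue
        if not re.fullmatch(rule["path"], path):
            continue
        if all(h.lower() in present for h in rule.get("required_headers", [])):
            return True
    return False  # nothing matched: the request is denied


print(request_allowed("GET", "/api/v1/items/42", {}))
print(request_allowed("POST", "/api/v1/items", {}))
```

The first request matches the GET rule; the second is a POST without the tenant header, so no rule matches and it is denied.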

Identity-Aware Policies (Game Changer)

The second Cilium killer feature for us at Creditas: policies by ServiceAccount identity instead of pod labels.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: db-access-by-sa
  namespace: prod
spec:
  endpointSelector:
    matchLabels: { app: postgres }
  ingress:
  - fromEndpoints:
    - matchLabels:
        # Cilium-specific: ServiceAccount as identity
        io.cilium.k8s.policy.serviceaccount: api-sa
    toPorts:
    - ports: [{ port: "5432", protocol: TCP }]

Why it matters: pod labels can be spoofed (compromised pod, RBAC hole). ServiceAccount identity is bound to the Kubernetes auth subsystem and cannot be spoofed from a compromised pod. For a regulated workload (PCI, GDPR) this is a fundamental difference.

Cutover Plan: 6 Weeks at Creditas

Week	Activity	Risk
1	Provision new cluster with Cilium, dry-run GitOps sync	None
2	Sync all namespaces, smoke tests, performance baseline	None
3	Cutover dev/test traffic via Front Door	Low
4	Cutover staging traffic, integration test suite	Low
5	Canary 10% of production traffic	Medium
6	Full cutover, monitoring, delete old cluster	Low

Key enabler: Front Door routing rules with percentage-based split allowed granular cutover without DNS TTL pain. If you do not use Front Door, the same strategy works via Application Gateway or an external load balancer.
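
To get a feel for what a percentage-based split does to traffic, here is a small Python simulation. It is not Front Door's algorithm (Front Door applies the weights itself); it is a sketch that assumes a stable hash of a request key decides which cluster serves the request, and the function names are ours.

```python
import hashlib


def pick_backend(request_key: str, new_cluster_percent: int) -> str:
    """Deterministically route a request key to 'old' or 'new' by weight (0-100)."""
    if not 0 <= new_cluster_percent <= 100:
        raise ValueError("new_cluster_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "new" if bucket < new_cluster_percent else "old"


def share_on_new(percent: int, samples: int = 10_000) -> float:
    """Fraction of synthetic request keys that land on the new cluster."""
    hits = sum(pick_backend(f"req-{i}", percent) == "new" for i in range(samples))
    return hits / samples


# Week 5 of the plan: canary 10% of production traffic.
print(f"share on new cluster at 10%: {share_on_new(10):.3f}")
```

Raising the percentage step by step (10, 25, 50, 100) is the same idea as the week 5 to week 6 progression in the table above, and setting it back to 0 is the rollback.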

Three Traps We Got Stuck In

1. NodeLocal DNS Cache is not enabled automatically in a Cilium cluster. We had it enabled in the old cluster but not in the new one, and only noticed after a week, when some DNS lookups were 5–8 ms slower. Fix: az aks update --enable-node-local-dns
2. Cilium Hubble (observability) is not on by default; you must enable it explicitly with --enable-hubble. Without Hubble you lose flow visibility, which makes debugging policy issues much harder.
3. A CiliumNetworkPolicy syntax error blocks the deploy. Validation is strict where Calico is more lenient: any CRD error fails the deployment. I recommend running cilium policy validate as a pre-commit hook.

Conclusion

Migrating AKS from Calico to Cilium is not trivial, but 2026 is the year it pays off. A 35% reduction in p99 latency and a 10% reduction in worker node CPU justify a 6-week migration in any environment with serious traffic. CiliumNetworkPolicy with L7 and identity-aware filtering opens use cases we used to handle with Istio sidecars.

If you are planning a similar migration or a fresh AKS cluster in 2026, check out our cloud architecture services or reach out for a Cilium migration walkthrough.

