Martin Rylko

Azure Data Collection Rules: Ingestion-Time PII Masking in Log Analytics

1/20/2026 5 min
#Azure#Log Analytics#Security#DCR#GDPR


At Christie's we had a classic problem: application logs contained client emails, phone numbers, and occasionally bid values. All of this flowed into a Log Analytics workspace that the entire platform team had read access to – meaning people with no legitimate reason to see specific PII. The customer's SOC flagged this during a compliance audit, and the deadline to fix it was the end of January.

The textbook answer would be "rewrite the apps to stop logging that". Reality? Thirty legacy apps, six teams, six priorities – so the answer became ingestion-time transformation via Data Collection Rules.

Architecture: Where DCR Sits in the Pipeline

[Application] → [AMA / Logs Ingestion API] → [DCR transformation] → [Log Analytics table]
                                                       ↑
                                                Mask PII here

The key properties that made us pick DCR over a Logstash or Splunk forwarder:

  • Native to Azure – no extra server component to operate
  • Per-table scope – different rules for AppTraces and AppExceptions
  • DCR changes audited via Azure Activity Log – any transformation edit is visible
  • No schema impact – the transformation runs before write, the table schema stays stable

Example 1: Masking Email Addresses

The first rule we deployed was the simplest. Find an email in the Message column and replace the local part with asterisks.

source
| extend Message = replace(
    @'([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})',
    @'***\2',
    Message
  )

Heads up: replace_regex() is not available inside DCR transformations. You must use replace(), which in the DCR variant does accept regex (contrary to parts of the public docs). This was the most painful part of onboarding – KQL inside DCR behaves differently from KQL in regular queries.

Once deployed, an email like martin.rylko@example.com shows up in Log Analytics as ***@example.com. The domain stays for debugging, the local part is gone. For forensic needs we keep a parallel pipeline (see the FAQ at the end of this article).
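Before touching a DCR, it can help to sanity-check the pattern locally. The following is a minimal Python sketch that applies the same regex with Python's re module; it assumes Python's regex dialect behaves like the DCR one for this particular pattern, and the function name mask_email is made up for illustration.

```python
import re

# Same pattern as the DCR rule: group 1 = local part, group 2 = "@domain.tld".
EMAIL_RE = re.compile(r'([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})')


def mask_email(message: str) -> str:
    """Replace the local part of every email address with asterisks."""
    # \2 re-inserts the domain part, mirroring the rewrite string of the rule.
    return EMAIL_RE.sub(r'***\2', message)


if __name__ == "__main__":
    print(mask_email("Login failed for martin.rylko@example.com"))
    # prints: Login failed for ***@example.com
```

This only verifies the regex itself, not how the DCR engine evaluates it, so the portal check described later is still needed.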

Example 2: Full Removal of Card Numbers

For PCI data, partial masking is not enough – the whole value must be removed. We replace every match with a fixed placeholder:

source
| extend Message = replace(
    @'\b(?:\d[ -]*?){13,19}\b',
    '[CARD_REDACTED]',
    Message
  )

The regex covers 13–19 digit sequences with optional spaces or dashes, which catches Visa, Mastercard, AMEX, and weirder formats.
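A quick local check shows both what the pattern catches and where it over-matches. This Python sketch assumes the pattern behaves the same in Python's re module as in the DCR; redact_cards and the sample messages are illustrative only.

```python
import re

# Same pattern as the DCR rule: 13 to 19 digits, each optionally followed
# by spaces or dashes, delimited by word boundaries.
CARD_RE = re.compile(r'\b(?:\d[ -]*?){13,19}\b')


def redact_cards(message: str) -> str:
    """Replace every card-like digit sequence with a fixed placeholder."""
    return CARD_RE.sub('[CARD_REDACTED]', message)


if __name__ == "__main__":
    samples = [
        "card 4111 1111 1111 1111 declined",   # 16 digits, space separated
        "amex 3782-822463-10005 ok",           # 15 digits, dash separated
        "ts=1737331200000",                    # 13-digit epoch milliseconds
        "order 123456789012 shipped",          # 12 digits, left untouched
    ]
    for sample in samples:
        print(redact_cards(sample))
```

Note the third sample: any 13 to 19 digit run is redacted, including millisecond timestamps and long order numbers. For a PCI-driven rule that over-matching is usually the acceptable direction, but it is worth knowing before someone goes looking for a missing ID.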

Example 3: Named Entities From the Application Layer

Some data is logged in a structured form by the app – e.g. clientId=CL-12345. For these we can afford to hash instead of mask, preserving the ability to aggregate:

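source
| extend ClientHash = case(
    Message has 'clientId=',
    hash_sha256(extract(@'clientId=([A-Z0-9-]+)', 1, Message)),
    ''
  )
// Write the hash back into the message; a fixed placeholder would make aggregation impossible.
// Assumes replace() accepts a computed replacement string inside a DCR transformation.
| extend Message = iif(
    isnotempty(ClientHash),
    replace(@'clientId=[A-Z0-9-]+', strcat('clientId=', ClientHash), Message),
    Message
  )
| project-away ClientHash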

The trick is the case() – if a row does not contain clientId=, the hash is never computed. That avoids paying for the hash on rows that have nothing to pseudonymize, which keeps the transformation overhead down.
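Why hashing keeps aggregation possible is easiest to see outside KQL. The sketch below is an illustration of the idea in Python, not the DCR transform itself: it assumes the hash is a SHA-256 over the UTF-8 bytes of the ID, and the helper name pseudonymize_client is hypothetical.

```python
import hashlib
import re

# Same capture as in the KQL example: the ID that follows "clientId=".
CLIENT_RE = re.compile(r'clientId=([A-Z0-9-]+)')


def pseudonymize_client(message: str) -> str:
    """Replace each client ID with its SHA-256 hex digest."""
    def _hash(match: re.Match) -> str:
        digest = hashlib.sha256(match.group(1).encode('utf-8')).hexdigest()
        return f'clientId={digest}'

    # Messages without "clientId=" pass through untouched.
    return CLIENT_RE.sub(_hash, message)


if __name__ == "__main__":
    first = pseudonymize_client("bid placed clientId=CL-12345")
    second = pseudonymize_client("login clientId=CL-12345")
    # The same client yields the same digest, so rows can still be grouped.
    print(first.split('clientId=')[1] == second.split('clientId=')[1])
```

One caveat on the design: an unsalted hash of a small, predictable ID space (CL-00001 to CL-99999) can be reversed by hashing every candidate, so this is pseudonymization, not anonymization.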

Deploying a DCR via Bicep

Manual setup through the Portal is fine for the first lab. For production we use Bicep:

resource dcr 'Microsoft.Insights/dataCollectionRules@2023-03-11' = {
  name: 'dcr-pii-masking-prod'
  location: location
  kind: 'Direct'
  properties: {
    streamDeclarations: {
      'Custom-AppTraces': {
        columns: [
          { name: 'TimeGenerated', type: 'datetime' }
          { name: 'Message',       type: 'string'   }
          { name: 'SeverityLevel', type: 'int'      }
          { name: 'AppRoleName',   type: 'string'   }
        ]
      }
    }
    destinations: {
      logAnalytics: [
        {
          workspaceResourceId: workspaceId
          name: 'la-destination'
        }
      ]
    }
    dataFlows: [
      {
        streams: ['Custom-AppTraces']
        destinations: ['la-destination']
        transformKql: '''
          source
          | extend Message = replace(@'([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', @'***\2', Message)
          | extend Message = replace(@'\b(?:\d[ -]*?){13,19}\b', '[CARD_REDACTED]', Message)
        '''
        outputStream: 'Microsoft-AppTraces'
      }
    ]
  }
}

Three things we got burned by at Christie's that are worth calling out:

  1. outputStream must match the destination table – for AppTraces it is Microsoft-AppTraces, not a custom name
  2. Backslash escaping in Bicep – a multiline string literal ('''...''') is read verbatim, so the regex backslashes stay single there; only in a regular single-quoted string must you double them, or the Bicep parser rejects or alters the regex
  3. AMA needs a DCR association, not a direct assignment – you have to separately create a Microsoft.Insights/dataCollectionRuleAssociations

Lab Pattern: Validating a Transformation Before Production

This is my single most valuable tip. Microsoft provides the equivalent of Invoke-AzOperationalInsightsQuery for DCR transforms – the DCR sandbox in the Azure Portal (Monitor → Data Collection Rules → select DCR → Transformation editor → "Run").

Workflow:

  1. Take 50 real lines of production log (sanitized export)
  2. Wrap them in the JSON format the stream expects
  3. Run the transformation in the editor
  4. Check that no line leaks full PII
  5. Diff against a baseline for regression checks
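Step 2 of the workflow can be scripted. The following Python sketch wraps raw log lines into records with the four columns declared for the Custom-AppTraces stream in the Bicep example; the function name wrap_lines, the default severity, and the app role name "bidding-api" are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def wrap_lines(lines, app_role_name, severity_level=1, now=None):
    """Turn raw log lines into records shaped like the Custom-AppTraces stream."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return [
        {
            "TimeGenerated": timestamp,
            "Message": line.rstrip("\n"),
            "SeverityLevel": severity_level,
            "AppRoleName": app_role_name,
        }
        for line in lines
        if line.strip()  # skip empty lines from the export
    ]


if __name__ == "__main__":
    sample = ["Login failed for martin.rylko@example.com\n", "\n"]
    print(json.dumps(wrap_lines(sample, "bidding-api"), indent=2))
```

The resulting JSON array is what gets pasted or uploaded as sample data for the transformation run in step 3.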

At Christie's we hung this in CI – a PR to the DCR Bicep file triggers a unit test that takes a 200-line fixture and asserts the absence of regex matches on original PII patterns in the output.
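A check of this kind can be small. The sketch below is one possible shape for it in Python, not the actual CI code: it assumes the transformed fixture is already available as a list of strings (how the output is produced is not covered here), and find_leaks and the sample lines are hypothetical.

```python
import re

# The original PII patterns; none of them may match in transformed output.
PII_PATTERNS = {
    "email": re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'),
    "card": re.compile(r'\b(?:\d[ -]*?){13,19}\b'),
}


def find_leaks(transformed_lines):
    """Return (line number, pattern name) for every line that still holds PII."""
    leaks = []
    for line_no, line in enumerate(transformed_lines, start=1):
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                leaks.append((line_no, name))
    return leaks


def test_fixture_has_no_pii():
    # In CI this list would be loaded from the transformed fixture file.
    transformed = [
        "Login failed for ***@example.com",
        "Payment with [CARD_REDACTED] declined",
    ]
    assert find_leaks(transformed) == []


if __name__ == "__main__":
    test_fixture_has_no_pii()
    print(find_leaks(["ok", "contact a@b.cz"]))
```

Returning line numbers rather than a bare boolean makes a failing PR much quicker to diagnose.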

Compliance: What It Actually Solves (and Does Not)

Requirement | Does DCR transformation help?
GDPR Art. 32 (pseudonymization) | Yes – masking is a recognized safeguard, although pseudonymized data still counts as personal data under GDPR
GDPR Art. 17 (right to erasure) | Partially – fine for the main workspace, the forensic pipeline must be handled separately
PCI DSS Req. 3.4 | Yes – [CARD_REDACTED] is sufficient masking
NIS2 Art. 21 (logging) | Yes – you still keep full audit, just without PII
Customer ask "don't show clients to support" | Yes – read RBAC sees masked data

What DCR does not solve: if an application sends TrackException to Application Insights with a full stack trace containing PII, the transformation on AppExceptions must explicitly cover OuterMessage and Details too. Easy to forget.

Conclusion

DCR transformations are, in January 2026, the cleanest way in Azure Monitor to satisfy a "PII never reaches log storage in full form" requirement without rewriting applications. Cost: 30–40 engineering hours for the first rule set, +10% CPU on agents, +10 seconds of ingest latency.

Value: a compliance audit with no findings and a SOC team that no longer gets paged because of an email in a log.

If you are dealing with a similar PII-in-logs problem and want to roll out the DCR pattern, check out our cloud architecture services or reach out for a lab walkthrough.


About the author

Martin Rylko

Senior Cloud Architect & DevOps Engineer

14+ years in IT – from on-premises datacenters and Hyper-V clustering to cloud infrastructure on Microsoft Azure. I specialize in Landing Zones, IaC automation, Kubernetes and security compliance.


Frequently Asked Questions

What is a Data Collection Rule (DCR) and why use it for masking?
A Data Collection Rule is a transformation layer between a log source and the Log Analytics workspace. It lets you filter, enrich, and mask data at the moment of ingestion – before it is written to a table. For PII this means sensitive values (emails, card numbers, client names) never reach storage in full form, so nobody can extract them even with full permissions.

Does the full KQL language work inside DCR transformations?
No. DCR transformations support only a subset of KQL – specifically extend, project, where, parse, and a handful of scalar functions like replace, substring, and strcat. They do not support join, summarize, or replace_regex. This is the most common pitfall when migrating rules from scheduled queries – you must rewrite regex transforms as a combination of replace and extract.

What is the performance impact of DCR transformations on ingest?
In my lab tests a medium-complexity transformation (5 columns, two replace-based masks) added roughly 8–12 seconds of ingest latency and about 3–5% CPU overhead on the Azure Monitor Agent. For typical volumes (tens of GB/day) the impact is negligible. For high-volume ingest (1+ TB/day) I recommend evaluating an Event Hubs preprocessor instead.

Is the masked data in DCR really gone for good?
Yes, and that is the whole point. A value rewritten in a DCR transformation never reaches Log Analytics – not in the audit log, not in an _Original field. DCR does not store the originals. If you occasionally need the originals for forensics, you must duplicate the pipeline: one DCR masks and sends to the main workspace, another sends raw data to an isolated workspace with Customer-Managed Keys and strict RBAC.
