Azure Data Collection Rules: Ingestion-Time PII Masking in Log Analytics
At Christie's we had a classic problem: application logs contained client emails, phone numbers, and occasionally bid values. All of this flowed into a Log Analytics workspace that the entire platform team had read access to – meaning people who had no legitimate reason to see specific PII. The customer's SOC noticed during a compliance audit and the deadline to fix it was end of January.
The textbook answer would be "rewrite the apps to stop logging that". Reality? Thirty legacy apps, six teams, six priorities – so the answer became ingestion-time transformation via Data Collection Rules.
Architecture: Where DCR Sits in the Pipeline
    [Application] → [AMA / Logs Ingestion API] → [DCR transformation] → [Log Analytics table]
                                                           ↑
                                                     Mask PII here

The key properties that made us pick DCR over a Logstash/Splunk forwarder:
- Native to Azure – no extra server component to operate
- Per-table scope – different rules for AppTraces and AppExceptions
- DCR changes audited via Azure Activity Log – any transformation edit is visible
- No schema impact – the transformation runs before write, the table schema stays stable
Example 1: Masking Email Addresses
The first rule we deployed was the simplest. Find an email in the Message column and replace the local part with asterisks.
    source
    | extend Message = replace(
        @'([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})',
        @'***\2',
        Message
    )

Heads up: replace_regex() is not available inside DCR transformations. You must use replace(), which in the DCR variant does accept regex (contrary to parts of the public docs). This was the most painful part of onboarding – KQL inside DCR behaves differently from KQL in regular queries.
Once deployed, an email like martin.rylko@example.com shows up in Log Analytics as ***@example.com. The domain stays for debugging, the local part is gone. For forensic needs we keep a parallel pipeline (see section 6).
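To sanity-check the pattern outside Azure, the same regex can be run locally. The sketch below is Python rather than KQL, so it assumes Python's `re` module handles this particular pattern the same way the DCR engine does; `mask_emails` is a hypothetical helper name, not part of any Azure SDK.

```python
import re

# Same pattern as the DCR rule: group 1 = local part, group 2 = "@domain.tld".
EMAIL_RE = re.compile(r"([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})")

def mask_emails(message: str) -> str:
    """Replace the local part of every email address with asterisks."""
    return EMAIL_RE.sub(r"***\2", message)

print(mask_emails("Login failed for martin.rylko@example.com"))
# Login failed for ***@example.com
```

A local check like this only tells you the pattern is sound; the transformation itself still has to be validated in the DCR editor.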
Example 2: Full Removal of Card Numbers
For PCI data masking is not enough – the value must be removed. We replace the whole match with a fixed placeholder:

    source
    | extend Message = replace(
        @'\b(?:\d[ -]*?){13,19}\b',
        '[CARD_REDACTED]',
        Message
    )

The regex covers 13–19 digit sequences with optional spaces or dashes, which catches Visa, Mastercard, AMEX, and weirder formats.
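It is worth seeing what the pattern does and does not catch before trusting it. A minimal local sketch in Python follows. It assumes Python's `re` behaves like the DCR regex engine for this pattern; the sample strings and the `redact_cards` name are made up for illustration.

```python
import re

# Same pattern as the DCR rule: 13-19 digits, each optionally followed by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")

def redact_cards(message: str) -> str:
    """Replace every 13-19 digit run with a fixed placeholder."""
    return CARD_RE.sub("[CARD_REDACTED]", message)

samples = [
    "paid with 4111 1111 1111 1111 today",  # 16 digits, space-separated
    "amex 3782-822463-10005 declined",      # 15 digits, dash-separated
    "order 1234567890123 shipped",          # 13-digit order number, not a card
    "ticket 12345678 closed",               # too short to match
]
for line in samples:
    print(redact_cards(line))
```

The third sample shows the trade-off: the pattern cannot tell a card number from any other long digit run, so numeric IDs of 13–19 digits are redacted as well.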
Example 3: Named Entities From the Application Layer
Some data is logged in a structured form by the app – e.g. clientId=CL-12345. For these we can afford to hash instead of mask, preserving the ability to aggregate:
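    source
    | extend ClientHash = case(
        Message has 'clientId=',
        hash_sha256(extract(@'clientId=([A-Z0-9-]+)', 1, Message)),
        ''
    )
    | extend Message = case(
        isnotempty(ClientHash),
        replace(@'clientId=[A-Z0-9-]+', strcat('clientId=', ClientHash), Message),
        Message
    )
    | project-away ClientHash

The trick is the case() – if a row does not contain clientId=, the hash is never computed, which keeps the per-row cost of the transformation down. The hash is written into the message in place of the raw ID (a fixed [HASHED] literal would throw it away), so rows for the same client can still be grouped. This assumes replace() accepts a computed replacement string in your DCR; verify that in the transformation editor before relying on it.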
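Why hashing keeps aggregation possible can be shown locally. This is a Python sketch under the assumption that KQL's hash_sha256() produces the hex SHA-256 of the UTF-8 string, as `hashlib` does here; `pseudonymize_client_ids` and the sample messages are hypothetical.

```python
import hashlib
import re

CLIENT_RE = re.compile(r"clientId=([A-Z0-9-]+)")

def pseudonymize_client_ids(message: str) -> str:
    """Swap each clientId value for its SHA-256 hex digest."""
    def _to_hash(match: re.Match) -> str:
        digest = hashlib.sha256(match.group(1).encode("utf-8")).hexdigest()
        return f"clientId={digest}"
    return CLIENT_RE.sub(_to_hash, message)

a = pseudonymize_client_ids("bid placed clientId=CL-12345 lot=7")
b = pseudonymize_client_ids("bid withdrawn clientId=CL-12345 lot=9")

# Both lines carry the same 64-character token, so they still group together.
token_a = a.split("clientId=")[1].split()[0]
token_b = b.split("clientId=")[1].split()[0]
print(token_a == token_b)  # True
```

The same input always maps to the same token, which is what lets a query count events per client without ever seeing the real ID.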
Deploying a DCR via Bicep
Manual setup through the Portal is fine for the first lab. For production we use Bicep:
    // Inputs the resource below relies on
    param location string = resourceGroup().location
    // Resource ID of the target Log Analytics workspace
    param workspaceId string

    resource dcr 'Microsoft.Insights/dataCollectionRules@2023-03-11' = {
      name: 'dcr-pii-masking-prod'
      location: location
      kind: 'Direct'
      properties: {
        streamDeclarations: {
          'Custom-AppTraces': {
            columns: [
              { name: 'TimeGenerated', type: 'datetime' }
              { name: 'Message', type: 'string' }
              { name: 'SeverityLevel', type: 'int' }
              { name: 'AppRoleName', type: 'string' }
            ]
          }
        }
        destinations: {
          logAnalytics: [
            {
              workspaceResourceId: workspaceId
              name: 'la-destination'
            }
          ]
        }
        dataFlows: [
          {
            streams: ['Custom-AppTraces']
            destinations: ['la-destination']
            transformKql: '''
    source
    | extend Message = replace(@'([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', @'***\2', Message)
    | extend Message = replace(@'\b(?:\d[ -]*?){13,19}\b', '[CARD_REDACTED]', Message)
    '''
            outputStream: 'Microsoft-AppTraces'
          }
        ]
      }
    }

Three things we got burned by at Christie's that are worth calling out:
- outputStream must match the destination table – for AppTraces it is Microsoft-AppTraces, not a custom name
- Backslash escaping in Bicep – a multi-line string literal ('''...''') is read verbatim, so the regex backslashes stay single there; in an ordinary single-quoted string you must double them, or the regex that reaches the DCR is not the one you wrote
- AMA needs a DCR association, not a direct assignment – you have to separately create a Microsoft.Insights/dataCollectionRuleAssociations resource
Lab Pattern: Validating a Transformation Before Production
This is my single most valuable tip. Microsoft provides the equivalent of Invoke-AzOperationalInsightsQuery for DCR transforms – the DCR sandbox in the Azure Portal (Monitor → Data Collection Rules → select DCR → Transformation editor → "Run").
Workflow:
- Take 50 real lines of production log (sanitized export)
- Wrap them in the JSON format the stream expects
- Run the transformation in the editor
- Check that no line leaks full PII
- Diff against a baseline for regression checks
At Christie's we wired this into CI – a PR that touches the DCR Bicep file triggers a unit test that runs a 200-line fixture through the transformation and asserts that none of the original PII patterns still match in the output.
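A fixture check of this kind might look roughly like the following Python sketch. It is not the actual CI test: `transform` is a local stand-in that mirrors only the email and card rules, whereas a real pipeline would assert against output produced by the DCR itself. The record shape follows the Custom-AppTraces stream declaration; the timestamp, role name, and fixture lines are placeholders.

```python
import json
import re

EMAIL_RE = re.compile(r"([a-zA-Z0-9._%+-]+)(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})")
CARD_RE = re.compile(r"\b(?:\d[ -]*?){13,19}\b")

def wrap(lines):
    """Wrap raw log lines in records shaped like the Custom-AppTraces stream."""
    return [
        {
            "TimeGenerated": "2026-01-15T10:00:00Z",  # placeholder timestamp
            "Message": line,
            "SeverityLevel": 1,
            "AppRoleName": "fixture-app",  # hypothetical role name
        }
        for line in lines
    ]

def transform(record):
    """Local stand-in for the DCR transformation (email and card rules only)."""
    message = EMAIL_RE.sub(r"***\2", record["Message"])
    message = CARD_RE.sub("[CARD_REDACTED]", message)
    return {**record, "Message": message}

def find_leaks(records):
    """Return indexes of records whose Message still matches a PII pattern."""
    return [
        i for i, rec in enumerate(records)
        if EMAIL_RE.search(rec["Message"]) or CARD_RE.search(rec["Message"])
    ]

fixture = [
    "Login failed for martin.rylko@example.com",
    "Payment 4111 1111 1111 1111 declined",
    "Cache warmed in 120 ms",
]
payload = json.dumps(wrap(fixture))  # JSON array, as the stream would receive it
masked = [transform(rec) for rec in json.loads(payload)]
assert find_leaks(masked) == [], "PII pattern survived the transformation"
```

Running `find_leaks` on the unmasked fixture first is a cheap way to confirm the fixture actually contains the PII the test claims to guard against.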
Compliance: What It Actually Solves (and Does Not)
| Requirement | Does DCR transformation help? |
|---|---|
| GDPR Art. 32 (pseudonymization) | Yes – masking and hashing are pseudonymization measures; note that pseudonymized data still counts as personal data under GDPR |
| GDPR Art. 17 (right to erasure) | Partially – fine for the main workspace, the forensic pipeline must be handled separately |
| PCI DSS Req. 3.4 | Yes – [CARD_REDACTED] is sufficient masking |
| NIS2 Art. 21 (logging) | Yes – you still keep full audit, just without PII |
| Customer ask "don't show clients to support" | Yes – anyone with read access to the workspace sees only masked data |
What DCR does not solve: if an application sends TrackException to Application Insights with a full stack trace containing PII, the transformation on AppExceptions must explicitly cover OuterMessage and Details too. Easy to forget.
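One way to catch a forgotten column is to scan every string field of a sample row, nested values included, instead of checking Message alone. A minimal Python sketch, assuming rows are available as parsed JSON: the row below is made up, only the OuterMessage and Details column names come from the text above, and the inner shape of Details is an assumption.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def leaking_fields(value, path=""):
    """Yield the path of every string, at any nesting depth, that still contains an email."""
    if isinstance(value, str):
        if EMAIL_RE.search(value):
            yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from leaking_fields(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaking_fields(child, f"{path}[{index}]")

# Hypothetical exception row where only Message was masked.
row = {
    "Message": "Lookup failed for ***@example.com",
    "OuterMessage": "Lookup failed for bob@example.com",
    "Details": [{"message": "No client bob@example.com", "severityLevel": "Error"}],
}
print(list(leaking_fields(row)))
# ['OuterMessage', 'Details[0].message']
```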
Conclusion
DCR transformations are, in January 2026, the cleanest way in Azure Monitor to satisfy a "PII never reaches log storage in full form" requirement without rewriting applications. Cost: 30–40 engineering hours for the first rule set, +10% CPU on agents, +10 seconds of ingest latency.
Value: a compliance audit with no findings and a SOC team that no longer gets paged because of an email in a log.
About the author

Martin Rylko
Senior Cloud Architect & DevOps Engineer
14+ years in IT – from on-premises datacenters and Hyper-V clustering to cloud infrastructure on Microsoft Azure. I specialize in Landing Zones, IaC automation, Kubernetes and security compliance.