Application Insights Comprehensive Guide
Level: L300-400 Deep Dive | Last Updated: February 2026
Table of Contents
Overview
Architecture and Data Flow
Instrumentation Methods
Telemetry Data Model
Configuration Deep Dive
Sampling Strategies
Distributed Tracing
Alerting and Smart Detection
Performance Diagnostics
Cost Optimization
Security and Compliance
Well-Architected Framework Alignment
Production Readiness Checklist
References
Overview
Azure Monitor Application Insights is the Application Performance Monitoring (APM) feature of Azure Monitor and provides observability for live web applications. It is built on OpenTelemetry (OTel), which gives a vendor-neutral way to collect and analyze telemetry data.
Key Capabilities
| Capability | Description |
| --- | --- |
| Application Performance Monitoring | Monitor response times, failure rates, and dependency performance |
| Distributed Tracing | End-to-end transaction tracking across microservices |
| Live Metrics | Real-time performance monitoring with ~1 second latency |
| Smart Detection | ML-powered anomaly detection for failures and performance degradation |
| Usage Analytics | User behavior analysis with funnels, flows, and cohorts |
| Code-Level Diagnostics | .NET Profiler and Snapshot Debugger for deep troubleshooting |
Application Insights Experiences
```mermaid
graph TB
    subgraph "Investigate"
        DASH[Application Dashboard]
        MAP[Application Map]
        LIVE[Live Metrics]
        SEARCH[Search View]
        AVAIL[Availability View]
        FAIL[Failures View]
        PERF[Performance View]
    end
    subgraph "Monitoring"
        ALERTS[Alerts]
        METRICS[Metrics]
        LOGS[Logs]
        WORKBOOKS[Workbooks]
        GRAFANA[Grafana Dashboards]
    end
    subgraph "Usage"
        USERS[Users & Sessions]
        FUNNELS[Funnels]
        FLOWS[User Flows]
        COHORTS[Cohorts]
    end
    subgraph "Code Analysis"
        PROFILER[.NET Profiler]
        SNAPSHOT[Snapshot Debugger]
        CODE_OPT[Code Optimizations]
    end
    style DASH fill:#e3f2fd
    style MAP fill:#e3f2fd
    style ALERTS fill:#fff3e0
    style PROFILER fill:#f3e5f5
```
Architecture and Data Flow
Logic Model
Application Insights follows a layered architecture for data collection, processing, and analysis.
```mermaid
flowchart TB
    subgraph "Application Layer"
        APP[Your Application]
        SDK[OpenTelemetry SDK / Classic SDK]
        AUTO[Auto-Instrumentation Agent]
    end
    subgraph "Data Collection"
        CONN[Connection String]
        ENDPOINT[Ingestion Endpoint]
    end
    subgraph "Azure Monitor Backend"
        INGEST[Ingestion Pipeline]
        PROCESS[Processing & Sampling]
        LA[Log Analytics Workspace]
    end
    subgraph "Consumption"
        PORTAL[Azure Portal]
        API[REST API]
        EXPORT[Data Export]
    end
    APP --> SDK
    APP --> AUTO
    SDK --> CONN
    AUTO --> CONN
    CONN --> ENDPOINT
    ENDPOINT --> INGEST
    INGEST --> PROCESS
    PROCESS --> LA
    LA --> PORTAL
    LA --> API
    LA --> EXPORT
    style LA fill:#c8e6c9
    style ENDPOINT fill:#fff3e0
```
Resource Topology
```mermaid
graph TB
    subgraph "Azure Subscription"
        subgraph "Resource Group"
            AI[Application Insights Resource]
            LA[Log Analytics Workspace]
        end
    end
    subgraph "Data Sources"
        WEB[Web Application]
        API_APP[API Service]
        FUNC[Azure Functions]
        AKS[AKS Workloads]
    end
    WEB --> AI
    API_APP --> AI
    FUNC --> AI
    AKS --> AI
    AI --> LA
    style AI fill:#e3f2fd
    style LA fill:#c8e6c9
```
Key Architecture Decisions
| Decision | Recommendation | Rationale |
| --- | --- | --- |
| Resource per environment | One Application Insights resource per workload per environment | Prevents mixing telemetry; enables environment-specific configuration |
| Regional alignment | Deploy in the same region as the Log Analytics workspace | Reduces latency and cross-region failure risk |
| Workspace-based | Always use workspace-based Application Insights | Enables cost optimization features (Basic Logs, commitment tiers) |
Instrumentation Methods
Decision Matrix
| Method | Code Changes | Languages | Best For |
| --- | --- | --- | --- |
| Auto-Instrumentation | None | .NET, Java, Node.js, Python | Quick setup, Azure-hosted apps |
| OpenTelemetry Distro | Minimal | .NET, Java, Node.js, Python | New projects, vendor neutrality |
| Classic SDK | Moderate | .NET, Node.js | Legacy applications |
| JavaScript SDK | Minimal | JavaScript/TypeScript | Client-side monitoring |
OpenTelemetry Instrumentation (Recommended)
The Azure Monitor OpenTelemetry Distro is the recommended approach for new applications.
ASP.NET Core Example
```csharp
// Program.cs
using Azure.Monitor.OpenTelemetry.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

// Add OpenTelemetry and export to Azure Monitor.
// If ConnectionString is not set, the distro falls back to the
// APPLICATIONINSIGHTS_CONNECTION_STRING environment variable.
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
    options.ConnectionString = builder.Configuration["APPLICATIONINSIGHTS_CONNECTION_STRING"];
});

var app = builder.Build();
app.Run();
```
Java Auto-Instrumentation
```shell
# Add the JVM argument to your application startup
# Download the latest agent from: https://github.com/microsoft/ApplicationInsights-Java/releases
java -javaagent:"path/to/applicationinsights-agent-{VERSION}.jar" -jar your-app.jar
```
Note: Starting with Java agent 3.4.0, rate-limited sampling is enabled by default at 5 requests per second. This helps control cost, but in high-volume services most requests are then sampled out. See Sampling Strategies for configuration details.
Node.js Example
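```javascript
// At the very top of your entry point file
const { useAzureMonitor } = require("@azure/monitor-opentelemetry");

// Call this before requiring other modules so they can be instrumented.
// The connection string is read from APPLICATIONINSIGHTS_CONNECTION_STRING by default.
useAzureMonitor();

// Rest of your application code
```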
Python Example
```python
# At the very top of your entry point file
from azure.monitor.opentelemetry import configure_azure_monitor

# The connection string is read from APPLICATIONINSIGHTS_CONNECTION_STRING by default
configure_azure_monitor()

# Rest of your application code
```
Connection String Configuration
| Method | Priority | Use Case |
| --- | --- | --- |
| Code | 1 (highest) | Local development only |
| Environment variable | 2 | Production (recommended) |
| Configuration file | 3 | Java applications |
```shell
# Environment variable (recommended for production).
# Quote the value: an unquoted ';' would end the shell command.
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=xxx;IngestionEndpoint=https://xxx.in.applicationinsights.azure.com/"
```
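A connection string is a semicolon-separated list of Key=Value pairs. The following minimal Python sketch splits one into its parts, which helps confirm the ingestion endpoint a service actually uses. It is a stand-alone illustration and not the SDK's own parser. The all-zero GUID and the `xxx` host are placeholders.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key1=Value1;Key2=Value2' into a dict of settings.

    Each segment is split on its first '=' only, so values that contain '='
    stay intact. Empty segments (for example, a trailing ';') are skipped.
    """
    settings = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment (missing '='): {segment!r}")
        settings[key.strip()] = value.strip()
    return settings


example = (
    "InstrumentationKey=00000000-0000-0000-0000-000000000000;"
    "IngestionEndpoint=https://xxx.in.applicationinsights.azure.com/"
)
parsed = parse_connection_string(example)
print(parsed["IngestionEndpoint"])  # https://xxx.in.applicationinsights.azure.com/
```

Real connection strings often carry additional keys, such as a live metrics endpoint. The parser keeps every key it finds.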
Enablement by Platform

| Platform | Enablement Method |
| --- | --- |
| Azure App Service | Portal toggle / ARM deployment |
| Azure Functions | Built-in integration |
| Azure VM / VMSS | VM extension |
| Azure Spring Apps | Configuration property |
| Azure Container Apps | Environment configuration |
| Azure Kubernetes Service | OTEL Collector / Sidecar |
Service Limits Reference
| Resource | Default Limit | Maximum Limit |
| --- | --- | --- |
| Total data per day | 100 GB | Contact support (up to 1,000 GB via portal) |
| Throttling | 32,000 events/second | Contact support |
| Data retention (logs) | 30-730 days | 730 days |
| Data retention (metrics) | 90 days | 90 days |
| Maximum telemetry item size | 64 KB | 64 KB |
| Maximum telemetry items per batch | 64,000 | 64,000 |
| Property/metric name length | 150 characters | 150 characters |
| Property value string length | 8,192 characters | 8,192 characters |
| Trace/exception message length | 32,768 characters | 32,768 characters |
| Availability tests per resource | 100 | 100 |
| .NET Profiler/Snapshot Debugger retention | 2 weeks | 6 months (contact support) |
Telemetry Data Model
Telemetry Types
```mermaid
graph LR
    subgraph "Telemetry Types"
        REQ[Requests]
        DEP[Dependencies]
        EXC[Exceptions]
        TRACE[Traces]
        METRIC[Metrics]
        EVENT[Custom Events]
        PV[Page Views]
        AVAIL[Availability Results]
    end
    subgraph "Log Analytics Tables"
        REQ --> T_REQ[AppRequests]
        DEP --> T_DEP[AppDependencies]
        EXC --> T_EXC[AppExceptions]
        TRACE --> T_TRACE[AppTraces]
        METRIC --> T_METRIC[AppMetrics / AppPerformanceCounters]
        EVENT --> T_EVENT[AppEvents]
        PV --> T_PV[AppPageViews]
        AVAIL --> T_AVAIL[AppAvailabilityResults]
    end
```
Telemetry Types Reference
| Type | Table (Log Analytics) | Description | Auto-Collected |
| --- | --- | --- | --- |
| Request | AppRequests | Incoming HTTP requests | Yes |
| Dependency | AppDependencies | Outgoing calls (HTTP, SQL, etc.) | Yes |
| Exception | AppExceptions | Captured exceptions and errors | Yes |
| Trace | AppTraces | Log messages and diagnostic traces | Yes |
| Metric | AppMetrics | Custom and performance metrics | Partial |
| Event | AppEvents | Custom business events | No |
| Page View | AppPageViews | Browser page loads | Yes (JS SDK) |
| Availability | AppAvailabilityResults | Synthetic test results | When tests are configured |
Telemetry Correlation
Application Insights uses operation IDs to correlate telemetry across distributed systems.
```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Database
    User->>Frontend: Request (operation_Id: abc123)
    Frontend->>API: HTTP Call (operation_Id: abc123)
    API->>Database: SQL Query (operation_Id: abc123)
    Database-->>API: Response
    API-->>Frontend: Response
    Frontend-->>User: Response
    Note over User,Database: All telemetry shares operation_Id for correlation
```
Key Correlation Fields
| Field | Purpose |
| --- | --- |
| operation_Id | Unique identifier for the entire distributed trace |
| operation_ParentId | ID of the parent operation (for building call trees) |
| cloud_RoleName | Identifies the service/component in Application Map |
| cloud_RoleInstance | Identifies the specific instance (pod, VM, etc.) |
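The operation_Id and operation_ParentId fields are all that is needed to rebuild a call tree. The Python sketch below groups items by operation and links each one to its parent. Several details are assumptions for illustration: the dict shape (keys named after the correlation fields above plus hypothetical `id` and `name` keys) and the sample data. In practice, the rows would come from a query over the request and dependency tables.

```python
from collections import defaultdict


def build_call_tree(items: list, operation_id: str) -> list:
    """Return one operation's call tree as indented lines.

    Each item is a dict with 'id', 'name', 'operation_Id' and 'operation_ParentId'.
    An item is a root when its parent id doesn't belong to another item in the
    same operation (a root request's parent id may be empty or the operation id).
    """
    ops = [item for item in items if item["operation_Id"] == operation_id]
    ids = {item["id"] for item in ops}
    children = defaultdict(list)
    roots = []
    for item in ops:
        parent = item.get("operation_ParentId")
        if parent in ids and parent != item["id"]:
            children[parent].append(item)
        else:
            roots.append(item)

    lines = []

    def walk(node: dict, depth: int) -> None:
        lines.append("  " * depth + node["name"])
        for child in children[node["id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical telemetry: Frontend -> API -> Database, plus one unrelated request
items = [
    {"id": "r1", "name": "Frontend request", "operation_Id": "abc123", "operation_ParentId": "abc123"},
    {"id": "d1", "name": "HTTP call to API", "operation_Id": "abc123", "operation_ParentId": "r1"},
    {"id": "r2", "name": "API request", "operation_Id": "abc123", "operation_ParentId": "d1"},
    {"id": "d2", "name": "SQL query", "operation_Id": "abc123", "operation_ParentId": "r2"},
    {"id": "x1", "name": "Unrelated request", "operation_Id": "zzz999", "operation_ParentId": "zzz999"},
]
lines = build_call_tree(items, "abc123")
print("\n".join(lines))
```

Each outgoing dependency becomes the parent of the downstream request. For that reason, the API request is nested under the frontend's HTTP call and not directly under the frontend request.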
Configuration Deep Dive
OpenTelemetry Configuration Options
ASP.NET Core Configuration
```csharp
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
    // Connection string
    options.ConnectionString = "<YOUR-CONNECTION-STRING>";

    // Sampling: choose ONE strategy
    options.SamplingRatio = 0.1F;        // Fixed-rate: keep ~10% of traces
    // options.TracesPerSecond = 5.0;    // Rate-limited: ~5 traces per second

    // Live Metrics is enabled by default; set to false to turn it off
    options.EnableLiveMetrics = true;
});
```
Environment Variables
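| Variable | Description |
| --- | --- |
| APPLICATIONINSIGHTS_CONNECTION_STRING | Connection string for telemetry ingestion |
| APPLICATIONINSIGHTS_STATSBEAT_DISABLED | Disable Statsbeat, the SDK's internal usage metrics (true/false) |
| OTEL_SERVICE_NAME | Sets the service.name resource attribute (used for the cloud role name) |
| OTEL_RESOURCE_ATTRIBUTES | Additional resource attributes as comma-separated key=value pairs |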
Cloud Role Name Configuration
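Setting the cloud role name is critical for an accurate Application Map. With OpenTelemetry, the cloud role name is built from the service.namespace and service.name resource attributes as [service.namespace].[service.name]. If service.namespace isn't set, service.name is used on its own. The cloud role instance comes from service.instance.id.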
Option 1: Environment Variable (Recommended)
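```shell
# Set via environment variables (honored by all OpenTelemetry-based distros).
# No spaces around '=' in shell assignments.
export OTEL_SERVICE_NAME="my-api-service"
# Optional: additional resource attributes
# (with service.namespace set, the role name becomes "mycompany.my-api-service")
export OTEL_RESOURCE_ATTRIBUTES="service.namespace=mycompany,service.version=1.0.0"
```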
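The following Python sketch shows how those two variables combine into a role name. It assumes the mapping Azure Monitor documents for OpenTelemetry: namespace.name when service.namespace is set, otherwise service.name. It also assumes the OpenTelemetry rule that OTEL_SERVICE_NAME overrides service.name in OTEL_RESOURCE_ATTRIBUTES. The function names are hypothetical, and the sketch is a way to sanity-check deployment settings, not the distro's code.

```python
from __future__ import annotations

from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict:
    """Parse an OTEL_RESOURCE_ATTRIBUTES value ('k1=v1,k2=v2') into a dict."""
    attrs = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            # The OpenTelemetry spec allows percent-encoded values
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def cloud_role_name(env: dict) -> str | None:
    """Derive the role name: 'namespace.name' if service.namespace is set, else 'name'."""
    attrs = parse_resource_attributes(env.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    # OTEL_SERVICE_NAME takes precedence over service.name in OTEL_RESOURCE_ATTRIBUTES
    if env.get("OTEL_SERVICE_NAME"):
        attrs["service.name"] = env["OTEL_SERVICE_NAME"]
    name = attrs.get("service.name")
    namespace = attrs.get("service.namespace")
    if name and namespace:
        return f"{namespace}.{name}"
    return name  # None when no service name is configured


env = {
    "OTEL_SERVICE_NAME": "my-api-service",
    "OTEL_RESOURCE_ATTRIBUTES": "service.namespace=mycompany,service.version=1.0.0",
}
print(cloud_role_name(env))  # mycompany.my-api-service
```

To check a running container, pass `dict(os.environ)` in place of the sample dict.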
Option 2: Code Configuration (ASP.NET Core)
```csharp
// ASP.NET Core: set resource attributes on the OpenTelemetry builder
using Azure.Monitor.OpenTelemetry.AspNetCore;
using OpenTelemetry.Resources;

builder.Services.AddOpenTelemetry()
    .UseAzureMonitor()
    // ConfigureResource on the OpenTelemetry builder applies to traces, metrics, and logs
    .ConfigureResource(resourceBuilder => resourceBuilder.AddService(
        serviceName: "my-api-service",
        serviceVersion: "1.0.0"));
```
Option 3: Java Configuration (applicationinsights.json)
```json
{
  "connectionString": "<YOUR-CONNECTION-STRING>",
  "role": {
    "name": "my-api-service",
    "instance": "my-instance-id"
  }
}
```
Note: If multiple services send telemetry to the same Application Insights resource, give each one a distinct cloud role name. Otherwise, they are not shown as separate nodes in the Application Map.
Java Standalone Agent Configuration
Create applicationinsights.json in the same directory as the agent JAR:
```json
{
  "connectionString": "<YOUR-CONNECTION-STRING>",
  "role": {
    "name": "my-java-service"
  },
  "sampling": {
    "percentage": 10
  },
  "instrumentation": {
    "logging": {
      "level": "WARN"
    }
  },
  "preview": {
    "sampling": {
      "overrides": [
        {
          "telemetryType": "request",
          "attributes": [
            {
              "key": "http.url",
              "value": "https?://[^/]+/health.*",
              "matchType": "regexp"
            }
          ],
          "percentage": 0
        }
      ]
    }
  }
}
```
Sampling Strategies
Why Sampling Matters
Sampling is essential for managing costs and preventing throttling in high-volume applications.
| Without Sampling | With Sampling |
| --- | --- |
| High storage costs | Controlled costs |
| Risk of throttling (32,000 events/second, measured over a minute) | Lower risk of throttling |
| Every telemetry item retained | Statistically representative data |
Sampling Types
```mermaid
flowchart LR
    subgraph "Client-Side Sampling"
        FIXED[Fixed-Rate Sampling]
        RATE[Rate-Limited Sampling]
        ADAPTIVE[Adaptive Sampling]
    end
    subgraph "Server-Side"
        INGEST[Ingestion Sampling]
    end
    APP[Application] --> FIXED
    APP --> RATE
    APP --> ADAPTIVE
    FIXED --> ENDPOINT[Ingestion Endpoint]
    RATE --> ENDPOINT
    ADAPTIVE --> ENDPOINT
    ENDPOINT --> INGEST
    INGEST --> LA[Log Analytics]
    style FIXED fill:#c8e6c9
    style RATE fill:#c8e6c9
    style INGEST fill:#ffcdd2
```
Sampling Configuration
Fixed-Rate Sampling (OpenTelemetry)
```csharp
// ASP.NET Core: keep ~10% of traces
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
    options.SamplingRatio = 0.1F;
});
```
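Fixed-rate sampling keeps traces whole because the keep/drop decision is derived from the trace ID. Every service in the call chain that sees the same ID reaches the same answer. The Python sketch below shows that principle in the style of OpenTelemetry's TraceIdRatioBased sampler, which compares the low 64 bits of the ID with ratio × 2^64. The Azure Monitor distro's own sampler uses a different hash of the trace ID, so treat this as an illustration of the idea rather than its exact algorithm.

```python
import random

# Trace IDs are 128-bit; the decision uses the low 64 bits
LOW_64_MASK = (1 << 64) - 1


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its ID fall below ratio * 2**64.

    The decision depends only on the trace ID, so every service that applies
    the same ratio to the same trace makes the same choice and traces stay whole.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    return (int(trace_id_hex, 16) & LOW_64_MASK) < bound


rng = random.Random(42)  # seeded so the run is repeatable
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(20_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(kept / len(trace_ids))  # roughly 0.1
```

The threshold grows with the ratio, so any trace kept at 10% is also kept at 50%. Raising the ratio in one service but not another therefore yields partial traces, not mismatched ones.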
Rate-Limited Sampling
```csharp
// ASP.NET Core: ~5 traces per second
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
    options.TracesPerSecond = 5.0;
});
```
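Rate-limited sampling caps throughput at a fixed number of traces per second, however much traffic arrives. As a stand-in for that idea, the sketch below uses a token bucket, a common rate-limiting technique that is not the distro's implementation. A clock is injected so the example is deterministic. Production rate-limited samplers generally adjust a sampling percentage over time instead of gating each call, so this only illustrates the rate cap.

```python
import time
from typing import Callable


class TokenBucketLimiter:
    """Admit on average `rate` traces per second, allowing bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst  # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
limiter = TokenBucketLimiter(rate=5.0, burst=5.0, clock=clock)
first_second = sum(limiter.try_acquire() for _ in range(10))  # 10 requests at t=0
clock.now = 1.0
next_second = sum(limiter.try_acquire() for _ in range(10))   # 10 more at t=1
print(first_second, next_second)  # 5 5
```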
Java Sampling Overrides
```json
{
  "preview": {
    "sampling": {
      "overrides": [
        {
          "telemetryType": "request",
          "attributes": [
            {
              "key": "http.url",
              "value": "https?://[^/]+/health.*",
              "matchType": "regexp"
            }
          ],
          "percentage": 0
        }
      ]
    }
  }
}
```
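Before deploying an override, it is worth testing the regular expression against real URLs. The sketch below assumes the agent requires the whole URL to match, which `re.fullmatch` models. Check the Java agent documentation for the exact matching semantics. The sample URLs and the tighter alternative pattern are hypothetical. With the original pattern, `/healthcare/...` also matches and would be dropped at 0%.

```python
import re

# Pattern copied from the sampling override above
ORIGINAL = re.compile(r"https?://[^/]+/health.*")
# Hypothetical tighter alternative: /health exactly, /health/..., or /health?...
TIGHTER = re.compile(r"https?://[^/]+/health(?:[/?].*)?")

samples = {
    "https://example.com/health": True,
    "http://10.0.0.4:8080/health/ready": True,
    "https://example.com/api/orders": False,
    "https://example.com/healthcare/records": False,  # not a health check
}

for url, expected in samples.items():
    original = ORIGINAL.fullmatch(url) is not None
    tighter = TIGHTER.fullmatch(url) is not None
    print(f"{url}: original={original} tighter={tighter} expected={expected}")
```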
Sampling Decision Matrix
| Scenario | Recommended Sampling | Configuration |
| --- | --- | --- |
| Development/Testing | None or 100% | SamplingRatio = 1.0 |
| Low-volume production | None or minimal | SamplingRatio = 0.5 to 1.0 |
| High-volume production | Rate-limited | TracesPerSecond = 5.0 (Java default) |
| Cost-sensitive | Aggressive | SamplingRatio = 0.01 to 0.1 |
| Health checks | Exclude | Sampling override with 0% |
Important: Sampling is not enabled by default in the .NET, Node.js, and Python OpenTelemetry distros, so you must configure it explicitly. Java agent 3.4.0 and later enables rate-limited sampling (5 requests per second) by default.
Best Practices for Sampling
Never use ingestion sampling as the primary strategy: the data has already been transmitted before it is dropped
Configure sampling at the SDK level: it is more efficient and preserves trace integrity
Use sampling overrides for health checks: exclude noisy endpoints
Test sampling configurations: validate that critical transactions are captured
Monitor for broken traces: use consistent sampling settings across all services
Distributed Tracing
How Distributed Tracing Works
```mermaid
sequenceDiagram
    participant Client
    participant Frontend
    participant OrderAPI
    participant PaymentAPI
    participant Database
    Client->>Frontend: HTTP Request
    Note over Frontend: Generate trace-id: abc123
    Frontend->>OrderAPI: POST /orders (trace-id: abc123)
    OrderAPI->>PaymentAPI: POST /payments (trace-id: abc123)
    PaymentAPI->>Database: INSERT payment
    Database-->>PaymentAPI: Success
    PaymentAPI-->>OrderAPI: Payment confirmed
    OrderAPI-->>Frontend: Order created
    Frontend-->>Client: 201 Created
    Note over Client,Database: All spans share trace-id for correlation
```
Context Propagation
Application Insights supports the W3C Trace Context standard for cross-service correlation.
| Header | Purpose |
| --- | --- |
| traceparent | Contains trace-id, parent-id, and flags |
| tracestate | Vendor-specific trace information |
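A traceparent header has four dash-separated hex fields: version, trace-id, parent-id, and trace-flags. The Python sketch below parses a version-00 header and builds the header a service would send downstream. The outgoing header keeps the trace-id and uses the caller's own (fresh) span ID as the new parent-id. Accepting only version 00 is a simplification. The sample header is the one used in the W3C specification's examples.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass

# version-00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags (lowercase)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class TraceParent:
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace-flags is the "sampled" flag
        return bool(self.flags & 0x01)

    def header(self) -> str:
        return f"{self.version}-{self.trace_id}-{self.parent_id}-{self.flags:02x}"


def parse_traceparent(value: str) -> TraceParent | None:
    """Parse a version-00 traceparent header; return None if it's invalid."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch only accepts version 00; all-zero IDs are invalid per the spec
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


def outgoing_header(incoming: TraceParent) -> TraceParent:
    """Keep the trace-id; use a fresh span ID as the parent-id for the downstream call."""
    return TraceParent("00", incoming.trace_id, secrets.token_hex(8), incoming.flags)


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(incoming.sampled)                    # True
print(outgoing_header(incoming).trace_id)  # 4bf92f3577b34da6a3ce929d0e0e4736
```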
Application Map
The Application Map provides a visual representation of your distributed system's topology.
```mermaid
graph TB
    subgraph "Application Map View"
        WEB[Web App<br/>avg: 245ms<br/>errors: 0.1%]
        API[API Service<br/>avg: 89ms<br/>errors: 0.05%]
        SQL[(SQL Database<br/>avg: 12ms)]
        REDIS[(Redis Cache<br/>avg: 2ms)]
        EXTERNAL[External API<br/>avg: 340ms<br/>errors: 2.1%]
    end
    WEB -->|1.2k req/min| API
    API -->|3.4k calls/min| SQL
    API -->|8.1k calls/min| REDIS
    API -->|450 calls/min| EXTERNAL
    style WEB fill:#c8e6c9
    style API fill:#c8e6c9
    style SQL fill:#e3f2fd
    style REDIS fill:#e3f2fd
    style EXTERNAL fill:#fff3e0
```
Transaction Diagnostics
Use transaction diagnostics to trace individual requests end-to-end:
Navigate to Failures or Performance view
Select a specific operation
Click on a sample request
View the end-to-end transaction timeline
Alerting and Smart Detection
Alert Types
| Alert Type | Use Case | Configuration |
| --- | --- | --- |
| Metric Alerts | Threshold-based monitoring | Define conditions on metrics |
| Log Alerts | Complex query-based alerts | KQL queries on log data |
| Smart Detection | Anomaly detection | Auto-configured, ML-based |
| Availability Alerts | Endpoint health | Synthetic test failures |
Smart Detection Capabilities
Smart Detection uses machine learning to automatically detect:
| Detection Type | Description |
| --- | --- |
| Failure Anomalies | Abnormal rise in failed request rate |
| Performance Anomalies | Response time degradation |
| Trace Degradation | Increase in error/warning log ratio |
| Memory Leak | Potential memory leak patterns |
| Exception Volume | Abnormal rise in exceptions |
| Security Anti-patterns | Potential security issues |
Configuring Alerts
Metric Alert Example (ARM/Bicep)
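```bicep
// Existing resources referenced by the alert (names are placeholders)
resource appInsights 'Microsoft.Insights/components@2020-02-02' existing = {
  name: 'my-app-insights'
}

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' existing = {
  name: 'my-action-group'
}

resource metricAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'High-Failure-Rate-Alert'
  location: 'global'
  properties: {
    severity: 2
    enabled: true
    scopes: [appInsights.id]
    evaluationFrequency: 'PT5M' // evaluate every 5 minutes
    windowSize: 'PT15M'         // over the last 15 minutes
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'FailedRequests'
          criterionType: 'StaticThresholdCriterion'
          metricName: 'requests/failed'
          operator: 'GreaterThan'
          threshold: 10
          timeAggregation: 'Count'
        }
      ]
    }
    actions: [
      {
        actionGroupId: actionGroup.id
      }
    ]
  }
}
```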
Log Alert Example (KQL)
```kusto
// Alert when more than 5% of requests failed in the last 15 minutes
requests
| where timestamp > ago(15m)
| summarize
    TotalRequests = count(),
    FailedRequests = countif(success == false)
| where TotalRequests > 0 // avoid dividing by zero when there is no traffic
| extend FailureRate = (FailedRequests * 100.0) / TotalRequests
| where FailureRate > 5
```
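The query's logic is simple arithmetic, and the edge cases are easy to get wrong. The Python sketch below mirrors it so the threshold behavior can be unit-tested. A window with no requests yields no rate instead of a division by zero. A rate of exactly 5% does not fire because the condition is strictly "greater than". The list-of-booleans input is an assumption for illustration.

```python
from __future__ import annotations


def failure_rate_percent(successes: list) -> float | None:
    """Percentage of failed requests, or None when there were no requests."""
    total = len(successes)
    if total == 0:
        return None  # no traffic in the window: nothing to evaluate
    failed = sum(1 for ok in successes if not ok)
    return failed * 100.0 / total


def should_alert(successes: list, threshold_percent: float = 5.0) -> bool:
    """Mirror of the query's final filter: fire only when the rate exceeds the threshold."""
    rate = failure_rate_percent(successes)
    return rate is not None and rate > threshold_percent


window = [True] * 188 + [False] * 12  # 200 requests, 12 failed
print(failure_rate_percent(window), should_alert(window))  # 6.0 True
```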
Availability Tests
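| Test Type | Description | Use Case |
| --- | --- | --- |
| URL Ping Test | Simple HTTP GET (classic; being retired in favor of Standard tests) | Basic availability check |
| Standard Test | HTTP request with assertions | Response validation |
| Custom TrackAvailability | Code-based tests | Complex scenarios |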
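```csharp
// Custom availability test (classic Application Insights SDK)
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// TelemetryClient is not IDisposable, so create it from a configuration instead of "using var"
var configuration = TelemetryConfiguration.CreateDefault();
configuration.ConnectionString = "<YOUR-CONNECTION-STRING>";
var client = new TelemetryClient(configuration);

var availability = new AvailabilityTelemetry
{
    Name = "Custom Health Check",
    RunLocation = "Azure Function",
    Success = true,
    Duration = TimeSpan.FromMilliseconds(150),
    Timestamp = DateTimeOffset.UtcNow
};

client.TrackAvailability(availability);
client.Flush(); // send buffered telemetry before the process exits
```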
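A custom availability test mostly consists of timing a check and recording whether it succeeded. The Python sketch below builds a record whose fields mirror the AvailabilityTelemetry properties in the C# example. It sends nothing: in a real test, these values would be passed to TrackAvailability or to your SDK's equivalent. The `check` callable is hypothetical. Any exception it raises is treated as a failure, and the exception text goes into the message.

```python
import time
from datetime import datetime, timezone
from typing import Callable


def run_availability_check(name: str, run_location: str, check: Callable[[], bool]) -> dict:
    """Run `check`, time it, and return a record shaped like AvailabilityTelemetry.

    Any exception raised by the check is recorded as a failure with its message.
    """
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    message = ""
    try:
        success = bool(check())
    except Exception as exc:
        success = False
        message = f"{type(exc).__name__}: {exc}"
    duration_ms = (time.perf_counter() - t0) * 1000.0
    return {
        "Name": name,
        "RunLocation": run_location,
        "Success": success,
        "Duration_ms": duration_ms,
        "Timestamp": started.isoformat(),
        "Message": message,
    }


record = run_availability_check("Custom Health Check", "Azure Function", lambda: True)
print(record["Success"])  # True
```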
Performance Diagnostics
.NET Profiler
The .NET Profiler captures detailed performance traces for your application.
| Feature | Description |
| --- | --- |
| Hot Path Analysis | Identifies CPU-intensive code paths |
| Memory Allocation | Tracks object allocations |
| Exception Profiling | Captures exception call stacks |
| Async Analysis | Visualizes async execution patterns |
Enabling .NET Profiler
Azure App Service: Enable via the Application Insights blade
VMs/VMSS: Install the Azure Diagnostics extension
Code-based: Configure in application startup
```csharp
// Profiler settings (NuGet package: Microsoft.ApplicationInsights.Profiler.AspNetCore)
builder.Services.AddApplicationInsightsTelemetry();
builder.Services.AddServiceProfiler(options =>
{
    options.IsDisabled = false;                 // Profiler runs unless IsDisabled is true
    options.Duration = TimeSpan.FromMinutes(2); // Length of each profiling session
});
```
Snapshot Debugger
Automatically captures debug snapshots when exceptions occur.
| Scenario | Captured Data |
| --- | --- |
| Unhandled Exceptions | Full stack, local variables |
| First-chance Exceptions | Configurable capture |
| Throttled | Limited to prevent overhead |
Enabling Snapshot Debugger
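```csharp
// ASP.NET Core (NuGet package: Microsoft.ApplicationInsights.SnapshotCollector)
builder.Services.AddApplicationInsightsTelemetry();
builder.Services.AddSnapshotCollector(config =>
{
    config.IsEnabled = true;
    config.SnapshotsPerTenMinutesLimit = 1; // Throttle collection to limit overhead
    config.MaximumSnapshotsRequired = 3;    // Snapshots to collect per problem
});
```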
Performance Troubleshooting Workflow
```mermaid
flowchart TB
    START[Performance Issue Reported]
    PERF[Open Performance View]
    IDENTIFY[Identify Slow Operation]
    DRILL[Drill into Samples]
    PROFILE[View Profiler Traces]
    DEPS[Analyze Dependencies]
    FIX[Implement Fix]
    VERIFY[Verify Improvement]
    START --> PERF
    PERF --> IDENTIFY
    IDENTIFY --> DRILL
    DRILL --> PROFILE
    DRILL --> DEPS
    PROFILE --> FIX
    DEPS --> FIX
    FIX --> VERIFY
    VERIFY --> |Issue Persists| PERF
    VERIFY --> |Resolved| END[Done]
```
Cost Optimization
Cost Drivers
| Factor | Impact | Optimization Strategy |
| --- | --- | --- |
| Data Ingestion Volume | Primary cost driver | Sampling, filtering |
| Data Retention | Storage costs | Reduce retention, archive |
| Custom Metrics | Stored in both logs and metrics | Use preaggregated metrics |
| Query Volume | Queries on Basic Logs are charged per GB scanned; Analytics queries are included | Optimize queries, use caching |
Cost Management Strategies
1. Configure Sampling
```csharp
// Reduce trace volume to ~10% with fixed-rate sampling
builder.Services.AddOpenTelemetry().UseAzureMonitor(options =>
{
    options.SamplingRatio = 0.1F;
});
```
2. Set Daily Cap
Important: For workspace-based Application Insights, you must configure daily caps on both the Application Insights resource and the Log Analytics workspace. The effective cap is the minimum of the two settings.
Configure via Azure Portal:
1. Navigate to Application Insights → Usage and estimated costs → Daily cap
2. Navigate to Log Analytics workspace → Usage and estimated costs → Daily cap
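```bicep
param location string = resourceGroup().location

// Log Analytics workspace with a daily cap
resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: 'my-log-analytics'
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'
    }
    retentionInDays: 30
    workspaceCapping: {
      dailyQuotaGb: 5 // Daily cap in GB
    }
  }
}

// Workspace-based Application Insights linked to the workspace above.
// This template does not set the Application Insights daily cap; configure it as described above.
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: 'my-app-insights'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
    RetentionInDays: 30
  }
}
```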
Warning: Use daily caps as a safety net, not a replacement for sampling. Once the cap is reached, data collection stops until the cap resets the next day, and that telemetry is lost.
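Before picking cap values, a back-of-the-envelope volume estimate shows whether the cap is a safety net or will be hit every day. The Python sketch below multiplies item rate, average item size, and sampling ratio. It makes three assumptions: the 200 items/second and 1 KB average size are hypothetical, GB means 10^9 bytes, and sampling thins every item equally. In practice, metrics are typically not subject to sampling, so treat the sampled figure as a lower bound.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1_000_000_000  # assumption: decimal GB


def estimated_daily_gb(items_per_second: float, avg_item_bytes: float, sampling_ratio: float = 1.0) -> float:
    """Rough daily ingestion estimate, assuming sampling thins every item equally."""
    return items_per_second * sampling_ratio * avg_item_bytes * SECONDS_PER_DAY / BYTES_PER_GB


def effective_daily_cap_gb(app_insights_cap_gb: float, workspace_cap_gb: float) -> float:
    """The lower of the two caps is reached first."""
    return min(app_insights_cap_gb, workspace_cap_gb)


cap = effective_daily_cap_gb(app_insights_cap_gb=10, workspace_cap_gb=5)
for ratio in (1.0, 0.1):
    volume = estimated_daily_gb(items_per_second=200, avg_item_bytes=1_000, sampling_ratio=ratio)
    print(f"sampling={ratio}: ~{volume:.2f} GB/day, cap={cap} GB, exceeds cap: {volume > cap}")
```

In this hypothetical case, unsampled traffic would exhaust the 5 GB cap well before the day ends, while 10% sampling keeps the volume comfortably below it.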
3. Filter Noisy Telemetry
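```csharp
// Classic Application Insights SDK: drop health check requests before they are sent
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

builder.Services.AddApplicationInsightsTelemetry();
builder.Services.AddApplicationInsightsTelemetryProcessor<HealthCheckFilter>();

public class HealthCheckFilter : ITelemetryProcessor
{
    private ITelemetryProcessor Next { get; }

    public HealthCheckFilter(ITelemetryProcessor next)
    {
        Next = next;
    }

    public void Process(ITelemetry item)
    {
        // Match "/health" and "/health/..." only; a plain Contains("/health")
        // would also drop unrelated paths such as "/api/healthcare"
        if (item is RequestTelemetry request && request.Url is not null)
        {
            var path = request.Url.AbsolutePath;
            if (path.Equals("/health", StringComparison.OrdinalIgnoreCase) ||
                path.StartsWith("/health/", StringComparison.OrdinalIgnoreCase))
            {
                return; // Don't send health check telemetry
            }
        }

        Next.Process(item);
    }
}
```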
4. Use Basic Logs Plan
For high-volume, infrequently queried tables, switch to the Basic Logs plan:
| Plan | Ingestion Cost | Query Cost | Retention |
| --- | --- | --- | --- |
| Analytics | Standard | Included | 30-730 days |
| Basic | ~67% less | Charged per GB scanned | 8 days |
Cost Monitoring Query
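```kusto
// Analyze data ingestion by table over the last 30 days
// (scans every table, so it can be slow on large workspaces)
union withsource=TableName *
| where TimeGenerated > ago(30d)
| summarize
    RecordCount = count(),
    // estimate_data_size returns bytes; divide by a real number so small tables don't round to 0
    DataSizeGB = sum(estimate_data_size(*)) / (1024.0 * 1024 * 1024)
    by TableName
| order by DataSizeGB desc
```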
Security and Compliance
Security Best Practices
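| Practice | Implementation |
| --- | --- |
| Use Managed Identity | Authenticate ingestion with Microsoft Entra ID instead of shared credentials |
| Connection Strings over iKey | Instrumentation-key-only ingestion support ended March 31, 2025; connection strings also carry regional endpoints |
| Private Link | Keep traffic on the Microsoft backbone |
| Data Anonymization | Don't collect PII in telemetry |
| Customer-Managed Keys | Encrypt data with your own keys |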
Network Security
```mermaid
graph LR
    subgraph "Your VNet"
        APP[Application]
        PE[Private Endpoint]
    end
    subgraph "Azure Backbone"
        AMPLS[Azure Monitor Private Link Scope]
        AI[Application Insights]
        LA[Log Analytics]
    end
    APP --> PE
    PE --> AMPLS
    AMPLS --> AI
    AMPLS --> LA
    style PE fill:#c8e6c9
    style AMPLS fill:#c8e6c9
```
Configuring Private Link
Create Azure Monitor Private Link Scope (AMPLS)
Add Application Insights and Log Analytics resources to AMPLS
Create Private Endpoint in your VNet
Configure DNS resolution
Data Privacy
```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Client IP addresses are masked at ingestion by default (controlled by the
// DisableIpMasking property on the Application Insights resource), so no SDK option is needed
builder.Services.AddApplicationInsightsTelemetry();

// Use a telemetry initializer to remove sensitive data before it is sent
builder.Services.AddSingleton<ITelemetryInitializer, PrivacyTelemetryInitializer>();

public class PrivacyTelemetryInitializer : ITelemetryInitializer
{
    public void Initialize(ITelemetry telemetry)
    {
        // Remove (or hash) sensitive custom properties
        if (telemetry is ISupportProperties propTelemetry)
        {
            propTelemetry.Properties.Remove("user_email");
        }
    }
}
```
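The initializer above only removes a property. Sometimes a value needs to stay joinable across telemetry without being readable, as with a user ID. In that case, a keyed hash works. The Python sketch below drops some properties and HMAC-SHA256-hashes others. The property names and the key are hypothetical. The key must be kept secret: anyone holding it can hash candidate values and compare them with the results.

```python
import hashlib
import hmac

SENSITIVE_KEYS = {"user_email", "phone_number"}  # hypothetical: drop entirely
HASHED_KEYS = {"user_id"}  # hypothetical: keep joinable but not readable


def scrub_properties(props: dict, key: bytes) -> dict:
    """Return a copy of `props` with sensitive values removed or replaced by a keyed hash."""
    cleaned = {}
    for name, value in props.items():
        if name in SENSITIVE_KEYS:
            continue
        if name in HASHED_KEYS:
            # HMAC-SHA256: the same input and key always give the same 64-char hex digest
            cleaned[name] = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        else:
            cleaned[name] = value
    return cleaned


props = {"user_email": "a@example.com", "user_id": "u-123", "page": "/checkout"}
print(scrub_properties(props, key=b"example-key"))
```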
Well-Architected Framework Alignment
Reliability
| Recommendation | Benefit |
| --- | --- |
| One App Insights per workload per environment | Prevents telemetry mixing; isolated failure domains |
| Same region as Log Analytics | Reduces cross-region failure risk |
| Resilient workspace design | Continuous monitoring during failures |
| Infrastructure as Code | Quick recovery of dashboards, alerts, queries |
Security
| Recommendation | Benefit |
| --- | --- |
| Use managed identities | No credential management |
| Implement Private Link | Network isolation |
| Enable customer-managed keys | Control over encryption |
| Don't store PII | Supports compliance with GDPR and similar regulations |
Cost Optimization
| Recommendation | Benefit |
| --- | --- |
| Configure appropriate sampling | Reduced data volume |
| Set daily caps | Prevent cost overruns |
| Use Basic Logs for high-volume tables | Lower ingestion cost |
| Disable unnecessary collection modules | Eliminate waste |
Operational Excellence
| Recommendation | Benefit |
| --- | --- |
| Keep SDKs up to date | Security patches, bug fixes |
| Use autoinstrumentation when possible | Reduced maintenance |
| Implement availability tests | Proactive monitoring |
| Configure meaningful alerts | Actionable notifications |
Performance Efficiency

| Recommendation | Benefit |
| --- | --- |
| Deploy in same region as workload | Reduced latency |
| Configure appropriate profiling frequency | Minimize overhead |
| Use preaggregated metrics | Efficient querying |
Production Readiness Checklist
Pre-Launch Checklist
Infrastructure Setup
[ ] Application Insights resource created (workspace-based)
[ ] Log Analytics workspace configured in same region
[ ] Connection string stored securely (Key Vault or environment variable)
[ ] Private Link configured (if required)
[ ] Daily cap configured appropriately
Instrumentation
[ ] OpenTelemetry or SDK integrated correctly
[ ] Cloud role name configured for each service
[ ] Connection string validated
[ ] Test telemetry flowing to Application Insights
Sampling & Data Management
[ ] Sampling strategy defined and configured
[ ] Health check endpoints excluded from telemetry
[ ] Data retention policy configured
[ ] Cost alerts configured
Alerting
[ ] Availability tests configured
[ ] Metric alerts for key SLIs (error rate, latency)
[ ] Smart Detection reviewed and configured
[ ] Action groups configured with appropriate notifications
Distributed Tracing
[ ] All services instrumented
[ ] Cross-service correlation validated
[ ] Application Map shows correct topology
[ ] Transaction search returns correlated traces
Security
[ ] No sensitive data in telemetry
[ ] IP collection disabled (if required)
[ ] RBAC configured for Application Insights access
[ ] Diagnostic settings enabled for audit logs
Operational Readiness
[ ] Dashboards created for key metrics
[ ] Runbooks documented for common issues
[ ] On-call team trained on Application Insights
[ ] Workbooks created for incident investigation
Post-Launch Validation
[ ] Verify telemetry volume is within expectations
[ ] Confirm sampling is working correctly
[ ] Validate alerts fire correctly
[ ] Test incident investigation workflow
[ ] Review cost after first billing cycle
Ongoing Maintenance
| Task | Frequency |
| --- | --- |
| Review and update SDK versions | Quarterly |
| Analyze cost trends | Monthly |
| Review Smart Detection findings | Weekly |
| Update dashboards and workbooks | As needed |
| Test availability test alerts | Monthly |
| Review data retention settings | Quarterly |
References
Official Microsoft Documentation
GitHub Resources