AZ-204 · Azure Developer Associate
Azure Application Insights & App Performance Monitoring
A single-page study note covering APM fundamentals, instrumentation strategies, metrics, availability testing, and Application Map, with analogies, definitions, and real-world use cases.
Mental Map — Where does this fit?
Big Picture Analogy
Azure Monitor is the hospital. Application Insights is the specialist doctor assigned to your apps — wiring up heart-rate monitors (metrics), running blood tests (logs), watching vital signs in real time (Live Metrics), and alerting nurses when something goes wrong (Smart Detection). Without this doctor, you only know the patient is sick after they collapse.
1 · What is Application Insights?
Core Definition
Application Insights is an extension of Azure Monitor that provides Application Performance Monitoring (APM). It collects metrics, telemetry, and trace logs from your live applications and visualises them so you can proactively detect issues before users notice, and reactively diagnose root causes when incidents happen.
APM
Application Performance Monitoring — the discipline of tracking an app's health, speed, and error rates continuously across its full lifecycle (dev → test → production).
Telemetry
Automated data collected from a running system. Includes request counts, response times, exceptions, page views, dependency calls, and custom events written by developers.
Trace Logs
Diagnostic log messages emitted at runtime (e.g., log.Information("Order placed")). App Insights correlates these with the specific HTTP request that produced them.
Analogy — The Flight Black Box
Your app is an aeroplane. Without Application Insights it has no flight recorder. If it crashes you have no idea why. App Insights is the Flight Data Recorder + Cockpit Voice Recorder — continuously capturing altitude (response time), engine metrics (CPU/memory), co-pilot chatter (trace logs), and GPS track (distributed traces).
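The idea of correlating trace logs with the request that produced them can be sketched with the Python standard library alone. This is a toy model, not the App Insights SDK: `current_operation_id`, `OperationIdFilter`, `ListHandler`, and `handle_request` are hypothetical names, and the "operation ID" is a random hex string standing in for the ID the real service assigns.

```python
import contextvars
import logging
import uuid

# Hypothetical stand-in for the operation ID attached to each incoming request.
current_operation_id = contextvars.ContextVar("operation_id", default="-")


class OperationIdFilter(logging.Filter):
    """Copy the active operation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.operation_id = current_operation_id.get()
        return True


class ListHandler(logging.Handler):
    """Collect formatted records in memory so the result is easy to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def build_logger() -> tuple[logging.Logger, ListHandler]:
    handler = ListHandler()
    handler.setFormatter(logging.Formatter("%(operation_id)s %(levelname)s %(message)s"))
    handler.addFilter(OperationIdFilter())
    logger = logging.getLogger("orders-demo")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger, handler


def handle_request(logger: logging.Logger, order_id: int) -> str:
    """Simulate one HTTP request: set an operation ID, then log inside it."""
    operation_id = uuid.uuid4().hex
    token = current_operation_id.set(operation_id)
    try:
        logger.info("Order %d placed", order_id)
    finally:
        current_operation_id.reset(token)  # do not leak the ID into the next request
    return operation_id
```

Every line collected by the handler starts with the ID of the request that emitted it, which is the property that lets a monitoring backend group a request with its log messages.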
| Feature | What it does | Analogy |
|---|---|---|
| Live Metrics | Real-time stream of requests, failures, and server health | Ambulance dashboard while driving |
| Availability Tests | Probes your URL from global Azure datacentres on a schedule | Security guard doing hourly rounds |
| Smart Detection | ML-based anomaly detection — alerts on sudden failure spikes | Smoke detector in the kitchen |
| Application Map | Visual topology of microservices + health overlay | Circuit board with red LEDs on failures |
| Distributed Tracing | End-to-end journey of a single request through N services | Parcel tracking from warehouse to door |
| Usage Analytics | Funnels, retention, user flows | CCTV heat-map in a retail store |
| GitHub / Azure DevOps integration | Create work items directly from an exception | Auto-filing bug report to Jira |
What App Insights monitors — at a glance
Request rates & response times
Failure rates
Dependency calls (SQL, HTTP)
Exceptions & stack traces
Page views & AJAX
User & session counts
Performance counters (CPU/RAM)
Docker / Azure host diagnostics
Custom events & metrics
Distributed traces
Real-World Use Cases
E-Commerce — Flipkart / Amazon scenario
Detecting checkout failures during a flash sale
During a sale, thousands of concurrent users hit checkout. Smart Detection fires an alert within 2 minutes when the payment-service failure rate jumps from 0.5% → 12%. The on-call engineer opens Application Map, sees the payment-service node glowing red, clicks through to Distributed Traces, and finds a SQL connection-pool timeout. Fix deployed before most users notice.
SaaS Platform — Salesforce / Dynamics scenario
Identifying a slow dependency degrading UX globally
Average response time creeps from 200ms → 1.4s over three weeks. Log-based metrics show the "ReportGenerator" external API is 5× slower. Usage analytics reveal 22% of users abandoned the report page. PM and dev team correlate performance data with user drop-off — making a business case to replace the third-party API.
Banking / FinTech — HDFC / Revolut scenario
Availability SLA monitoring across geographies
Availability Tests ping the login endpoint every 5 minutes from 5 Azure regions. One morning, the India-East probe fails while US and Europe pass. An alert fires, and the team traces it to a geo-routing issue in the Azure Front Door config. Without global probes, the team would have relied on customer complaints, which damages trust in a regulated industry.
2 · Log-Based vs Standard Metrics
Analogy — CCTV vs Sensor Dashboard
Log-based metrics = full CCTV footage. Every event is stored; you can rewind and inspect any moment in detail. But replaying hours of footage at query time is slow. Standard (pre-aggregated) metrics = the live sensor dashboard on the security desk — it shows "42 people entered in the last minute" instantly because it was computed as it happened. You can't drill into individuals, but it's blazing fast for alerts and dashboards.
| Aspect | 📼 Log-Based Metrics | ⚡ Standard (Pre-aggregated) |
|---|---|---|
| Storage | Individual events in Log Analytics | Time-series aggregates |
| Query time | Kusto query at read time (slower) | Pre-computed, very fast |
| Detail | All event properties available | Only key dimensions kept |
| Best for | Ad-hoc diagnostics, root-cause analysis | Dashboards, real-time alerts |
| Sampling | Accuracy drops if sampling is on | Unaffected (aggregated before sampling) |
| SDK | Any version | SDK 2.7+ for preaggregation |
Key insight: Both metric types coexist in App Insights. In Metrics Explorer, use the namespace selector to switch between "Log-based metrics" and "Standard metrics (preview)". Use Standard for operations dashboards; switch to Log-based when investigating a specific incident.
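The difference between the two metric types, and why sampling affects only one of them, can be illustrated with a small simulation. This is a toy model under stated assumptions: the event shape, the 25% sampling rate, and all function names are invented for illustration and are not how the service stores data internally.

```python
import random
from collections import Counter


def generate_requests(count: int, seed: int = 0) -> list[dict]:
    """Create fake request events spread evenly over five one-minute buckets."""
    rng = random.Random(seed)
    return [
        {"minute": i % 5, "duration_ms": rng.randint(50, 500), "user": f"user-{i}"}
        for i in range(count)
    ]


def preaggregate(events: list[dict]) -> Counter:
    """Standard-metric style: count per minute at collection time, before sampling."""
    return Counter(event["minute"] for event in events)


def sample(events: list[dict], rate: float, seed: int = 1) -> list[dict]:
    """Keep roughly `rate` of the events; only these are stored as logs."""
    rng = random.Random(seed)
    return [event for event in events if rng.random() < rate]


def log_based_count(stored: list[dict]) -> Counter:
    """Log-based style: scan the stored events at query time."""
    return Counter(event["minute"] for event in stored)


events = generate_requests(1000)
standard = preaggregate(events)       # exact: built from every event
stored = sample(events, rate=0.25)    # what survives sampling
from_logs = log_based_count(stored)   # computed only from stored rows
# Scaling back up by the sampling rate gives an estimate, not the exact count.
estimate = {minute: n / 0.25 for minute, n in from_logs.items()}
```

The pre-aggregated counts stay exact because they were computed before any event was dropped, while the log-based figures are estimates. In exchange, each stored row still carries details such as `user` that the aggregate no longer has.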
3 · Instrumenting Your App
Instrumentation
Enabling an application to capture and emit telemetry data. Think of it as "wiring up sensors inside the app." More sensors = more visibility, but also more effort to configure.
Zero code changes
Auto-Instrumentation
Enable via Azure Portal on App Service, AKS, VMs — no code changes needed.
Best for: Already-deployed apps, migrations, teams without SDK expertise.
Limitation: Less configurable; not available for all languages. Check docs before assuming coverage.
SDK in code
Manual Instrumentation
Install App Insights SDK or Azure Monitor OpenTelemetry Distro in your project.
Best for: Custom events, business metrics, full telemetry pipeline control.
Requirement: You manage SDK version updates yourself.
Use the SDK only when…
① Custom events / metrics — e.g., "items sold", "loan approved", "game won"
② Control telemetry flow — filter, enrich, or sample before sending to Azure
③ Autoinstrumentation unavailable — language or platform not supported
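Point ② (filter, enrich, or sample before sending) is easier to picture as a chain of small functions that each telemetry item passes through. The sketch below is a plain-Python model, not the SDK's processor API: `drop_health_checks`, `add_environment`, `sample_by_operation`, and `run_pipeline` are hypothetical names, and hashing the operation ID to make the sampling decision is an assumption chosen so that items from the same request are kept or dropped together.

```python
import hashlib
from typing import Callable, Optional

Telemetry = dict
Processor = Callable[[Telemetry], Optional[Telemetry]]


def drop_health_checks(item: Telemetry) -> Optional[Telemetry]:
    """Filter: discard noisy health-probe requests."""
    return None if item.get("name") == "GET /health" else item


def add_environment(environment: str) -> Processor:
    """Enrich: attach a custom property without mutating the original item."""
    def enrich(item: Telemetry) -> Optional[Telemetry]:
        properties = dict(item.get("properties", {}))
        properties["environment"] = environment
        return {**item, "properties": properties}
    return enrich


def sample_by_operation(percent: int) -> Processor:
    """Sample: keep about `percent`% of operations, decided by a stable hash."""
    def sampler(item: Telemetry) -> Optional[Telemetry]:
        digest = hashlib.sha256(item["operation_id"].encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        return item if bucket < percent else None
    return sampler


def run_pipeline(items: list[Telemetry], processors: list[Processor]) -> list[Telemetry]:
    """Pass each item through every processor; a None result drops the item."""
    sent = []
    for item in items:
        current: Optional[Telemetry] = item
        for process in processors:
            current = process(current)
            if current is None:
                break  # dropped: this item is never sent
        else:
            sent.append(current)
    return sent
```

For example, `run_pipeline(items, [drop_health_checks, add_environment("prod"), sample_by_operation(25)])` would drop health probes first, tag what remains, and then keep roughly a quarter of operations.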
What is OpenTelemetry?
OpenTelemetry (OTel) is the open-source, vendor-neutral standard for capturing traces, metrics, and logs — backed by CNCF. Microsoft is a Platinum Member. Think of it as the USB standard for observability: instead of every cloud vendor having a proprietary cable (SDK), OTel gives you one universal connector that works anywhere — AWS, GCP, Azure, or on-prem.
App Insights → OpenTelemetry terminology map
| Old (Application Insights) | New (OpenTelemetry) |
|---|---|
| Autocollectors | Instrumentation libraries |
| Channel | Exporter |
| Codeless / Agent-based | Autoinstrumentation |
| Traces | Logs |
| Requests | Server Spans |
| Dependencies | Client / Internal Spans |
| Operation ID | Trace ID |
| Operation Parent ID | Span ID |
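The last two rows of the table become concrete when looking at how the IDs travel between services. Assuming the services propagate context with the W3C Trace Context `traceparent` header (an assumption about the setup, not something stated in this note), a version-00 header can be parsed as below. `TraceContext` and `parse_traceparent` are illustrative names.

```python
import re
from dataclasses import dataclass

# Version-00 layout: version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class TraceContext:
    trace_id: str  # OTel Trace ID, called Operation ID in App Insights terms
    span_id: str   # caller's Span ID, the Operation Parent ID for the receiver
    sampled: bool  # lowest bit of the flags field


def parse_traceparent(header: str) -> TraceContext:
    """Split a traceparent header into its IDs, rejecting malformed values."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        raise ValueError(f"invalid traceparent: {header!r}")
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))
```

Every service in the chain sees the same `trace_id`, while `span_id` changes at each hop, which is what allows a single request to be stitched together end to end.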
4 · Availability Tests
Definition
Availability Tests (aka Synthetic Transaction Monitoring) are scheduled HTTP probes fired from Azure datacentres across the world to verify your app is up, responsive, and returning correct results. No code changes required.
Analogy — Mystery Shoppers at Every Branch
Imagine your app is a bank with branches worldwide. Azure sends "mystery shoppers" (probes) every 5 minutes to every branch, checking: Is the door open? Is service quick? Is the ATM working? If a branch fails 3 checks in a row, the regional manager (you) gets an alert.
✅ Standard Test
Standard Availability Test
Single HTTP request. Validates the TLS/SSL certificate and its remaining lifetime, and supports custom headers and the GET, POST, and HEAD verbs. Recommended for most scenarios.
🛠 Custom
Custom TrackAvailability()
Your own code runs the test logic and calls TrackAvailability(). Use for multi-step, login-required, or non-HTTP tests.
⚠ Retiring Sep 2026
URL Ping Test (Classic)
Simple URL check via portal. Being retired Sep 30, 2026. Migrate to Standard Tests before the deadline.
Retirement warning: URL Ping Tests retire September 30, 2026. Migrate existing tests to Standard Tests before then to continue running single-step availability tests in your App Insights resources.
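For the custom option, "your own code runs the test logic" roughly means: run the steps, time them, and report one pass/fail result. The sketch below shows that shape without calling any Azure API. `AvailabilityResult` and `run_availability_test` are hypothetical, and in a real test the returned record would be handed to the SDK's `TrackAvailability()` call rather than inspected locally.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class AvailabilityResult:
    """Hypothetical record holding what a custom test would report."""
    name: str
    run_location: str
    success: bool
    duration_ms: float
    message: str


def run_availability_test(
    name: str,
    run_location: str,
    steps: list[tuple[str, Callable[[], bool]]],
) -> AvailabilityResult:
    """Run the steps in order and stop at the first one that fails or raises."""
    started = time.perf_counter()
    success, message = True, "all steps passed"
    for step_name, step in steps:
        try:
            passed = step()
        except Exception as exc:  # a crashing step counts as a failed test
            success, message = False, f"{step_name} raised {type(exc).__name__}: {exc}"
            break
        if not passed:
            success, message = False, f"{step_name} failed"
            break
    duration_ms = (time.perf_counter() - started) * 1000
    return AvailabilityResult(name, run_location, success, duration_ms, message)
```

Each step is an ordinary callable, so a login flow can be expressed as a sequence such as "open login page", "submit credentials", "load dashboard", with real HTTP calls substituted for the placeholders.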
Healthcare SaaS — Epic / MedGenix scenario
Detecting regional outages before patients can't book appointments
Availability tests ping the appointment booking endpoint from East US, West Europe, and Southeast Asia every 5 minutes. When a CDN misconfiguration affects Southeast Asia traffic, the Singapore probe fails while US and Europe pass. The infrastructure team receives an alert and rolls back the CDN config within 8 minutes — before any patient calls the helpdesk.
5 · Application Map
Definition
Application Map is a visual, auto-discovered topology of your distributed application. Every microservice and external dependency becomes a node. Node colour = health status. Click any node to drill into its performance, failures, and traces.
Analogy — Google Maps for Your Microservices
Google Maps shows every city (service), road (API call), and roadblock (failure). You can instantly see which junction is causing the traffic jam — rather than reading through thousands of log lines. Application Map is the Google Maps for your microservices. Red nodes = jams. Click to get the Waze-style drill-down of why it's slow.
Healthy service
Failing / hot-spot
Internal dependency (SQL, Storage)
External dependency (3rd party API)
Key concepts in Application Map
Component — any independently deployable microservice with App Insights SDK installed.
Dependency — external services (SQL, Redis, Event Hub, REST API) your component calls. Observed but not instrumented by your team.
Cloud Role Name — property App Insights uses to label each node on the map. Can be overridden in code.
Progressive Discovery — the map builds itself by following HTTP dependency calls. Hit "Update map components" to refresh.
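The concepts above can be tied together with a small model of how a map could be assembled from dependency-call records: nodes are labelled by cloud role name, and each observed call adds to an edge's statistics. This is a simplified illustration; the record fields, the 50% threshold, and both function names are assumptions, not the service's actual algorithm.

```python
from collections import defaultdict


def build_application_map(calls: list[dict]) -> tuple[set[str], dict]:
    """Turn dependency-call records into nodes and per-edge health statistics."""
    nodes: set[str] = set()
    stats = defaultdict(lambda: {"calls": 0, "failures": 0})
    for call in calls:
        source, target = call["cloud_role_name"], call["target"]
        nodes.update((source, target))  # a target becomes a node once it is called
        stats[(source, target)]["calls"] += 1
        if not call["success"]:
            stats[(source, target)]["failures"] += 1
    edges = {
        edge: {**s, "failure_rate": s["failures"] / s["calls"]}
        for edge, s in stats.items()
    }
    return nodes, edges


def failing_targets(edges: dict, threshold: float = 0.5) -> set[str]:
    """Return targets whose inbound failure rate meets the (assumed) threshold."""
    return {
        target
        for (_, target), s in edges.items()
        if s["failure_rate"] >= threshold
    }
```

Note that a dependency appears on the map only because an instrumented component called it, which matches the "observed but not instrumented" description.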
Ride-sharing — Uber / OLA scenario
Pinpointing the broken link in a 15-service chain
A rider reports "app stuck on booking." With 15 microservices, reading logs would take hours. Application Map shows the Pricing Service in amber (elevated latency) while downstream Surge-Calculator is red (failures). Drill-down reveals a Redis cache eviction storm. Team fixes cache TTL config in 20 minutes — without App Map it would have been a multi-hour bridge call.
6 · Getting Started — Decision Flow
Path A
At Runtime
No code change. Enable in portal. Best for live apps.
Path B
At Dev Time
Add SDK. Custom events + full telemetry control.
Path C
Browser / SPA
JS snippet for page views, AJAX & user flows.
Path D
Availability
Add Standard Tests from Azure Portal.
Glossary of Key Terms
| Term | Meaning |
|---|---|
| Instrumentation Key / Connection String | Unique token linking your app to an App Insights resource |
| Sampling | Reduces telemetry volume by storing only a % of events |
| Kusto (KQL) | Query language for log-based metrics in Log Analytics |
| Span / Trace ID | OTel IDs: trace = full request journey, span = one step |
| Preaggregation | Computing metric summaries at collection time, not query time |
| Smart Detection | Automated ML anomaly alerts; no manual rule setup needed |
| Live Metrics Stream | Sub-second real-time dashboard; zero storage impact on host |
| Cloud Role Name | Property used to label nodes on Application Map |
| CNCF | Cloud Native Computing Foundation, which governs OTel, Kubernetes & more |
| OTel Distro | Microsoft's pre-packaged OTel SDK + Azure Monitor exporter bundle |
AZ-204 Exam Tips
✅ Remember
App Insights is an extension of Azure Monitor — not a standalone service
Availability tests require no code changes; any HTTP/HTTPS endpoint reachable from the public internet works
URL Ping Tests retire Sep 30, 2026 — migrate to Standard Tests
Both Log-based & Standard metrics coexist — switch via namespace selector
SDK 2.7+ enables client-side preaggregation → less cost + better accuracy
Smart Detection needs no manual alert rules — it is fully automatic
App Map uses Cloud Role Name property to label each node
Live Metrics has zero storage impact on the host environment
⚠ Trick Questions
Standard metrics are NOT affected by sampling — log-based are
App Map discovers topology via HTTP dependency tracking, not network scans
Custom TrackAvailability() = you own test logic; Standard Test = Azure owns it
App Insights "Traces" = OTel "Logs", while OTel traces (spans) correspond to App Insights requests and dependencies. The terminology is easy to invert!
Autoinstrumentation may not support all languages — always check docs
Log-based metrics use Kusto queries AT READ TIME — slower than Standard
Both metric types visible in the SAME Metrics Explorer — switch via namespace
SDK not required if autoinstrumentation is available and no custom events needed
Quick Decision Guide
| If you need… | Use this |
|---|---|
| Monitor deployed app with no code change | Autoinstrumentation via Azure Portal |
| Custom business events (orders, logins, etc.) | Manual SDK or OTel Distro |
| Near-real-time dashboards, low query cost | Standard (pre-aggregated) metrics |
| Drill into a specific user session or exception | Log-based metrics + KQL query |
| Check public endpoint availability globally | Standard Availability Tests |
| Test a multi-step login flow for availability | Custom TrackAvailability() |
| See which of 20 microservices is failing | Application Map — look for red/amber nodes |
| Trace a single slow request end-to-end | Distributed Tracing — search by Trace ID |
| Alerts without writing alert rules manually | Enable Smart Detection |