Azure Architecture Handbook

Section 1

Governance — Organizing the Cloud Before You Build

Imagine a big housing society. Before anyone builds a single flat, someone decides the rules: who owns which plot, how bills are split, what colors are allowed, who can enter which building. Azure governance is exactly that — the rules and structure that keep hundreds of cloud resources from becoming chaos.

1.1 Governance (The Big Picture)

🪔 Imagine this

The management committee of a housing society — they don't build flats, they make sure everything built follows the rules.

What is it?

Governance is the set of structures (management groups, subscriptions, resource groups) and rules (policies, RBAC, tags) that control HOW your organization uses Azure.

Why it exists

Without governance, every team creates resources anywhere, bills become untraceable, security gaps appear, and nobody knows who owns what. Governance prevents the chaos before it starts.

When to use

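From day one. Governance retrofitted onto 500 existing resources is far harder than governance designed before the first resource is created, because every existing resource has to be audited, moved, or re-tagged to fit the new rules.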

Architect's lens

As an architect, governance is usually your FIRST design decision in any AZ-305 scenario — the hierarchy you choose decides how policies, billing, and access flow down to everything else.

1.2 Management Groups

🪔 Imagine this

The head office of a company with many branches. A rule made at head office automatically applies to every branch below it.

What is it?

Containers that sit ABOVE subscriptions. You group multiple subscriptions under a management group and apply policies/access at that level — everything below inherits automatically.

Why it exists

Large organizations have many subscriptions (Finance, HR, Dev, Prod). Applying the same security policy to each one manually is error-prone. Set it once at the management group, and all subscriptions inherit it.

When to use

When you have more than a handful of subscriptions, or need to enforce company-wide rules (e.g., 'no resources outside India region') across all of them.

Architect's lens

Management groups can nest up to six levels deep, not counting the root level or the subscription level. A typical design: Root → Departments → Environment (Prod/Non-Prod) → Subscriptions. Policy assigned at the top flows to the bottom.
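
The inheritance described above can be sketched in a few lines of Python. This is a toy model, not the Azure SDK: the `Scope` class, the policy strings, and the scope names are all hypothetical, chosen only to show how an assignment made high in the tree reaches everything beneath it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scope:
    """A management group or subscription in the hierarchy (illustrative model)."""
    name: str
    parent: Optional["Scope"] = None
    policies: List[str] = field(default_factory=list)

    def effective_policies(self) -> List[str]:
        # Collect the parent's policies first, then add the ones assigned here.
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + self.policies


# Hypothetical hierarchy: Root -> Department -> Environment -> Subscription
root = Scope("Root", policies=["allowed-regions: centralindia"])
finance = Scope("Finance", parent=root, policies=["require-tag: CostCenter"])
prod = Scope("Finance-Prod", parent=finance)
sub = Scope("sub-finance-prod-01", parent=prod, policies=["deny-public-ip"])

print(sub.effective_policies())
```

The subscription ends up with three policies even though only one was assigned to it directly, which is the reason to place company-wide rules as high in the tree as possible.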

1.3 Azure Subscriptions

🪔 Imagine this

Separate electricity meters for each flat in a building. Each meter gets its own bill, its own limit, and tripping one doesn't affect the others.

What is it?

A subscription is a billing boundary and an administrative boundary. Every resource you create lives inside exactly one subscription, and that subscription gets the bill.

Why it exists

Organizations need to separate billing (whose budget pays?), limits (Azure quotas apply per subscription), and blast radius (a compromised Dev subscription shouldn't touch Production).

When to use

Create separate subscriptions to isolate: environments (Dev/Test/Prod), departments, or large projects with their own budgets.

Architect's lens

Architect's rule of thumb: subscriptions are SECURITY and BILLING boundaries. If two workloads need totally different access rules or separate invoices — separate subscriptions.

1.4 Resource Groups

🪔 Imagine this

When you move house, you pack one carton per room — 'Kitchen', 'Bedroom'. Everything for one purpose travels together and can be unpacked (or thrown away) together.

What is it?

A logical folder inside a subscription that holds related resources — typically everything belonging to one application or workload (its web app, database, storage).

Why it exists

Resources that live and die together should be managed together. Delete the resource group → everything inside is deleted in one shot. Apply access on the group → applies to all resources in it.

When to use

One resource group per application per environment is the common pattern (e.g., rg-shop-prod, rg-shop-dev).

Architect's lens

Golden rule: resources sharing the SAME LIFECYCLE go in the same group. A resource group can hold resources from different regions, but the group itself has a 'home' region (stores metadata only).

1.5 Resource Tagging

🪔 Imagine this

Sticky labels on office equipment: 'Dept: Finance', 'Owner: Priya', 'Project: Diwali-Sale'. The label doesn't change the item — it tells you everything about it at a glance.

What is it?

Tags are key-value labels you attach to resources, resource groups, or subscriptions — e.g., CostCenter=Marketing, Environment=Prod, Owner=raushan@company.com.

Why it exists

When the monthly Azure bill arrives, finance asks: 'Which department spent this ₹4 lakh?' Without tags, nobody knows. With tags, you slice the bill by department, project, or owner in one click.

When to use

Always — and enforce it with Azure Policy ('every resource MUST have a CostCenter tag') so people can't skip labeling.

Architect's lens

Standard tag set every architect designs: Environment, CostCenter, Owner, Application, Criticality. Tags power cost reports, automation scripts, and cleanup of orphaned resources.
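
As a rough illustration of how tags drive both enforcement and cost reports, the sketch below checks resources against the standard tag set and totals spend per tag value. The resource dictionaries, names, and cost figures are made-up sample data, and the two helper functions are hypothetical, not part of any Azure tool.

```python
from collections import defaultdict

# The standard tag set named in the text above.
REQUIRED_TAGS = {"Environment", "CostCenter", "Owner", "Application", "Criticality"}


def missing_tags(resource: dict) -> set:
    """Return the required tag names this resource lacks."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))


def cost_by_tag(resources: list, tag: str) -> dict:
    """Sum monthly cost per value of one tag; untagged spend gets its own bucket."""
    totals = defaultdict(float)
    for resource in resources:
        key = resource.get("tags", {}).get(tag, "untagged")
        totals[key] += resource["monthly_cost"]
    return dict(totals)


# Made-up sample inventory.
resources = [
    {"name": "vm-shop-prod", "monthly_cost": 120.0,
     "tags": {"Environment": "Prod", "CostCenter": "Marketing",
              "Owner": "priya@example.com", "Application": "shop",
              "Criticality": "High"}},
    {"name": "st-logs", "monthly_cost": 30.0, "tags": {"CostCenter": "Marketing"}},
    {"name": "vm-orphan", "monthly_cost": 45.0, "tags": {}},
]

for resource in resources:
    print(resource["name"], "missing:", sorted(missing_tags(resource)))
print(cost_by_tag(resources, "CostCenter"))
```

The "untagged" bucket is the number finance cannot attribute to anyone, which is why a mandatory-tag policy is worth enforcing.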

1.6 Azure Policy vs RBAC

🪔 Imagine this

Building bylaws vs door access cards. Bylaws say WHAT can be built ('no building above 4 floors'). Access cards decide WHO can enter which room. Two different controls, both needed.

What is it?

Azure Policy controls WHAT can exist and how it must be configured ('only allow VMs in Central India', 'storage must use encryption'). RBAC controls WHO can perform actions ('Priya can read, Amit can manage').

Why it exists

RBAC alone isn't enough — an authorized person can still create a non-compliant resource. Policy alone isn't enough — compliant resources still need access control. Together they cover both axes.

When to use

Policy: enforcing standards, compliance, allowed regions/SKUs, mandatory tags. RBAC: granting teams the minimum access needed for their role.

Architect's lens

Policy effects to know: Deny (block creation), Audit (allow but flag), Append/Modify (auto-fix), DeployIfNotExists (auto-deploy missing pieces). Group policies into Initiatives for compliance standards like ISO 27001.
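
The two axes can be pictured as two separate checks on every request. The following is a simplified model only: the role-to-action table is a deliberate reduction of the built-in Reader and Contributor roles, the policy definitions are hypothetical, and only the Deny and Audit effects are modeled.

```python
# Simplified stand-in for RBAC role definitions (an assumption for illustration).
ROLE_ACTIONS = {
    "Reader": {"read"},
    "Contributor": {"read", "write", "delete"},
}

# Hypothetical policies: each has a name, an effect, and a compliance rule.
policies = [
    {"name": "allowed-regions", "effect": "Deny",
     "rule": lambda r: r["region"] == "centralindia"},
    {"name": "require-costcenter-tag", "effect": "Audit",
     "rule": lambda r: "CostCenter" in r["tags"]},
]


def evaluate(role: str, action: str, resource: dict, policies: list):
    """Return (decision, audit_findings) for one request."""
    # RBAC answers WHO: may this identity perform the action at all?
    if action not in ROLE_ACTIONS.get(role, set()):
        return "forbidden by RBAC", []
    # Policy answers WHAT: is the resource itself allowed to exist like this?
    findings = []
    for policy in policies:
        if policy["rule"](resource):
            continue  # compliant with this policy
        if policy["effect"] == "Deny":
            return f"denied by policy: {policy['name']}", findings
        if policy["effect"] == "Audit":
            findings.append(policy["name"])  # allowed, but flagged
    return "allowed", findings


print(evaluate("Reader", "write", {"region": "centralindia", "tags": {}}, policies))
print(evaluate("Contributor", "write", {"region": "westeurope", "tags": {}}, policies))
print(evaluate("Contributor", "write", {"region": "centralindia", "tags": {}}, policies))
```

The second call shows why RBAC alone is not enough: a fully authorized Contributor is still blocked from creating a resource in a disallowed region.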

1.7 Azure Landing Zones

🪔 Imagine this

A ready-to-move-in apartment: wiring done, plumbing done, security installed, society rules registered. You just bring your furniture (the workload) and start living.

What is it?

A pre-designed, pre-deployed Azure environment with governance, networking, identity, and security ALREADY configured — following Microsoft's Cloud Adoption Framework best practices.

Why it exists

Most enterprises rebuild the same foundation (hub network, policies, monitoring, identity) for every project — inconsistently. Landing Zones make the foundation repeatable and correct from the start.

When to use

Any organization adopting Azure seriously, especially at enterprise scale or before migrating major workloads from on-premises.

Architect's lens

A landing zone typically includes: management group hierarchy, platform subscriptions (connectivity, identity, management), hub-spoke networking, baseline policies, and logging — deployed via templates (Bicep/Terraform) so it's identical every time.

Section 2

Compute — Choosing WHERE Your Code Runs

All compute services answer one question: 'I have code — where should it run?' The answer depends on how much control you want vs how much management you're willing to do. Think of it as housing options: own a house (VM), rent a serviced apartment (App Service), or pay per night in a hotel (Functions).

2.1 Azure Virtual Machines (VMs)

🪔 Imagine this

Renting an empty flat. The building (hardware) belongs to the landlord, but inside — furniture, cleaning, repairs, security — everything is YOUR responsibility.

What is it?

A full computer in the cloud — you choose the OS (Windows/Linux), size (CPU/RAM), and disks. You get complete control of everything from the OS upward. This is IaaS (Infrastructure as a Service).

Why it exists

Some applications simply can't run on managed platforms — legacy software, custom OS configurations, special licensing, or anything needing admin/root access.

When to use

Lift-and-shift migrations from on-premises, legacy apps, software requiring specific OS setups, or workloads where you need full control.

Architect's lens

Architect decisions: VM size family (B-series burstable for dev, D-series general, E-series memory-heavy), disk type (Premium SSD for prod), Availability Zones for uptime, auto-shutdown for dev/test cost savings.

2.2 Azure Batch

🪔 Imagine this

Hiring 1,000 temporary workers for one massive day of work — they arrive, finish the job in parallel, and leave. You don't keep them on payroll.

What is it?

A service for running large-scale parallel jobs. You describe the work as many independent tasks; Batch creates a pool of VMs, schedules the tasks across them, retries failures, and can scale the pool back down when the work is done.

Why it exists

Some jobs (rendering a film, risk simulations, scientific modeling, processing 1 million images) would take weeks on one machine but hours on 500. Managing 500 VMs manually is impossible — Batch automates it.

When to use

High-Performance Computing (HPC): media rendering, financial risk modeling, genetic research, large-scale image/data processing — anything 'embarrassingly parallel'.

Architect's lens

You define a pool (VM size + count), jobs, and tasks. Batch handles scheduling, retries, and scaling. Use low-priority/spot VMs in the pool for up to 80% cost savings on interruptible work.
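
The pool, job, and task idea can be imitated locally with Python's standard library. This is only an analogy running on one machine with threads, not the Batch API; `process_image` and `run_job` are hypothetical names, and the per-task work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(image_id: int) -> str:
    # Placeholder for real per-item work (rendering, simulation, resizing).
    return f"image-{image_id}:done"


def run_job(task_inputs, pool_size: int) -> list:
    # The "pool": a fixed set of workers that exists only for this job.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Each input is one independent task; map() preserves input order.
        return list(pool.map(process_image, task_inputs))
    # Leaving the with-block releases the workers, like tearing down the pool.


results = run_job(range(1000), pool_size=8)
print(len(results), results[0], results[-1])
```

The pattern only works because the tasks never depend on each other, which is exactly what "embarrassingly parallel" means.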

2.3 Azure App Service

🪔 Imagine this

A serviced apartment. You bring your luggage (code); cleaning, maintenance, security, and electricity are all handled by the building management.

What is it?

A fully managed platform (PaaS) for hosting web applications and REST APIs. Microsoft manages the OS, patching, and infrastructure — you just deploy code.

Why it exists

Most teams want to ship features, not babysit servers. App Service gives auto-scaling, custom domains, SSL, CI/CD integration, and staging slots without any infrastructure work.

When to use

Web apps, REST APIs, company portals — the default choice for most web workloads, especially with small teams.

Architect's lens

Key designs: App Service Plan tier (Basic/Standard/Premium decides features + price), deployment slots (test in staging, swap to production with zero downtime), autoscale rules (scale out on CPU or schedule), VNet integration for private connectivity.

2.4 Azure Container Instances (ACI)

🪔 Imagine this

A pop-up food stall. No restaurant lease, no staff hiring — set up the stall in minutes, serve, pack up. Perfect for short, simple gigs.

What is it?

The fastest, simplest way to run a container in Azure — no cluster, no orchestrator, no VMs to manage. Give it a container image, it runs in seconds, billed per second.

Why it exists

Sometimes you just need ONE container running NOW — a quick task, a build job, a burst worker. Setting up Kubernetes for that is like building a food court for one stall.

When to use

Short-lived tasks, dev/test containers, burst capacity for AKS (virtual nodes), simple background processors that don't need orchestration.

Architect's lens

Know its limits for the exam: no auto-scaling, no load balancing across instances, no rolling deployments. The moment you need those → you've outgrown ACI, move to AKS or Container Apps.

2.5 Azure Kubernetes Service (AKS)

🪔 Imagine this

A food court manager. Dozens of stalls (containers) operate at once — the manager assigns spots, replaces a stall that shuts down, adds counters when the crowd grows, and balances customers across them.

What is it?

Managed Kubernetes, the industry-standard orchestrator for running many containers: scheduling, self-healing, scaling, rolling updates. Azure runs the control plane for you (at no charge on the Free tier; the Standard tier adds a paid uptime SLA), and you pay for the worker nodes.

Why it exists

Microservices architectures run dozens or hundreds of containers. Someone must restart crashed ones, distribute traffic, roll out updates without downtime. That 'someone' is Kubernetes.

When to use

Microservices at scale, teams with DevOps maturity, workloads needing fine-grained control over deployment, networking, and scaling.

Architect's lens

Architect decisions: node pools (separate pools for system vs workload, GPU pools for ML), cluster autoscaler + horizontal pod autoscaler, ingress controller for routing, Azure CNI vs kubenet networking, integration with ACR (container registry) and Entra ID.

2.6 Azure Functions

🪔 Imagine this

A taxi vs owning a car. You don't pay for the car sitting idle in the garage — you pay only for the rides you actually take.

What is it?

Serverless compute — small pieces of code that run in response to a trigger (HTTP request, timer, new queue message, new blob uploaded). No servers to manage; scales from zero to thousands automatically.

Why it exists

Many workloads are bursty: nothing for hours, then 10,000 events in a minute. Paying for an always-on server for that is waste. Functions bill per execution — idle costs nothing (Consumption plan).

When to use

Event-driven processing (resize image when uploaded), scheduled jobs (nightly cleanup), lightweight APIs, glue code connecting services.

Architect's lens

Plans matter for AZ-305: Consumption (true serverless, cold starts), Premium (pre-warmed, VNet access, no cold start), Dedicated (run in App Service Plan). Durable Functions extend it for long-running, stateful workflows (chained steps, fan-out/fan-in).

2.7 Azure Logic Apps

🪔 Imagine this

A super-efficient office clerk following a flowchart: 'WHEN an invoice email arrives → save attachment to the shared drive → notify the finance Teams channel → add a row in the tracker.' No coding — just the flowchart.

What is it?

A low-code/no-code workflow automation service with 1,000+ ready-made connectors (Outlook, SAP, Salesforce, SQL, Teams...). You design workflows visually: trigger → actions → conditions.

Why it exists

Most business automation is integration plumbing — 'when X happens in system A, do Y in system B'. Writing custom code for every such flow is expensive. Logic Apps makes it drag-and-drop.

When to use

Business process automation, system integration (connect SaaS apps + on-prem systems), approval workflows, scheduled data syncs.

Architect's lens

Functions vs Logic Apps — the classic exam comparison: Functions = code-first, developer writes logic. Logic Apps = designer-first, visual workflow with connectors. They combine beautifully: Logic App orchestrates the flow, calls a Function for custom logic.

Section 3

Data Storage — Files, Shares & Disks

Not all data is equal. A product photo, a shared office drive, and a VM's hard disk are completely different animals — and Azure has a purpose-built home for each. The umbrella over most of them is the Storage Account.

3.1 Azure Storage Accounts

🪔 Imagine this

A bank locker facility. One building (the account), but inside there are different locker types — document lockers (blobs), shared family lockers (files), small quick-access boxes (queues, tables).

What is it?

The top-level container for Azure's core storage services. One storage account can hold: Blobs (objects/files), Files (network shares), Queues (messages), and Tables (simple NoSQL).

Why it exists

It gives one place to manage settings that apply to all data inside: redundancy level, security, networking, and access keys.

When to use

You'll create one for almost every workload. Separate accounts when workloads need different redundancy, security, or billing separation.

Architect's lens

Key account-level decisions: performance tier (Standard HDD-backed vs Premium SSD-backed), redundancy (next card), access tier defaults, and network rules (public vs private endpoint).

3.2 Data Redundancy (LRS / ZRS / GRS / RA-GRS)

🪔 Imagine this

Photocopies of an important certificate: 3 copies in the same cupboard (LRS), copies in 3 different rooms of the house (ZRS), copies also at your relative's house in another city (GRS), and the relative is allowed to show them to you anytime (RA-GRS).

What is it?

Azure always keeps multiple copies of your data. Redundancy options decide WHERE those copies live: LRS = 3 copies in one datacenter. ZRS = 3 copies across 3 zones in one region. GRS = LRS + 3 more copies in a paired region. RA-GRS = GRS + read access to the secondary copy.

Why it exists

Hardware fails, datacenters flood, regions go down. The question is: which disaster do you need to survive — a disk failure, a datacenter failure, or an entire regional outage?

When to use

LRS: dev/test, easily recreatable data. ZRS: production needing zone resilience. GRS/RA-GRS: business-critical data that must survive a regional disaster.

Architect's lens

Exam mindset: match redundancy to business impact + budget. GRS costs ~2x LRS. RA-GRS lets apps READ from the secondary region even while primary is healthy — useful for read-heavy global apps.
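
The "which disaster must you survive" question maps almost mechanically onto the four options. The helper below is a hypothetical decision aid built only from the definitions in this card; it ignores cost and every other constraint a real design would weigh.

```python
def choose_redundancy(survive: str, read_from_secondary: bool = False) -> str:
    """Pick the lowest redundancy level that survives the given failure.

    survive: "disk", "datacenter", or "region" (labels invented for this sketch).
    """
    options = {"disk": "LRS", "datacenter": "ZRS", "region": "GRS"}
    if survive not in options:
        raise ValueError(f"unknown failure type: {survive!r}")
    choice = options[survive]
    # RA-GRS is GRS plus read access to the secondary region.
    if choice == "GRS" and read_from_secondary:
        return "RA-GRS"
    return choice


print(choose_redundancy("disk"))
print(choose_redundancy("datacenter"))
print(choose_redundancy("region"))
print(choose_redundancy("region", read_from_secondary=True))
```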

3.3 Azure Blob Storage

🪔 Imagine this

An unlimited digital godown (warehouse). Throw in anything — photos, videos, backups, logs — pay only for the shelf space you use, and pay less for shelves you rarely visit.

What is it?

Object storage for unstructured data — any file type, any size, massive scale. Organized as Account → Containers → Blobs. Accessed over HTTPS via URLs, SDKs, or REST API.

Why it exists

Databases are a poor fit for big binary files (slow, expensive). Blob storage is built exactly for this: cheap, durable (at least 11 nines, even with LRS), and able to scale to petabytes.

When to use

User uploads, images/videos, backups, log archives, static website hosting, the raw zone of a data lake.

Architect's lens

Access tiers are the key design lever: Hot (frequent access), Cool (30+ days, cheaper storage, costlier reads), Cold (90+ days), Archive (180+ days, cheapest, hours to retrieve). Lifecycle policies move data between tiers automatically — huge cost saver.
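
A lifecycle rule is essentially a function from "days since last access" to a tier. The sketch below reuses the 30, 90, and 180 day figures from the tier descriptions above as thresholds; that reuse is an assumption for illustration, since a real lifecycle policy sets whatever thresholds suit the workload, and `tier_for` is a hypothetical helper.

```python
def tier_for(days_since_last_access: int) -> str:
    """Return the access tier a simple lifecycle rule would choose."""
    if days_since_last_access >= 180:
        return "Archive"  # cheapest storage, slowest to retrieve
    if days_since_last_access >= 90:
        return "Cold"
    if days_since_last_access >= 30:
        return "Cool"
    return "Hot"  # frequently accessed data stays here


for days in (0, 45, 120, 400):
    print(days, "->", tier_for(days))
```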

3.4 Azure Files

🪔 Imagine this

The office shared drive (the famous 'Z: drive'). Everyone in the team maps it on their computer and sees the same folders — except now it lives in the cloud instead of a server under someone's desk.

What is it?

Fully managed file shares in the cloud using SMB/NFS protocols, the same protocols traditional file servers use. SMB shares mount on Windows, Linux, or macOS like a normal network drive; NFS shares are for Linux clients.

Why it exists

Thousands of legacy apps and teams depend on shared network drives. Azure Files lets you move them to the cloud WITHOUT changing how the apps work — same drive letter, same paths.

When to use

Lift-and-shift of apps using file shares, centralized config files, shared team storage, replacing aging on-prem file servers.

Architect's lens

Know Azure File Sync for the exam: keeps an on-prem Windows server as a fast local cache while the full data lives in Azure — branch offices get local speed with cloud capacity.

3.5 Azure Disk Storage

🪔 Imagine this

The SSD/hard drive inside your rented computer. The flat (VM) needs storage inside it — you choose how fast and how big that drive should be.

What is it?

Managed block storage volumes attached to VMs — the OS disk and data disks. Azure handles the underlying storage infrastructure ('managed disks').

Why it exists

Every VM needs disks, and disk performance often decides application performance. A slow disk under a fast database = a slow database.

When to use

Automatically with every VM. The design decision is the TYPE: Standard HDD (dev/test), Standard SSD (light prod), Premium SSD (production workloads), Ultra Disk (extreme IOPS — heavy databases).

Architect's lens

Architect levers: disk tier (IOPS/throughput), size, caching settings (ReadOnly cache for data disks of databases), snapshots for backup, and disk encryption (platform-managed vs customer-managed keys).

3.6 Storage Security

🪔 Imagine this

Locker keys with rules: a guest key that expires Sunday 6 PM and opens only locker #12 (SAS token), CCTV + entry register (logging), and a locker room with no street entrance at all (private endpoint).

What is it?

The layered controls protecting storage: encryption at rest (on by default), access keys vs Entra ID auth, SAS tokens (time-limited, scoped access links), network rules/firewall, and private endpoints.

Why it exists

Storage accounts hold the crown jewels — backups, customer files, data lakes. Misconfigured public storage is one of the most common cloud breaches worldwide.

When to use

Every storage design. The exam loves asking the MOST SECURE option for a scenario.

Architect's lens

Security ladder (least → most secure): Account keys (avoid — full access, no expiry) → SAS tokens (scoped + time-limited) → Entra ID + RBAC (identity-based, auditable) → plus Private Endpoints to remove public network exposure entirely. Prefer Managed Identity + RBAC for app access.
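
To see why a SAS token sits above a raw account key on the ladder, it helps to look at the general idea behind it: a signed statement of "this resource, these permissions, until this time". The sketch below shows that idea with a plain HMAC. It is not Azure's actual SAS format or signing scheme, and the key, resource path, and function names are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-account-key"  # hypothetical; never hard-code real keys


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_token(resource: str, permissions: str, expires_at: int) -> str:
    """Issue a token scoped to one resource, a permission set, and an expiry."""
    payload = f"{resource}|{permissions}|{expires_at}"
    return f"{payload}|{_sign(payload)}"


def check_token(token: str, resource: str, permission: str, now: int) -> bool:
    """Accept only an untampered, unexpired token for this resource and permission."""
    try:
        res, perms, expiry, signature = token.rsplit("|", 3)
    except ValueError:
        return False
    expected = _sign(f"{res}|{perms}|{expiry}")
    return (hmac.compare_digest(signature, expected)  # not tampered with
            and res == resource                        # scoped to this resource
            and permission in perms                    # scoped to this action
            and now < int(expiry))                     # still within its lifetime


token = make_token("container/locker12", "r", expires_at=1000)
print(check_token(token, "container/locker12", "r", now=999))
print(check_token(token, "container/locker12", "w", now=999))
print(check_token(token, "container/locker12", "r", now=1001))
```

The holder of the token never sees the key itself, and the damage from a leaked token is bounded by its scope and expiry, which is the property an account key lacks.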

Section 4

Databases — SQL, Cosmos & Data Protection

Storage holds files; databases hold structured, queryable, transactional data — orders, customers, payments. Here the architecture questions become: how does it scale, how does it survive failures, and how is the data protected at every moment of its life?

4.1 Azure SQL Database

🪔 Imagine this

A meticulous accountant with a strict ledger. Every entry follows rules, totals always balance, and a transaction either fully happens or doesn't happen at all — never half.

What is it?

A fully managed relational database (PaaS) based on SQL Server. Microsoft handles patching, backups, high availability. You get tables, relationships, T-SQL, and ACID transactions.

Why it exists

Business-critical data — orders, payments, inventory — needs guaranteed consistency and relationships between tables. Relational databases have been the answer for 40 years, and Azure SQL removes the server-management pain.

When to use

Transactional applications: e-commerce orders, banking records, ERP/CRM data — anywhere correctness matters more than raw flexibility.

Architect's lens

Deployment options to know: Single Database (one isolated DB), Elastic Pool (many DBs share resources — perfect for SaaS with many customer DBs), Managed Instance (near-100% SQL Server compatibility for migrations). Purchase models: DTU (simple bundle) vs vCore (granular, supports reserved pricing).

4.2 Database Scalability

🪔 Imagine this

A restaurant getting crowded: you can buy a bigger kitchen (scale UP), open more branches (scale OUT), or set up self-service counters just for viewing the menu (read replicas).

What is it?

Strategies for handling growth: Vertical scaling (bigger tier — more CPU/RAM), Read scale-out (replicas serve read queries), Sharding (split data across multiple databases), Elastic Pools (shared capacity across many DBs), and Hyperscale (storage grows to 100+ TB with fast scaling).

Why it exists

Databases are usually the first bottleneck as apps grow. Choosing the wrong scaling strategy means either overspending or hitting a wall during peak business.

When to use

Vertical: quick fix, has a ceiling. Read replicas: read-heavy apps (reports, catalogs). Sharding: multi-tenant or massive datasets. Hyperscale: very large single databases with unpredictable growth.

Architect's lens

Exam pattern: 'reports are slowing down the production DB' → route reports to a read replica. 'SaaS with 500 customer databases with different peak times' → Elastic Pool. 'DB will grow beyond 4 TB' → Hyperscale.
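
Of the strategies in this card, sharding is the one that changes application code, because something has to decide which database holds a given row. One common approach is hash-based routing, sketched below; the text does not prescribe a routing method, so treat this as one assumption among several (range-based and lookup-table routing are alternatives). `shard_for` and the tenant names are hypothetical.

```python
import hashlib


def shard_for(tenant_id: str, shard_count: int) -> int:
    """Map a tenant to a shard index that is stable across runs and machines."""
    # Use a real hash function; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


for tenant in ("contoso", "fabrikam", "northwind"):
    print(tenant, "-> shard", shard_for(tenant, shard_count=4))
```

The weakness of plain modulo routing is that changing `shard_count` remaps most tenants, which is one reason real systems often keep a shard map instead.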

4.3 Database Availability

🪔 Imagine this

A hospital with a backup generator (zone redundancy) AND a sister hospital in another city ready to take patients if the whole city loses power (geo-replication + failover).

What is it?

Designs that keep the database alive through failures: built-in HA (every Azure SQL tier), Zone redundancy (replicas across availability zones), Active Geo-Replication (readable secondary in another region), and Failover Groups (group of DBs failing over together with one connection string).

Why it exists

Database downtime = business downtime. The design question: which failure must you survive (server? datacenter? region?) and how much downtime/data loss is acceptable (RTO/RPO)?

When to use

Zone redundancy: production within one region. Geo-replication/Failover groups: business-critical apps that must survive a full regional outage.

Architect's lens

Two numbers drive every availability design: RTO (how fast must we recover) and RPO (how much data can we lose). Failover Groups are the exam favorite — automatic failover, and the listener endpoint means applications don't change connection strings.

4.4 Data Security: At Rest, In Transit, In Use

🪔 Imagine this

Protecting cash at all three moments: locked in the vault (at rest), in an armored van while moving (in transit), and counted behind a privacy screen so even staff nearby can't see (in use).

What is it?

Three states of data, three protections: At rest → Transparent Data Encryption (TDE) encrypts stored files automatically. In transit → TLS encrypts data moving over the network. In use → Always Encrypted / confidential computing keeps data encrypted even while being processed, so even DBAs can't read it.

Why it exists

Attackers target all three states — stolen disks, intercepted traffic, and insider threats. Compliance frameworks (banking, healthcare) explicitly require protection of all three.

When to use

TDE and TLS: always (mostly on by default). Always Encrypted: highly sensitive columns — card numbers, national IDs, medical records — where even administrators must not see plaintext.

Architect's lens

Additional layers to mention in designs: Dynamic Data Masking (hide data from non-privileged users in results), Row-Level Security (users see only their rows), Auditing + Microsoft Defender for SQL (threat detection).
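
Dynamic Data Masking and Row-Level Security are both filters applied to query results rather than changes to the stored data. The sketch below imitates the two behaviors in plain Python; in the real service they are configured inside the database, and the `orders` rows, the masking format, and the function names here are hypothetical.

```python
def mask_card(number: str) -> str:
    """Show only the last four digits, as a masking rule might."""
    return "X" * (len(number) - 4) + number[-4:]


def query_orders(rows: list, user: str, can_unmask: bool = False) -> list:
    # Row-level security: a user only ever sees the rows they own.
    visible = [row for row in rows if row["owner"] == user]
    if can_unmask:
        return visible
    # Dynamic masking: stored values are untouched; only the result is masked.
    return [{**row, "card": mask_card(row["card"])} for row in visible]


# Made-up sample rows (test card numbers, not real ones).
orders = [
    {"id": 1, "owner": "priya", "card": "4111111111111111"},
    {"id": 2, "owner": "amit", "card": "5500000000000004"},
]

print(query_orders(orders, "priya"))
print(query_orders(orders, "priya", can_unmask=True))
```

Note the difference from Always Encrypted: masking hides data in results from non-privileged users, while a privileged user still reads plaintext.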

4.5 Azure SQL Edge

🪔 Imagine this

A pocket-sized accountant living inside the factory machine itself — recording everything locally even when the internet is down, syncing the books to head office when the line comes back.

What is it?

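A small-footprint version of the SQL Server engine (~500 MB) that runs in containers on IoT/edge devices, with built-in data streaming and time-series support. Microsoft retired Azure SQL Edge on September 30, 2025, so recognize it in older scenarios rather than choosing it for new deployments.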

Why it exists

Factories, ships, and remote sites can't depend on constant connectivity, and sending every sensor reading to the cloud is slow and costly. Processing data locally at the 'edge' solves both.

When to use

IoT scenarios needing local storage + processing on the device: manufacturing lines, connected vehicles, retail stores, offshore equipment.

Architect's lens

Architecture pattern: SQL Edge processes/filters data on-device → only meaningful aggregates sync to Azure (IoT Hub → cloud database). Same T-SQL skills work at the edge and in the cloud.

4.6 Azure Cosmos DB (+ Table API)

🪔 Imagine this

A chain of notebooks kept in every city of the world, all magically synced. A customer in Tokyo and one in Berlin each write to their LOCAL notebook — and reads are lightning fast everywhere.

What is it?

A globally distributed NoSQL database with guaranteed single-digit-millisecond reads/writes. Multi-model: document (JSON), key-value (Table API), graph (Gremlin), MongoDB/Cassandra compatible APIs.

Why it exists

Global apps can't serve the world from one region — physics (latency) forbids it. Cosmos replicates data to any regions you pick, supports multi-region writes, and scales practically without limit.

When to use

Global low-latency apps, flexible/evolving schemas, massive scale (IoT, gaming, retail catalogs). Table API: a premium upgrade path for Azure Table Storage apps.

Architect's lens

The two design decisions that make or break Cosmos: (1) Partition key — choose a property with high cardinality and even access spread, or you'll create hot partitions. (2) Consistency level — 5 options from Strong (always latest, slower) to Eventual (fastest, may briefly read stale). Session is the practical default. Cost = Request Units (RU/s).
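
The partition key advice is easier to believe after seeing the skew. The simulation below hashes 1,000 made-up orders into four partitions twice, once by a low-cardinality key and once by a high-cardinality key. It is a rough model only: the hash function, the partition count, and the data are assumptions and do not reflect how Cosmos DB places data internally.

```python
import hashlib
from collections import Counter


def partition_of(key_value: str, partitions: int) -> int:
    """Assign a key value to a partition by hashing it (illustrative only)."""
    digest = hashlib.sha256(key_value.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def load_per_partition(items: list, key: str, partitions: int = 4) -> Counter:
    """Count how many items land on each partition for a given key choice."""
    return Counter(partition_of(str(item[key]), partitions) for item in items)


# Made-up workload: 90% of orders come from one country.
items = [{"order_id": i, "country": "IN" if i % 10 else "DE"} for i in range(1000)]

by_country = load_per_partition(items, "country")    # only 2 distinct values
by_order_id = load_per_partition(items, "order_id")  # 1000 distinct values

print("country  :", dict(by_country))
print("order_id :", dict(by_order_id))
```

With `country` as the key, at least 900 of the 1,000 items pile onto a single partition no matter how many partitions exist; with `order_id` the load spreads out, which is what "high cardinality and even access spread" buys.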

Section 5

Data Integration & Analytics — From Raw Data to Decisions

Companies have data scattered across 20 systems. The analytics story is a journey: COLLECT it (Data Factory), STORE it raw (Data Lake), PROCESS it (Databricks), ANALYZE it at scale (Synapse), and watch it LIVE (Stream Analytics). Think of it as a city water system: pipelines, reservoir, treatment plant, distribution HQ, and live quality sensors.

5.1 Azure Data Factory (ADF)

🪔 Imagine this

A courier and sorting company for data. Every night at 2 AM it picks up parcels (data) from 15 different offices (systems), sorts and re-labels them (transforms), and delivers them to the warehouse — fully automated, on schedule.

What is it?

A cloud ETL/ELT orchestration service. You visually build pipelines that copy data from 90+ kinds of sources (SQL, SAP, Oracle, files, APIs...), transform it (mapping data flows), and load it into destinations, on a schedule or on a trigger.

Why it exists

Analytics needs data from everywhere in one place. Hand-writing and maintaining 50 copy scripts is fragile. ADF makes the movement visual, monitored, retryable, and scheduled.

When to use

Building data pipelines, nightly data warehouse loads, migrating on-prem data to the cloud, orchestrating multi-step data workflows.

Architect's lens

Key concepts: Pipeline (the workflow), Activities (steps), Linked Services (connections), Integration Runtime (the compute that moves data — including Self-Hosted IR to reach on-premises systems behind firewalls — a classic exam point).

5.2 Azure Data Lake Storage (ADLS Gen2)

🪔 Imagine this

A massive reservoir that stores raw water from every source — rivers, rain, borewells — before any treatment. Store everything first; decide how to use it later.

What is it?

Blob Storage + hierarchical namespace (real folders/directories) + big-data optimizations. The standard place to store enormous amounts of raw and processed data cheaply.

Why it exists

Traditional warehouses force you to structure data BEFORE storing (expensive, slow, and you discard things you might need later). A lake stores everything raw and cheap — structure it when you actually need it.

When to use

The central storage layer of any analytics platform; the landing zone for ADF pipelines; the data source for Databricks and Synapse.

Architect's lens

Design the lake in zones — Raw (as-received) → Curated/Enriched (cleaned) → Consumption (ready for reports). Hierarchical namespace enables folder-level ACLs and fast renames that analytics engines depend on.

5.3 Azure Databricks

🪔 Imagine this

A high-end research laboratory next to the reservoir. Scientists (data engineers/data scientists) take raw water and run serious experiments — purification, analysis, prediction — with industrial-grade equipment.

What is it?

A managed Apache Spark platform for heavy data engineering, advanced analytics, and machine learning. Collaborative notebooks (Python/SQL/Scala/R) running on auto-scaling Spark clusters.

Why it exists

Transforming terabytes and training ML models needs distributed computing power that a single machine (or plain SQL) can't deliver. Spark distributes the work across a cluster; Databricks removes the cluster-management pain.

When to use

Large-scale data transformation, data science and ML workloads, streaming + batch processing on the same platform — when data engineers/scientists need code-first power.

Architect's lens

Pattern to remember: ADF orchestrates → Databricks transforms (reads/writes the Data Lake) → results land in Synapse/SQL for reporting. Clusters auto-terminate when idle to control cost.

5.4 Azure Synapse Analytics

🪔 Imagine this

The corporate analytics headquarters — reservoir access, treatment units, and the boardroom dashboard all in ONE building, so teams stop running between offices.

What is it?

A unified analytics platform combining: data warehousing (dedicated SQL pools), on-demand querying of lake files (serverless SQL), Spark processing, and built-in pipelines (ADF engine) — one workspace, one studio.

Why it exists

Enterprises traditionally stitched together a warehouse + lake + ETL + Spark from separate tools. Synapse unifies them — query the lake and the warehouse together in the same place.

When to use

Enterprise data warehousing, BI/reporting at scale (Power BI integration), and when one team needs SQL + Spark + pipelines without managing separate services.

Architect's lens

Exam lever — choose the right pool: Dedicated SQL pool (reserved, predictable heavy warehouse workloads), Serverless SQL pool (pay-per-query exploration of lake files — no infrastructure at all), Spark pool (big data processing).

5.5 Azure Stream Analytics ▾

🪔 Imagine this

A security guard watching the CCTV feed LIVE and raising the alarm the moment something looks wrong — instead of reviewing yesterday's recording tomorrow morning.

What is it?

A real-time analytics engine that runs continuous SQL-like queries on streaming data (from Event Hubs / IoT Hub) — computing aggregates, detecting patterns, and pushing results out within seconds.

Why it exists

Some insights expire in seconds: a machine overheating, a fraud pattern, a traffic spike. Batch analytics that runs tonight is too late — you need answers while the data is still flowing.

When to use

Real-time dashboards, IoT alerting (temperature crosses threshold → alert), live fraud/anomaly detection, clickstream analysis.

Architect's lens

The pipeline shape: Input (Event Hubs/IoT Hub/Blob) → Query (SQL with time windows — Tumbling, Hopping, Sliding) → Output (Power BI live dashboard, Functions, SQL, Cosmos). Windowing functions are the exam favorite — 'average temperature every 5 minutes' = Tumbling window.
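To make the 'average temperature every 5 minutes' idea concrete, here is a minimal Python sketch of what a Tumbling window computes: fixed-size, non-overlapping buckets, each reading counted exactly once. This is not Stream Analytics code; the function name `tumbling_average`, the sample readings, and the choice to label each window by its start time are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_average(readings, window_seconds=300):
    """Average value per fixed, non-overlapping window.

    readings: iterable of (timestamp_seconds, value) pairs.
    Returns {window_start_seconds: average}, keyed by window start
    (a simplification chosen for this sketch).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        # Every timestamp falls into exactly one window.
        start = (ts // window_seconds) * window_seconds
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Hypothetical temperature readings: (seconds since start, degrees)
readings = [(0, 20.0), (120, 22.0), (299, 24.0), (300, 30.0), (450, 34.0)]
averages = tumbling_average(readings)
print(averages)  # {0: 22.0, 300: 32.0}
```

Hopping and Sliding windows differ in that one reading can contribute to more than one window; the bucketing step above is what would change.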

Section 6 · TODAY

Application Architecture — Messaging, Events & App Plumbing

Modern applications are not one big block — they're many small parts that must talk to each other WITHOUT depending on each other. Today's first theme: how parts of a system communicate (messages vs events), and the supporting plumbing — APIs, caching, configuration, and automated deployment.

6.1 Messages vs Events (The Core Idea) ▾

🪔 Imagine this

A message is a registered parcel — it contains the actual goods and the sender expects it to be processed. An event is the doorbell — a lightweight 'something happened!' notification; whoever cares can react.

What is it?

Two communication styles: A MESSAGE carries the data itself (an order, a job) — the producer expects someone to process it. An EVENT announces that something happened (a file was uploaded) — the producer doesn't know or care who reacts.

Why it exists

Choosing the wrong one is a classic architecture mistake. Order processing through an 'event' notification system can lose business data; sending heavy payloads as messages where a light ping suffices wastes resources.

When to use

Message → commands and valuable data that MUST be processed (orders, payments). Event → notifications and reactions (file uploaded → trigger thumbnail generator).

Architect's lens

This single distinction decides the service: Messages → Service Bus / Storage Queues. Discrete events → Event Grid. Massive event streams → Event Hubs. Every AZ-305 messaging question starts here.

6.2 Azure Service Bus ▾

🪔 Imagine this

Registered post with delivery guarantee: parcels wait safely in order, delivery is confirmed, failed deliveries go to a special shelf for investigation, and one parcel can be photocopied to multiple subscribed recipients.

What is it?

Enterprise-grade message broker. Queues (one sender → one receiver processes each message) and Topics with Subscriptions (one message → many subscribers, each with filters). Supports ordering (FIFO via sessions), transactions, duplicate detection, and dead-letter queues.

Why it exists

When messages represent money or commitments (orders, bookings, payments), 'mostly delivered' isn't acceptable. Service Bus guarantees delivery semantics that business-critical systems require.

When to use

Order processing, financial transactions, decoupling microservices where every message matters, publish-subscribe within enterprise systems.

Architect's lens

Features that answer exam scenarios: Dead-letter queue (poison messages parked for inspection), Sessions (strict FIFO ordering), Duplicate detection, Topics+filters (each subscriber gets only relevant messages). 'Guaranteed, ordered, transactional' → Service Bus.
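The dead-letter idea can be sketched in a few lines of plain Python. This is an in-memory simulation, not the Service Bus SDK: the names `process_queue`, `handle_order`, and the limit of 3 attempts are hypothetical, and a failed message is simply re-queued at the back, which is a simplification of how a real broker redelivers.

```python
from collections import deque

MAX_DELIVERY_COUNT = 3  # hypothetical limit chosen for this sketch


def process_queue(messages, handler, max_delivery_count=MAX_DELIVERY_COUNT):
    """Deliver each message; park it in a dead-letter list after repeated failures."""
    queue = deque((msg, 0) for msg in messages)
    completed, dead_letter = [], []
    while queue:
        msg, attempts = queue.popleft()
        attempts += 1
        try:
            handler(msg)
            completed.append(msg)
        except Exception:
            if attempts >= max_delivery_count:
                # Poison message: stop retrying and keep it for inspection.
                dead_letter.append(msg)
            else:
                queue.append((msg, attempts))
    return completed, dead_letter


def handle_order(order):
    """Hypothetical handler that rejects negative amounts."""
    if order["amount"] < 0:
        raise ValueError("invalid amount")


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}, {"id": 3, "amount": 10}]
completed, dead_letter = process_queue(orders, handle_order)
print([o["id"] for o in completed], [o["id"] for o in dead_letter])
```

The point of the pattern is that one bad message neither blocks the healthy ones nor disappears silently.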

6.3 Azure Storage Queues ▾

🪔 Imagine this

A simple token system at a government counter — take a token, wait your turn. No frills, costs almost nothing, handles a huge crowd.

What is it?

Basic, very cheap message queuing built into Storage Accounts. Simple semantics: put message, get message, delete message. Scales to millions of messages.

Why it exists

Not every queue needs enterprise features. For simple background-job handoffs, Service Bus is overkill — Storage Queues do the job at a fraction of the cost.

When to use

Simple work distribution (web app drops jobs, workers pick them up), buffering bursts, queues that must hold more than 80 GB of messages (the Service Bus queue size limit), cost-sensitive designs.

Architect's lens

The comparison the exam loves: need ordering/transactions/topics/dead-lettering → Service Bus. Need simple + cheap + giant volume → Storage Queue. Both pair perfectly with Azure Functions queue triggers.

6.4 Azure Event Grid ▾

🪔 Imagine this

The society's notification system: when the water tanker arrives, the system instantly pings exactly the flats that subscribed to 'tanker updates' — not everyone, and the tanker driver doesn't maintain anyone's phone numbers.

What is it?

A fully managed event ROUTING service. Sources (Storage, Resource Groups, custom apps) publish events; Event Grid pushes them instantly to subscribed handlers (Functions, Logic Apps, webhooks) with filtering.

Why it exists

Without it, services must constantly poll 'anything new? anything new?' — wasteful and slow. Event Grid inverts it: react instantly, pay per event, near-real-time push.

When to use

Reactive automation: blob uploaded → process it; VM created → tag it; custom app event → notify systems. The glue of serverless architectures.

Architect's lens

Remember the trio: Event Grid = lightweight ROUTER of discrete events (reactions). Event Hubs = heavy PIPELINE for streaming millions of telemetry events (analytics). Service Bus = TRUCK for valuable message data (commands).

6.5 Azure Event Hubs ▾

🪔 Imagine this

The baggage conveyor system of a giant airport — millions of bags per hour flow through continuously; downstream teams (analytics, security scan) each read the stream at their own pace.

What is it?

A big-data event STREAMING platform ingesting millions of events per second — telemetry, logs, clickstreams. Consumers read the stream independently; events are retained for a period (rewindable).

Why it exists

IoT fleets and high-traffic apps generate firehoses of data that normal queues can't ingest. Event Hubs is purpose-built to swallow that firehose and feed it to analytics.

When to use

IoT telemetry ingestion, application log/clickstream pipelines, feeding Stream Analytics or Databricks with live data.

Architect's lens

Key design knobs: Partitions (parallelism — set carefully, hard to change later), Consumer Groups (independent readers), Capture (auto-archive the stream to Data Lake — zero code), retention period. Pairs with Stream Analytics for the classic real-time pipeline.
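Partitions and Consumer Groups are easier to reason about with a toy model. The sketch below is an in-memory illustration, not the Event Hubs SDK: the class `MiniEventStream` is hypothetical, and the CRC32 hash used to map a partition key to a partition is an assumption standing in for whatever the service does internally.

```python
import zlib


class MiniEventStream:
    """Toy partitioned log with an independent read position per consumer group."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        self.offsets = {}  # (consumer_group, partition) -> next index to read

    def send(self, partition_key, event):
        # The same key always lands in the same partition, preserving its order.
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(event)
        return index

    def read(self, consumer_group, partition):
        # Reading advances only this consumer group's position.
        key = (consumer_group, partition)
        start = self.offsets.get(key, 0)
        events = self.partitions[partition][start:]
        self.offsets[key] = start + len(events)
        return events


stream = MiniEventStream(partition_count=4)
p1 = stream.send("device-1", "temp=20")
p2 = stream.send("device-1", "temp=21")
analytics_first = stream.read("analytics", p1)
analytics_second = stream.read("analytics", p1)
archive_first = stream.read("archive", p1)
```

Note how the "archive" group still sees both events after "analytics" has consumed them: events are retained in the stream, not removed on read.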

6.6 API Integration — Azure API Management (APIM) ▾

🪔 Imagine this

A 5-star hotel receptionist: every visitor goes through the front desk — identity checked, visit logged, VIPs prioritized, troublemakers limited — and guests never wander the corridors knocking on random doors.

What is it?

A gateway that sits in front of ALL your APIs — one front door. It handles security (keys, JWT, OAuth), rate limiting/throttling, request/response transformation, caching, analytics, and a developer portal with documentation.

Why it exists

Exposing 20 backend APIs directly means implementing security, throttling, and versioning 20 times — inconsistently. APIM centralizes it once, and backends can change without breaking consumers.

When to use

Publishing APIs to partners/public, microservices needing one secured entry point, monetizing APIs, managing API versions and revisions.

Architect's lens

Power lives in Policies: XML rules applied to requests/responses, such as validate-jwt (auth), rate-limit (throttle per subscriber), set-header / rewrite-uri (transformation), and cache-lookup / cache-store (response caching). Tiers range from Consumption (serverless, pay-per-call) to Premium (VNet, multi-region).
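What 'throttle per subscriber' means can be shown with a small counter. This is a hedged sketch, not how APIM is implemented: the parameter names `calls` and `renewal_period` echo the rate-limit policy's attributes, but the fixed-window algorithm, the class name, and the subscriber keys are assumptions for illustration.

```python
class FixedWindowRateLimiter:
    """Allow at most `calls` requests per key within each `renewal_period` seconds."""

    def __init__(self, calls, renewal_period):
        self.calls = calls
        self.renewal_period = renewal_period
        self._windows = {}  # key -> (window_start, count)

    def allow(self, key, now):
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.renewal_period:
            # The period has elapsed: start a fresh window for this key.
            start, count = now, 0
        if count >= self.calls:
            self._windows[key] = (start, count)
            return False  # a gateway would answer with HTTP 429 here
        self._windows[key] = (start, count + 1)
        return True


limiter = FixedWindowRateLimiter(calls=2, renewal_period=60)
results = [limiter.allow("partner-a", t) for t in (0, 10, 20, 61)]
other = limiter.allow("partner-b", 20)
print(results, other)  # [True, True, False, True] True
```

Each subscriber has its own counter, so one noisy partner cannot exhaust another partner's quota.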

6.7 Caching for Applications ▾

🪔 Imagine this

Keeping a water bottle on your desk instead of walking to the cooler for every sip. The cooler (database) still exists — you just stop visiting it 50 times an hour.

What is it?

Storing frequently read data in fast in-memory storage close to the app. Azure Cache for Redis: managed in-memory store with sub-millisecond reads. Azure CDN / Front Door caching: static content cached at edge locations near users.

Why it exists

The fastest database query is the one you never run. Caching slashes latency, cuts database load (and cost), and absorbs traffic spikes that would otherwise crush the database.

When to use

Redis: session state, hot product/catalog data, API response caching, leaderboards/counters. CDN: images, scripts, videos — anything static served globally.

Architect's lens

Patterns to name in designs: Cache-aside (app checks cache → on miss, reads DB and fills cache) with TTL expiry. Redis tiers: Basic (dev), Standard (replicated), Premium (persistence, clustering, VNet). Decide upfront what staleness is acceptable.
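The cache-aside flow with TTL expiry described above can be sketched as follows. A plain dictionary stands in for Redis, and `CacheAside`, `load_product`, and the injectable clock are hypothetical names introduced so the example runs without any external service.

```python
import time


class CacheAside:
    """Check the cache first; on a miss or expiry, read the source and fill the cache."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, no database call
        value = self._load(key)  # cache miss or expired entry
        self._entries[key] = (value, now + self._ttl)
        return value


db_reads = []


def load_product(product_id):
    """Hypothetical database read; records each call so misses are visible."""
    db_reads.append(product_id)
    return {"id": product_id, "name": f"Product {product_id}"}


fake_now = [0.0]  # controllable clock for the demonstration
cache = CacheAside(load_product, ttl_seconds=30, clock=lambda: fake_now[0])
first = cache.get(7)   # miss: reads the database
second = cache.get(7)  # hit: served from the cache
fake_now[0] = 31.0
third = cache.get(7)   # TTL expired: reads the database again
print(db_reads)  # [7, 7]
```

The TTL is the staleness budget: for up to 30 seconds in this sketch, readers may see a value the database has already changed.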

6.8 App Configuration Management ▾

🪔 Imagine this

One master control room for a chain of 50 stores. Change the discount banner once at HQ — every store updates instantly. No staff member edits posters by hand at each branch.

What is it?

Azure App Configuration: a central store for application settings and feature flags, separate from code. Pairs with Key Vault, which holds the SECRETS (passwords, keys) while App Config holds regular settings.

Why it exists

Settings scattered across config files in every deployment = inconsistency, redeployments for tiny changes, and secrets accidentally committed to Git (a top real-world breach cause).

When to use

Apps deployed to multiple environments/instances, teams wanting feature flags (turn features on/off live, gradual rollouts), centralized settings governance.

Architect's lens

Design rule: settings → App Configuration; secrets → Key Vault; app authenticates to both with Managed Identity (zero credentials in code). Feature flags enable testing in production safely — release the code dark, switch on for 5% of users first.
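A percentage rollout such as 'switch on for 5% of users first' is commonly built on a stable hash of the user ID. The sketch below shows that general technique in plain Python; it is not the App Configuration feature-management library, and the function name `is_enabled`, the flag name, and the SHA-256 bucketing are assumptions for illustration.

```python
import hashlib


def is_enabled(flag_name, user_id, rollout_percent):
    """Return True if this user falls inside the rollout percentage for the flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Map the user to a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
enabled_count = sum(is_enabled("new-checkout", u, 5) for u in users)
print(f"{enabled_count} of {len(users)} users see the feature")
```

Because the bucket depends only on the flag and the user, a given user gets the same answer on every request, and raising the percentage only adds users without removing anyone already enabled.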

6.9 Automated Deployment for Applications ▾

🪔 Imagine this

An assembly-line robot that builds the car identically every single time — vs a craftsman who's brilliant but occasionally forgets a bolt on Fridays.

What is it?

Infrastructure as Code (IaC) + CI/CD pipelines. IaC (ARM templates / Bicep / Terraform) defines infrastructure in files. CI/CD (GitHub Actions / Azure DevOps Pipelines) automatically builds, tests, and deploys code on every change.

Why it exists

Manual deployments are slow, unrepeatable, and error-prone ('it worked in dev!'). Automation makes every environment identical, every release auditable, and rollbacks instant.

When to use

Every serious project. The exam expects IaC + pipelines as the default answer for 'consistent, repeatable deployments'.

Architect's lens

Names to map: Bicep (Azure-native IaC, cleaner than ARM JSON), Terraform (multi-cloud IaC), GitHub Actions / Azure Pipelines (CI/CD), Deployment Slots on App Service (deploy to staging → warm up → swap to production with zero downtime, instant swap-back rollback).

Section 7 · TODAY

Identity & Access — Who Are You, and What Can You Do?

In the cloud there is no office gate or guard — IDENTITY is the new security perimeter. Every design starts with: who is this (authentication), what may they do (authorization), and how do we keep checking (zero trust)? One security office runs it all: Microsoft Entra ID.

7.1 Microsoft Entra ID (formerly Azure AD) ▾

🪔 Imagine this

The company's security office: it issues every ID card, keeps the employee register, checks cards at every door, and instantly invalidates a card when someone leaves.

What is it?

Microsoft's cloud identity and access management service. It stores identities (users, groups, applications, devices), authenticates sign-ins, issues tokens, and enables Single Sign-On across thousands of apps.

Why it exists

Without central identity, every app maintains its own usernames/passwords — users juggle 20 passwords, IT can't disable a leaver everywhere, and attackers feast on the weakest app.

When to use

Always — it's the identity foundation of Azure, Microsoft 365, and any modern app you build.

Architect's lens

Core objects to know: Users, Groups (assign access to groups, never individuals), App Registrations (your apps), Service Principals & Managed Identities (non-human identities). Design mantra: authenticate with Entra ID, authorize with RBAC, eliminate stored passwords with Managed Identity.

7.2 Microsoft Entra B2B (Business-to-Business) ▾

🪔 Imagine this

A visitor pass for a partner company's employee — they enter your office using their OWN company ID card. You decide which rooms the pass opens; their company still owns the card.

What is it?

Guest collaboration: invite external users (partners, vendors, consultants) into your tenant. They sign in with THEIR OWN organization's credentials — you never create or manage a password for them.

Why it exists

Creating internal accounts for externals is a security nightmare: orphaned accounts linger after projects end, and you carry their password risk. B2B keeps their identity at home while you control access.

When to use

Partner access to Teams/SharePoint/apps, vendor portals, consultants working in your Azure subscription.

Architect's lens

Guests appear as 'Guest' user type — apply Conditional Access and Access Reviews to them (review quarterly: does this partner still need access?). Cross-tenant access settings control which external orgs you trust.

7.3 Azure AD B2C (Business-to-Consumer) ▾

🪔 Imagine this

The membership system of a shopping mall app — customers sign up themselves with Google or Facebook or email. Lakhs of customers, but they're members, NOT employees with office ID cards.

What is it?

A SEPARATE identity service for your customer-facing apps. Customers self-register and sign in with social accounts (Google, Facebook) or email — with fully customizable, branded sign-in pages. Scales to millions of users.

Why it exists

Customer identities must never mix with your corporate directory (different scale, different risk, different experience). And building your own secure login system from scratch is how breaches are born.

When to use

Any consumer-facing application — retail apps, citizen portals, customer self-service — needing sign-up/sign-in at scale.

Architect's lens

Exam keyword mapping: 'partners/vendors collaborating' → B2B (guests in YOUR tenant). 'customers/consumers signing up with social accounts' → B2C (separate tenant, branded journeys via user flows/custom policies).

7.4 Conditional Access ▾

🪔 Imagine this

An intelligent bouncer: 'Known face, office laptop, usual city — go in. Same person from an unknown café Wi-Fi at 3 AM — show second ID (MFA). Sign-in attempt from an impossible location — blocked.'

What is it?

Entra ID's policy engine: IF (user/group + app + location + device state + risk level) THEN (require MFA / require compliant device / block / allow). Evaluated on every sign-in.

Why it exists

A correct password is no longer proof of identity — passwords get phished daily. Context (where, what device, how risky) must shape the decision. This is the heart of Zero Trust.

When to use

Every organization. Baseline policies: require MFA for admins, block legacy authentication, require compliant devices for sensitive apps.

Architect's lens

Design tips that score: use Report-only mode to test policies before enforcing; never lock yourself out (exclude a break-glass account); combine with Identity Protection risk signals ('high sign-in risk → block'). Requires Entra ID P1 (P2 for risk-based).
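The IF/THEN shape of a Conditional Access decision can be sketched as a small function. This is a teaching model only: the `SignIn` fields and the specific rules are hypothetical, and real Conditional Access evaluates every applicable policy and combines their controls rather than running one hard-coded function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignIn:
    user_is_admin: bool
    device_compliant: bool
    trusted_location: bool
    risk: str  # "low", "medium", or "high"
    legacy_auth: bool = False


def evaluate(sign_in):
    """Return the strictest control that applies to this sign-in."""
    # Block conditions win over everything else.
    if sign_in.legacy_auth or sign_in.risk == "high":
        return "block"
    # Admins, medium risk, and unfamiliar locations get a second factor.
    if sign_in.user_is_admin or sign_in.risk == "medium" or not sign_in.trusted_location:
        return "require_mfa"
    if not sign_in.device_compliant:
        return "require_compliant_device"
    return "allow"


office = evaluate(SignIn(False, True, True, "low"))
cafe = evaluate(SignIn(False, True, False, "low"))
risky = evaluate(SignIn(False, True, True, "high"))
print(office, cafe, risky)  # allow require_mfa block
```

Running a function like this against recorded sign-ins without enforcing the result is the spirit of Report-only mode: see what would have happened before switching the policy on.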

7.5 Identity Protection ▾

🪔 Imagine this

The bank's fraud department: it doesn't check your signature — it notices your card was suddenly swiped in two countries within an hour, or your card number appeared on a leaked list, and acts automatically.

What is it?

An Entra ID P2 service using Microsoft's threat intelligence + ML to detect identity risks: leaked credentials found on the dark web, impossible travel, sign-ins from anonymized IPs/malware-linked sources. Classifies User risk and Sign-in risk (low/medium/high).

Why it exists

Humans can't watch millions of sign-ins for subtle attack patterns. Automated risk detection catches compromised accounts BEFORE damage spreads.

When to use

Organizations with Entra ID P2 wanting automated identity threat response — typically combined with Conditional Access.

Architect's lens

The power move: risk-based Conditional Access — 'sign-in risk medium → force MFA; user risk high → force secure password change.' The system self-heals: a phished account gets challenged and remediated automatically.

7.6 Access Reviews ▾

🪔 Imagine this

The yearly key audit: 'Here's the list of everyone holding a server-room key. Does each one STILL need it?' — because people change teams, projects end, but keys are rarely returned.

What is it?

Scheduled, automated recertification campaigns: periodically ask managers/owners (or users themselves) to confirm whether each person still needs a group membership, app access, or privileged role. Access that is not confirmed can be removed automatically.

Why it exists

Access only ever accumulates — 'permission creep'. After 3 years, employees hold access from every old project. Each unused permission is attack surface. Auditors demand proof of periodic review.

When to use

Privileged roles (review monthly/quarterly), guest B2B users (do partners still need access?), sensitive groups and apps. Compliance-driven environments especially.

Architect's lens

Part of Entra ID Governance (P2). Configure: scope (which group/role), reviewers (managers/self/owners), frequency, and auto-action (remove access if denied or not answered). Pairs with PIM — Privileged Identity Management — where admin roles are activated just-in-time instead of held permanently.

7.7 Service Principals for Applications ▾

🪔 Imagine this

An ID card issued to a ROBOT, not a person. The robot does its nightly job with its own card and its own limited permissions — it never borrows an employee's card.

What is it?

An identity for an application or automation (not a human). Created via App Registration in Entra ID; the app authenticates with a client secret or certificate, and you grant it RBAC roles like any user.

Why it exists

Pipelines, scripts, and integrations need to access Azure. Using a human's account breaks when they leave and violates least-privilege. Apps deserve their own identity with exactly the permissions they need.

When to use

CI/CD pipelines deploying to Azure, third-party tools accessing your resources, multi-tenant apps, any non-human automation.

Architect's lens

The exam hierarchy of preference: Managed Identity (best — Azure-managed, NO secret to store or rotate, but only for resources running IN Azure) → Service Principal with certificate → Service Principal with client secret (must rotate; store it in Key Vault, never in code).

7.8 Azure Key Vault ▾

🪔 Imagine this

The office safe with a strict guard: passwords, master keys, and stamped certificates live inside. Every opening is logged — who, when, which item — and even most managers can't peek inside.

What is it?

A managed service for storing three things: Secrets (connection strings, API keys, passwords), Keys (encryption keys, HSM-protected in the Premium tier), and Certificates (TLS certs with auto-renewal).

Why it exists

Hardcoded credentials in code/config are a leading cause of breaches — one leaked repo exposes everything. Key Vault centralizes secrets with access control, audit logs, and rotation.

When to use

Every application that touches a credential, key, or certificate. There is no scenario where hardcoding wins.

Architect's lens

The golden pattern (memorize it): App uses Managed Identity → authenticates to Key Vault → retrieves secret at runtime → zero credentials anywhere in code or config. Design extras: RBAC permission model (preferred over access policies), soft-delete + purge protection (compliance), private endpoint for network isolation, customer-managed keys (CMK) when regulators require you to own encryption keys.

Section 8 · TODAY

Monitoring & Logging — The Eyes and Ears of Your Architecture

A system you can't observe is a system you can't run. Think of a hospital ICU: sensors on the patient (data sources), a central lab analyzing reports (Log Analytics), the doctor's visual charts (Workbooks & Insights), and a research facility crunching years of records in seconds (Data Explorer).

8.1 Azure Monitor & Its Data Sources ▾

🪔 Imagine this

The ICU sensor network: heart-rate monitor, BP cuff, oxygen sensor — each device watches one thing, all readings flow to one central station where nurses see the full patient picture.

What is it?

Azure Monitor is the umbrella observability platform collecting two data types — Metrics (numeric time-series: CPU %, request count) and Logs (detailed records: errors, sign-ins, traces) — from every layer: applications, Azure resources, OS (via agents), subscription activity, and tenant (Entra ID) logs.

Why it exists

Failures rarely announce themselves politely. Without telemetry you discover outages from angry customer calls. With it, you detect, diagnose, and often auto-respond before users notice.

When to use

Every production workload from day one. The design question is never IF you monitor — it's which sources, where data goes, and how long you keep it.

Architect's lens

Architect's checklist: enable Diagnostic Settings on every resource (routes platform logs to a destination), install Azure Monitor Agent on VMs (guest OS telemetry), instrument apps with Application Insights, capture the Activity Log (who did what at subscription level). Routing options: Log Analytics (analyze), Storage (cheap archive), Event Hubs (stream to external SIEM like Splunk).

8.2 Log Analytics ▾

🪔 Imagine this

The hospital's central laboratory: every sample from every ward arrives here, and specialists run precise tests — 'show all patients whose fever spiked twice within six hours' — across the whole hospital's records at once.

What is it?

The central workspace where logs from all sources land, and the query engine on top — KQL (Kusto Query Language) — to search, correlate, and analyze them. The foundation for log-based alerts.

Why it exists

Logs scattered across 50 resources answer nothing. Centralized + queryable means you can correlate ('show app errors AND the VM's CPU at that exact minute') — that correlation is how root causes are found.

When to use

The default destination for almost all diagnostic logs. Most organizations use ONE central workspace (or few, when regions/compliance demand separation).

Architect's lens

Design levers the exam tests: workspace strategy (centralized vs per-region), retention (30 days by default for most tables, configurable to 730; archive longer to Storage for compliance), table-level retention to control cost, RBAC on workspace data. KQL basics worth showing students: Table | where <condition> | summarize count() by bin(TimeGenerated, 1h).
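For students who know Python better than KQL, the filter-then-bin pattern in that query can be mirrored in a few lines. This is only an analogy to show what `where` plus `summarize count() by bin(TimeGenerated, 1h)` computes; the record layout, the `Level` field, and the function name `count_by_hour` are assumptions, not a real Log Analytics table schema.

```python
from collections import Counter
from datetime import datetime


def count_by_hour(records, level="Error"):
    """Count matching records per one-hour bucket of TimeGenerated."""
    counts = Counter()
    for rec in records:
        if rec["Level"] == level:  # the 'where' step
            # The 'bin(TimeGenerated, 1h)' step: round down to the hour.
            hour = rec["TimeGenerated"].replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1      # the 'summarize count()' step
    return dict(sorted(counts.items()))


# Hypothetical log records
records = [
    {"TimeGenerated": datetime(2024, 1, 1, 9, 5), "Level": "Error"},
    {"TimeGenerated": datetime(2024, 1, 1, 9, 40), "Level": "Error"},
    {"TimeGenerated": datetime(2024, 1, 1, 9, 45), "Level": "Info"},
    {"TimeGenerated": datetime(2024, 1, 1, 10, 1), "Level": "Error"},
]
hourly = count_by_hour(records)
print(hourly)
```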

8.3 Azure Workbooks & Azure Insights ▾

🪔 Imagine this

Workbooks = the doctor's chart combining temperature graph + BP trend + medication notes in one visual page. Insights = ready-made specialist dashboards — the cardiologist's standard heart-monitoring panel, pre-designed by experts.

What is it?

Workbooks: interactive, customizable report canvases mixing KQL queries, metrics, text, and parameters into shareable dashboards. Insights: PRE-BUILT monitoring experiences for specific services — Application Insights (apps: requests, failures, dependency map, user analytics), VM Insights (performance + process map), Container Insights (AKS health), Network Insights.

Why it exists

Raw query results don't communicate. Visual, curated views turn telemetry into understanding — and pre-built Insights mean you don't design dashboards from scratch for common scenarios.

When to use

Workbooks: custom operational reports, incident postmortems, management-facing views. Insights: switch on for every app (App Insights), VM fleet, and AKS cluster you run.

Architect's lens

Application Insights is the star — know its features: live metrics, distributed tracing across microservices, Application Map (visual dependency graph showing which component is failing), availability tests (ping your app from worldwide locations), smart detection of anomalies.

8.4 Azure Data Explorer (ADX) ▾

🪔 Imagine this

A super-librarian who has indexed every book in a national library — ask 'find every mention of this phrase across 10 billion pages' and get the answer in two seconds, not two weeks.

What is it?

A standalone, massively scalable analytics engine optimized for huge volumes of telemetry, logs, and time-series data — interactive queries over billions of rows in seconds, using the same KQL. (Log Analytics actually runs on ADX technology underneath.)

Why it exists

Azure Monitor/Log Analytics is curated for Azure resource monitoring. But when YOUR PRODUCT generates terabytes of custom telemetry daily (IoT fleet, game events, app analytics), you need the raw engine — with full control over ingestion, retention, and cost.

When to use

Custom telemetry platforms at massive scale: IoT sensor analytics, clickstream analysis, security log exploration, any 'billions of rows, interactive speed' requirement.

Architect's lens

The decision line for the exam: monitoring AZURE resources and apps → Azure Monitor + Log Analytics (managed experience). Building YOUR OWN large-scale telemetry/time-series analytics → Azure Data Explorer (you control clusters, databases, ingestion from Event Hubs/IoT Hub). Same KQL skills work in both — learn once, use everywhere.

Section 9

Networking — Connecting, Delivering & Protecting

Think of your Azure estate as a private township: internal roads (VNets), highways to your old office campus (VPN/ExpressRoute), smart toll plazas routing visitors to the right gate (delivery services), and security walls + guards at every entrance (protection services).

9.1 Network Architecture Based on Workload Requirements ▾

🪔 Imagine this

A town planner doesn't start by drawing roads — they ask: how many people, what traffic, which areas are restricted? The network design FOLLOWS the workload's needs, never the other way round.

What is it?

The discipline of deriving network design from workload requirements: who connects (internet users? employees? partners?), what traffic patterns (north-south vs east-west), what isolation/compliance is needed, and what performance/latency targets exist.

Why it exists

Networks designed without requirements become either over-engineered (cost) or under-secured (breach). AZ-305 scenarios always hide the answer inside the requirements.

When to use

First step of EVERY infrastructure design — before choosing any networking service.

Architect's lens

The standard enterprise answer is Hub-Spoke topology: a hub VNet holds shared services (firewall, gateways, DNS), spoke VNets hold workloads, connected via peering. Read requirements for keywords: 'private only' → Private Endpoints, 'deterministic latency to on-prem' → ExpressRoute, 'global users' → Front Door.

9.2 Azure Network Connectivity Services ▾

🪔 Imagine this

The township's road system: internal colony roads (VNet), connecting bridges between colonies (peering), a managed highway network when colonies multiply (Virtual WAN), and the address directory telling everyone where each house is (DNS).

What is it?

The building blocks connecting things INSIDE Azure: Virtual Network (your private IP space with subnets), VNet Peering (private connection between VNets — even cross-region), Virtual WAN (Microsoft-managed global hub network for large estates), Azure DNS + Private DNS Zones (name resolution, including for private endpoints).

Why it exists

Resources must communicate privately and predictably. Peering keeps traffic on Microsoft's backbone (never the public internet); Private DNS makes private endpoints resolvable by name.

When to use

VNet: always. Peering: connecting 2–20 VNets. Virtual WAN: dozens of VNets + many branch offices globally. Private DNS zones: mandatory whenever you use Private Endpoints.

Architect's lens

Design facts that score: peering is non-transitive (A↔B and B↔C does NOT give A↔C — the hub-spoke pattern with a firewall/NVA in the hub solves this), peered traffic stays on the Microsoft backbone, and each private endpoint type needs its matching privatelink DNS zone.
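Non-transitive peering is easy to demonstrate with a tiny model. In the sketch below the VNet names and both helper functions are hypothetical; `can_reach_via_hub` assumes the hub runs a firewall/NVA and that routes in the spokes send traffic to it, which is the configuration the hub-spoke pattern relies on.

```python
def can_reach(peerings, source, target):
    """Peering only connects the two VNets named in the peering itself."""
    links = {frozenset(pair) for pair in peerings}
    return frozenset((source, target)) in links


def can_reach_via_hub(peerings, source, target, hub):
    """Spoke-to-spoke works only when both peer with a hub that forwards traffic."""
    return can_reach(peerings, source, hub) and can_reach(peerings, hub, target)


peerings = [("hub", "spoke-a"), ("hub", "spoke-b")]
direct_to_hub = can_reach(peerings, "spoke-a", "hub")
direct_spoke_to_spoke = can_reach(peerings, "spoke-a", "spoke-b")
routed_spoke_to_spoke = can_reach_via_hub(peerings, "spoke-a", "spoke-b", hub="hub")
print(direct_to_hub, direct_spoke_to_spoke, routed_spoke_to_spoke)  # True False True
```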

9.3 On-Premises Connectivity to Azure VNets ▾

🪔 Imagine this

Three ways to reach your old office campus: a secure tunnel through public roads (VPN), your own private dedicated highway (ExpressRoute), or a personal secure tunnel for one traveling employee's laptop (Point-to-Site).

What is it?

Options to connect your datacenter/offices to Azure: Site-to-Site VPN (IPsec tunnel over the internet via VPN Gateway, up to ~10 Gbps aggregate), Point-to-Site VPN (individual devices connect in), and ExpressRoute (private dedicated circuit through a connectivity provider, 50 Mbps to 10 Gbps, or up to 100 Gbps with ExpressRoute Direct; traffic never touches the internet).

Why it exists

Hybrid is reality: apps in Azure need to reach databases, AD, and systems still on-premises — securely and reliably.

When to use

S2S VPN: quick start, dev/test, smaller orgs, backup path. ExpressRoute: production hybrid, predictable latency, compliance ('traffic must not traverse public internet'), high bandwidth. P2S: remote developers/admins.

Architect's lens

The exam pattern: 'mission-critical + deterministic latency / private connection' → ExpressRoute; add a Site-to-Site VPN as FAILOVER for the ExpressRoute circuit (the classic resilient hybrid design). Gateway SKU choices control throughput; ExpressRoute needs a Gateway too.

9.4 Application Delivery Services ▾

🪔 Imagine this

Traffic management for the township: a simple traffic constable distributing cars between parallel lanes (Load Balancer), a smart toll plaza reading destination boards and routing by address (Application Gateway), and a national highway authority directing travelers to the nearest city (Front Door / Traffic Manager).

What is it?

Services that distribute user traffic: Azure Load Balancer (Layer 4, TCP/UDP, regional), Application Gateway (Layer 7, HTTP-aware: URL-path routing, SSL termination, cookie affinity, regional — with optional WAF), Azure Front Door (Layer 7, GLOBAL: edge routing, CDN caching, WAF), Traffic Manager (DNS-based global routing — no traffic flows through it).

Why it exists

One server can't serve everyone, and global users can't all be served well from one region. Delivery services spread load, route intelligently, and survive backend failures.

When to use

Memorize the 2×2: Regional + non-HTTP → Load Balancer. Regional + HTTP → Application Gateway. Global + HTTP → Front Door. Global + DNS-level/any protocol → Traffic Manager.

Architect's lens

Combinations are the real-world answer: Front Door (global entry, caching, WAF) → Application Gateway (regional L7, fine routing) → backend pools. 'URL path /images goes to server pool B' → App Gateway path-based routing. 'Route users to nearest region + failover' → Front Door (HTTP) or Traffic Manager (non-HTTP).
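The 2×2 from this section fits in a lookup table, which makes a handy revision aid. The function name `choose_delivery_service` and the string labels for scope and protocol are hypothetical; the mapping itself is exactly the one stated above.

```python
def choose_delivery_service(scope, protocol):
    """Map (scope, protocol) to the delivery service from the 2x2 grid.

    scope: "regional" or "global"; protocol: "http" or "non-http".
    """
    table = {
        ("regional", "non-http"): "Azure Load Balancer",
        ("regional", "http"): "Application Gateway",
        ("global", "http"): "Azure Front Door",
        ("global", "non-http"): "Traffic Manager",
    }
    try:
        return table[(scope, protocol)]
    except KeyError:
        raise ValueError(f"unknown combination: {scope!r}, {protocol!r}") from None


for scope in ("regional", "global"):
    for protocol in ("http", "non-http"):
        print(f"{scope:8} + {protocol:8} -> {choose_delivery_service(scope, protocol)}")
```

Real designs often chain two cells of the grid, for example Front Door in front of a regional Application Gateway, so treat the table as the starting point rather than the whole answer.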

9.5 Application Protection Services ▾

🪔 Imagine this

Layered security of a bank branch: boundary walls (DDoS Protection), the main gate guard checking every visitor (Firewall), frisking at the hall entrance for known attack patterns (WAF), room-level door locks (NSGs), no public entrance at all for the vault (Private Endpoints), and a secure manager's corridor for staff (Bastion).

What is it?

The defense-in-depth toolkit: DDoS Network Protection (absorbs volumetric attacks), Azure Firewall (managed stateful firewall, FQDN/network rules, threat intelligence — sits in the hub), WAF (on App Gateway/Front Door — blocks SQL injection, XSS, OWASP Top 10), NSGs (subnet/NIC-level allow-deny rules), Private Endpoints (remove public exposure of PaaS), Azure Bastion (browser-based RDP/SSH to VMs without public IPs).

Why it exists

Internet-facing applications are attacked constantly and automatically. One layer always eventually fails — defense in depth means the next layer holds.

When to use

Public web app → WAF always. Hub-spoke enterprise → Azure Firewall in hub inspecting east-west + egress. Every VM admin access → Bastion (never public RDP). Every production PaaS → Private Endpoint.

Architect's lens

Know the layering order for design answers: DDoS at the edge → Front Door/App GW WAF for HTTP attacks → Azure Firewall for network/egress control → NSG for micro-segmentation → Private Endpoints to shrink the attack surface → Bastion for admin access. NSG vs Firewall: NSG = basic L3/L4 allow/deny rules per subnet or NIC (stateful, but with no FQDN filtering and no central policy); Firewall = centralized, stateful, FQDN-aware, logs everything.

Section 10

Backup & Recovery — Designing for the Bad Day

Two different bad days, two different answers: 'we deleted/corrupted data' → BACKUP (go back in time), and 'our whole region/datacenter is down' → SITE RECOVERY (run from somewhere else). Every design starts with two numbers: RPO — how much data can we afford to lose, and RTO — how fast must we be back.

10.1 Backup & Recovery Design (RTO / RPO First) ▾

🪔 Imagine this

Insurance planning: how much loss can you absorb (RPO) and how quickly must life return to normal (RTO)? A street vendor and a hospital answer these very differently — so their 'insurance' costs differ too.

What is it?

The framework for protection design: RPO (Recovery Point Objective — maximum acceptable data loss, determined by backup/replication frequency) and RTO (Recovery Time Objective — maximum acceptable downtime, determined by your restore/failover method).

Why it exists

Backup design without RPO/RTO is guesswork. These two numbers — set by the BUSINESS, not IT — decide every technical choice and its cost.

When to use

The first question in every business continuity and disaster recovery (BCDR) design. AZ-305 scenarios state both numbers explicitly ('RPO of 15 minutes, RTO of 1 hour'); they are the answer key.

Architect's lens

Map mentally: daily backup → RPO up to 24h. Continuous replication (ASR) → RPO of minutes/seconds. Restore from backup → RTO of hours. Failover to warm standby → RTO of minutes. Tight RPO/RTO = replication + standby infrastructure = more cost. Loose RPO/RTO = scheduled backups = cheap.
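
The mental map above can be sketched as a small decision function. The three cut-off values are hypothetical and chosen only for illustration (the 24-hour figure follows the daily-backup rule of thumb; the other two are assumptions), and `suggest_protection` is an invented name, not an Azure API.

```python
from datetime import timedelta

# Hypothetical cut-offs, for illustration only.
DAILY_BACKUP_RPO = timedelta(hours=24)  # a daily backup can lose up to a day of changes
REPLICATION_RPO = timedelta(hours=1)    # below this, assume continuous replication is needed
RESTORE_RTO = timedelta(hours=4)        # assume a restore from backup takes hours


def suggest_protection(rpo: timedelta, rto: timedelta) -> list[str]:
    """Return [data-protection mechanism, recovery method] for the given targets."""
    if rpo >= DAILY_BACKUP_RPO:
        data = "daily scheduled backup"
    elif rpo >= REPLICATION_RPO:
        data = "multiple scheduled backups per day"
    else:
        data = "continuous replication"
    recovery = "restore from backup" if rto >= RESTORE_RTO else "failover to warm standby"
    return [data, recovery]
```

With the scenario numbers quoted earlier (RPO 15 minutes, RTO 1 hour) the function returns continuous replication plus failover to a warm standby, which is the expensive end of the scale.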

10.2 Azure Backup ▾

🪔 Imagine this

A disciplined night watchman who photographs every room at a fixed time, stores photos in a fireproof archive in another building, keeps them as long as the policy says, and can recreate any room exactly as it was on any date.

What is it?

Azure's centralized managed backup service. Backs up Azure VMs, SQL Server and SAP HANA running in Azure VMs, Azure Files, Blobs, Disks, and even on-premises machines (via MARS agent / Azure Backup Server). Data goes to a Recovery Services Vault (or Backup Vault for newer workloads) with policies controlling schedule + retention.

Why it exists

Ransomware, accidental deletion, corruption — backup is the last line of defense. A managed service means no backup servers, no tape rotation, no scripts to babysit.

When to use

Every production workload, full stop. The design questions are scope, frequency, retention, and vault protection.

Architect's lens

Design levers for the exam: vault redundancy (LRS/ZRS/GRS — GRS + cross-region restore for regional disaster), soft delete (deleted backups retained 14+ days), immutability + Multi-User Authorization (ransomware can't delete your backups), and policy tiers (instant restore snapshots for fast recovery + vault tier for retention).

10.3 Azure Blob Backup & Recovery ▾

🪔 Imagine this

A diary written in pen with carbon paper: every change creates a copy of the previous page (versioning), torn pages sit in a recoverable dustbin for some days (soft delete), and you can reconstruct the whole diary exactly as it looked on any past date (point-in-time restore).

What is it?

Blob protection has two modes: Operational backup — versioning + soft delete + change feed + point-in-time restore, all IN the same storage account (no copy elsewhere). Vaulted backup — data actually copied into a Backup Vault (survives even the storage account's deletion).

Why it exists

Blobs hold user uploads and data lakes — overwritten or deleted blobs are business losses. Most blob 'disasters' are logical (wrong delete, bad code), which operational backup reverses instantly.

When to use

Operational backup: the default for accidental deletion/corruption protection (fast, cheap, local). Vaulted backup: compliance demands isolated copies, or protection against the whole account being compromised/deleted.

Architect's lens

Layer the features: soft delete (blob + container level), blob versioning (every overwrite keeps the old version), point-in-time restore (roll a container back to a timestamp). Combine with immutable storage (WORM: time-based retention or legal hold) when regulators require unchangeable data.

10.4 Azure Files Backup & Recovery ▾

🪔 Imagine this

The office shared drive gets photographed each night — and any employee can themselves walk to the archive and pull yesterday's version of one file, without raising an IT ticket.

What is it?

Azure Backup protects file shares using share snapshots — point-in-time, incremental copies of the entire share. Managed by vault policies (schedule + retention up to 10 years); restore a whole share or a single file/folder. Vaulted backup option copies data into the vault for true isolation.

Why it exists

Shared drives suffer constant human error — overwritten spreadsheets, deleted folders. Snapshot-based protection makes recovery a minutes-long self-service task.

When to use

Every Azure Files share used by humans or lift-and-shift apps. Snapshot-only for convenience recovery; vaulted when the copy must survive account-level disasters.

Architect's lens

Useful design detail: end users on Windows can restore via 'Previous Versions' directly from the snapshot — zero IT involvement. Soft delete on file shares protects against the share itself being deleted.

10.5 Azure VM Backup & Recovery ▾

🪔 Imagine this

Photographing the entire flat — furniture, wiring, contents — so you can rebuild an identical flat anywhere, or just retrieve one document from the photo without rebuilding anything.

What is it?

Azure Backup takes application-consistent snapshots of entire VMs (all disks) on schedule into a Recovery Services Vault. Restore options: create a new VM, restore disks only, replace existing disks, or file-level recovery (mount the backup and copy individual files).

Why it exists

VMs hold OS + app + config + data together. A corrupted VM rebuilt manually takes days; restored from backup, it takes minutes to hours.

When to use

Every production VM. Frequency/retention by criticality; combine with ASR when the requirement is regional DR, not just data protection.

Architect's lens

Exam details: Instant Restore keeps snapshots locally for 1–5 days (very fast restores), application-consistent backups via VSS on Windows (no shutdown needed), cross-region restore from GRS vaults, and Enhanced policy for multiple backups per day (tighter RPO).

10.6 Azure SQL Backup & Recovery ▾

🪔 Imagine this

A bank ledger photocopied automatically every few minutes — you can reopen the books exactly as they were at 2:47 PM last Tuesday (point-in-time), and yearly closing ledgers are preserved for a decade for the auditors (long-term retention).

What is it?

Azure SQL Database backups are automatic and built-in: full weekly + differential (12/24h) + transaction log every 5–10 minutes. Point-in-Time Restore (PITR) to any second within 1–35 days retention. Long-Term Retention (LTR) keeps weekly/monthly/yearly fulls for up to 10 years. Backup storage redundancy is configurable (LRS/ZRS/GRS) — GRS enables geo-restore.

Why it exists

Database data is the most unforgiving — a bad UPDATE at 2:47 PM must be reversible to 2:46 PM. The 5–10 minute log backups are what make RPO that tight.

When to use

PITR: the everyday answer to corruption/accidental changes. LTR: compliance ('keep yearly backups 7 years'). Geo-restore: budget DR (restore in another region from geo-replicated backups — RTO hours).

Architect's lens

The exam distinction: Geo-restore (from backups — cheap, RTO in hours, RPO up to 1h) vs Failover Groups (live replica — RTO minutes, RPO seconds, costs a running secondary). The requirement's RTO/RPO numbers tell you which one they want.

10.7 Azure Site Recovery (ASR) ▾

🪔 Imagine this

A fully furnished standby office in another city, continuously mirrored — desks, files, phone lines. Disaster strikes the main office, staff walk into the standby and resume within minutes. And twice a year you rehearse the move without disturbing real work (test failover).

What is it?

Azure's Disaster Recovery service: continuously REPLICATES running VMs (Azure region→region, VMware/Hyper-V/physical→Azure) to a secondary location. On disaster: failover (boot replicas in the recovery region), later failback. Recovery Plans script the ordered, one-click failover of entire applications.

Why it exists

Backup answers 'data lost'; it does NOT answer 'region down, business must keep RUNNING'. ASR exists for continuity — minutes of downtime instead of days of rebuilding.

When to use

Regional DR requirements, DR for on-prem servers (Azure as the DR site — no second datacenter needed), and datacenter migrations (replicate, then 'fail over' permanently).

Architect's lens

The decision rule to teach: Backup = RPO in hours, RTO in hours, protects DATA (and history). ASR = RPO in seconds–minutes, RTO in minutes, protects the RUNNING WORKLOAD. Mature designs use BOTH. Bonus: test failovers run in an isolated network — DR drills with zero production impact.

Section 11

Migration — Moving to Azure the Right Way

Migration is house-shifting at enterprise scale: decide WHY you're moving (strategy), survey everything you own (assess), choose transport for each item — courier the documents online, send the heavy furniture by truck (online vs offline tools) — and unpack into a house that's already wired and secured (landing zone).

11.1 Cloud Adoption Framework (CAF) ▾

🪔 Imagine this

A complete house-shifting playbook written by people who've moved a thousand families: why move, what to pack, prepare the new house, move in phases, set house rules, maintain it well.

What is it?

Microsoft's end-to-end methodology for cloud adoption, in stages: Strategy (business justification) → Plan (digital estate, skills, adoption plan) → Ready (landing zones) → Adopt (Migrate / Innovate) → and the continuous disciplines: Govern and Manage, with Secure across everything.

Why it exists

Most failed cloud projects fail on process, not technology — no business case, no governance, lift-everything-blindly. CAF is the tested path around those failures.

When to use

Any organization's cloud journey — and the vocabulary AZ-305 expects you to use when a scenario spans strategy-to-operations.

Architect's lens

Map scenario language to stages: 'build the business case' → Strategy. 'inventory and prioritize workloads' → Plan. 'prepare subscriptions/networking/identity' → Ready (Landing Zones — Section 1 connects here!). 'move workloads' → Adopt-Migrate. 'enforce standards' → Govern. 'monitor and operate' → Manage.

11.2 The Azure Migration Framework (Migrate Stages) ▾

🪔 Imagine this

Moving day discipline: survey every room and label boxes (Assess), transport in planned trips — fragile items first (Migrate), arrange furniture properly in the new home, return the extra truck (Optimize), then set up locks and smoke alarms (Secure & Manage).

What is it?

The four-stage execution loop inside CAF's Adopt phase: Assess (discover servers/apps/databases, map dependencies, estimate costs) → Migrate (move in waves) → Optimize (right-size, reserved instances, remove waste) → Secure & Manage (backup, monitoring, security baseline).

Why it exists

Migrating 300 servers in one big-bang weekend is how outages happen. Waves, dependency awareness, and post-move optimization make it boring — which is the goal.

When to use

Every migration project. Wave planning: start with low-risk, low-dependency workloads; save the tangled core systems for later waves.

Architect's lens

Know the 5 Rs of rationalization (decided per workload during Assess): Rehost (lift-and-shift to VMs — fastest), Refactor (minor changes, e.g., to App Service/containers), Rearchitect (redesign for cloud), Rebuild (rewrite cloud-native), Replace (drop it for SaaS). Exam scenarios hint which R: 'minimal changes, tight deadline' → Rehost.

11.3 Assess Your Workloads (Azure Migrate) ▾

🪔 Imagine this

The surveyor's visit before shifting: photographing every room, noting which appliance connects to which socket, and estimating the new home's rent and electricity bill before signing anything.

What is it?

Azure Migrate is the central hub for discovery and assessment: deploy an appliance that discovers on-prem servers (VMware, Hyper-V, physical), collects performance data, maps dependencies (which server talks to which — agentless), checks Azure readiness, recommends VM sizes, and estimates monthly cost.

Why it exists

You can't migrate what you don't understand. Dependency mapping prevents the classic disaster: moving an app while its database stays behind, connected by a now-slow, now-broken link.

When to use

Always the first technical step. Run discovery for 2–4 weeks so performance-based sizing reflects real peaks, not one quiet afternoon.

Architect's lens

Assessment outputs that drive design: readiness (ready / conditionally ready / not ready), right-sized SKU recommendations (performance-based beats as-is — often 30–40% cheaper), dependency groups (these servers move TOGETHER as one wave), and TCO comparison for the business case.
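
The idea behind dependency groups is that servers which talk to each other, directly or through intermediaries, move in the same wave. That is a connected-components problem. The sketch below illustrates the idea only; it is not how Azure Migrate computes its groups, and the server names and the `dependency_groups` helper are hypothetical.

```python
from collections import defaultdict


def dependency_groups(servers, connections):
    """Group servers so that any two that communicate, directly or
    through other servers, land in the same group."""
    neighbours = defaultdict(set)
    for a, b in connections:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for server in servers:
        if server in seen:
            continue
        # Depth-first walk collects everything reachable from this server.
        stack, group = [server], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            group.add(current)
            stack.extend(neighbours[current] - seen)
        groups.append(sorted(group))
    return groups


servers = ["web1", "app1", "db1", "report1", "file1"]
connections = [("web1", "app1"), ("app1", "db1"), ("report1", "db1")]
groups = dependency_groups(servers, connections)
# groups == [["app1", "db1", "report1", "web1"], ["file1"]]
```

Here the reporting server rides along with the web/app/database stack because it shares the database, while the standalone file server can go in any wave.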

11.4 Compare Migration Tools ▾

🪔 Imagine this

Different movers for different items: the document courier (database tools), the van service for regular luggage (server replication), and the sealed container truck for the entire godown (Data Box).

What is it?

The toolbox, by target: Servers → Azure Migrate: Server Migration (agentless/agent-based replication, test migrations). Databases → DMA (Data Migration Assistant: assess compatibility) + DMS (Database Migration Service: execute the move). Storage/files → AzCopy, Azure Storage Mover, Azure File Sync (online) or the Data Box family (offline). Web apps → App Service Migration Assistant.

Why it exists

One tool doesn't fit all — a 500 GB database, 300 VMs, and 60 TB of file shares each need a different vehicle. Exam questions are exactly this matching game.

When to use

Choose by: what's moving, how big, how much bandwidth exists, and how much downtime is allowed.

Architect's lens

Matching cheat-sheet: 'Will my SQL Server work in Azure?' → DMA (assessment). 'Move it with minimal downtime' → DMS online. 'Move 300 VMware VMs' → Azure Migrate Server Migration. 'Move 60 TB over a slow link' → Data Box. 'Continuously sync file server to Azure Files' → File Sync.

11.5 Migrate Your Databases ▾

🪔 Imagine this

Shifting the accounts department: first an expert checks whether the old ledger format works in the new office (DMA), then the actual move — either over a long weekend with books closed (offline) or by keeping a live synced copy and switching one quiet night with five minutes of pause (online).

What is it?

The database journey: DMA assesses compatibility (deprecated features, breaking changes) → choose the target (Azure SQL Database / SQL Managed Instance / SQL on VM) → DMS executes: offline mode (downtime during the move) or online mode (continuous sync, brief cutover).

Why it exists

Databases are the riskiest, most downtime-sensitive migration component. Assessment-first avoids discovering an incompatibility AFTER the weekend cutover starts.

When to use

Offline DMS: dev/test, small DBs, generous maintenance windows. Online DMS: production systems where minutes — not hours — of downtime is the limit.

Architect's lens

Target selection is half the exam answer: SQL on Azure VM (full control, legacy features — Rehost), SQL Managed Instance (near-100% SQL Server compatibility: SQL Agent, cross-DB queries — the sweet spot for lift-and-shift), Azure SQL Database (PaaS-first for modern single apps). 'Uses SQL Agent jobs + cross-database queries, minimal changes' → Managed Instance.

11.6 Online Storage Migration Tools ▾

🪔 Imagine this

Sending your files over the internet courier: instant pickup for small parcels (AzCopy), a managed relocation service that moves entire godowns shelf-by-shelf while you watch a dashboard (Storage Mover), and a magic cupboard that keeps the old office and new office in sync during the transition (File Sync).

What is it?

Network-based transfer options: AzCopy (command-line bulk copy to Blob/Files — scriptable, restartable), Azure Storage Mover (managed service migrating on-prem NFS/SMB shares to Azure at scale, with central tracking), Azure File Sync (sync + tiering between Windows file servers and Azure Files — also a gradual migration path), and Azure Data Factory (when data needs transformation en route).

Why it exists

When bandwidth is sufficient, online migration is simpler, continuous, and has no hardware logistics — data starts arriving today.

When to use

Rule of thumb: data size ÷ available bandwidth = days. If a 10 TB share over your link takes 3 days — go online. If 60 TB takes 3 months — go offline (next card).
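
The rule of thumb is simple arithmetic, sketched here in Python. Decimal units are assumed (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), and the `utilisation` parameter is an illustrative addition for links that cannot be fully dedicated to the transfer.

```python
def transfer_days(data_tb: float, bandwidth_mbps: float, utilisation: float = 1.0) -> float:
    """Estimate the days needed to move data_tb terabytes over a link.

    utilisation is the fraction of the link the transfer can really use.
    """
    if bandwidth_mbps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("bandwidth must be positive and utilisation in (0, 1]")
    bits = data_tb * 10**12 * 8
    seconds = bits / (bandwidth_mbps * 10**6 * utilisation)
    return seconds / 86_400  # seconds per day
```

As a usage example, `transfer_days(10, 1000)` is a little under one day, while `transfer_days(60, 60)` is roughly three months. Lowering `utilisation` (say to 0.5 for a shared link) doubles the estimate, which is often what tips a borderline case toward offline transfer.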

Architect's lens

Scenario mapping: one-time bulk blob upload → AzCopy. Many on-prem shares, managed + monitored → Storage Mover. Keep on-prem server as cache during gradual move → File Sync. Migrate + reshape data → Data Factory.

11.7 Offline Storage Migration Tools (Data Box Family) ▾

🪔 Imagine this

When the internet courier would take months, Microsoft ships you an armored, tamper-proof container truck: load your data locally at full speed, the truck drives to the Azure datacenter, and they unload it for you.

What is it?

Physical transfer appliances: Data Box Disk (up to 5 SSDs of 8 TB each, 40 TB total, ~35 TB usable; for small batches), Data Box (rugged appliance, 100 TB raw, ~80 TB usable), Data Box Heavy (1 PB raw, ~800 TB usable, on wheels). All AES-encrypted; data is uploaded into your storage account, then devices are wiped to NIST standards.

Why it exists

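Physics: 100 TB over a 100 Mbps line is 8 × 10^14 bits ÷ 10^8 bits/s = 8 × 10^6 seconds ≈ 93 days at full line rate, and 100+ days with real-world overhead. A truck is genuinely faster than the network at these scales: 'never underestimate the bandwidth of a truck full of disks.'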

When to use

Tens of TBs and beyond, limited/expensive bandwidth, remote sites, or deadline-driven datacenter exits. Also works in reverse (export FROM Azure).

Architect's lens

Exam selection by size: a few TB up to ~40 TB → Data Box Disk. ~50–100 TB → Data Box. Hundreds of TB up to ~1 PB → Data Box Heavy. Keyword 'limited bandwidth' or 'no internet connectivity' in a scenario = Data Box family, regardless of size.
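
The size-based selection can be sketched as a simple threshold function. The cut-offs below are the nominal figures from this cheat-sheet, not usable capacities, so treat them as assumptions and check current device specifications before ordering. `pick_data_box` is a hypothetical helper name.

```python
def pick_data_box(data_tb: float) -> str:
    """Pick a Data Box family member for one order, by nominal size in TB."""
    if data_tb <= 0:
        raise ValueError("data size must be positive")
    if data_tb <= 40:
        return "Data Box Disk"
    if data_tb <= 100:
        return "Data Box"
    if data_tb <= 1000:
        return "Data Box Heavy"
    # Beyond a single device: split the data across several orders.
    return "multiple Data Box Heavy orders"
```

For instance, `pick_data_box(60)` returns "Data Box". The function deliberately ignores bandwidth; the online-versus-offline decision comes first, and this only picks the device once offline has been chosen.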

⚖️ Comparison Hub

Which One Do I Pick? — Side-by-Side Decision Tables

Azure's hardest part isn't learning services — it's choosing between similar-sounding ones. Each table below is a decision tool: find your scenario's keywords in the right column, and the left column is your answer.

Compute: Where Should My Code Run?

Service	You Manage	Best For	Avoid When
Virtual Machines	OS + everything above	Legacy apps, full control, lift-and-shift	Team wants zero infra management
Azure Batch	Job definitions only	Massive parallel jobs (rendering, simulations)	Long-running services / web apps
App Service	Just your code	Web apps & APIs — the default choice	Non-HTTP workloads, OS-level needs
Container Instances	Container image	Quick single containers, burst tasks	Need orchestration, scaling, LB
AKS	Apps + cluster config	Microservices at scale, DevOps-mature teams	Small team, simple app (overkill)
Functions	Just functions	Event-driven, bursty, pay-per-run	Long-running, steady heavy load
Logic Apps	Visual workflow	Integration & business process automation	Complex custom logic (use Functions)

💡 Exam cue: 'minimize management/operational overhead' → the most managed option that fits (usually App Service or Functions).

Messaging & Events: Service Bus vs Queues vs Event Grid vs Event Hubs

Service	Carries	Superpower	Pick When You Hear
Service Bus	Messages (valuable data)	Ordering, transactions, dead-letter, topics	'orders', 'guaranteed', 'FIFO', 'enterprise'
Storage Queues	Messages (simple)	Dirt cheap, >80 GB queues	'simple', 'cost-effective', 'basic queue'
Event Grid	Discrete events (notifications)	Instant push routing + filtering, pay-per-event	'react when X happens', 'serverless glue'
Event Hubs	Event streams (telemetry)	Millions of events/sec, replayable stream	'telemetry', 'IoT', 'streaming', 'ingest'

💡 Mental model: Service Bus = registered post. Event Grid = doorbell. Event Hubs = airport conveyor belt.

Load Balancing: The Famous 2×2 Matrix

Service	OSI Layer	Scope	Pick When
Azure Load Balancer	L4 (TCP/UDP)	Regional	Non-HTTP traffic within a region
Application Gateway	L7 (HTTP)	Regional	Path-based routing, SSL offload, WAF — regional web apps
Azure Front Door	L7 (HTTP)	Global	Global web apps: edge routing + CDN + WAF
Traffic Manager	DNS	Global	Global routing for ANY protocol, region failover

💡 Two questions solve every load balancing question: HTTP or not? Regional or global? The matrix gives the answer.

Storage Types: Blob vs Files vs Disks vs Data Lake

Service	Access Style	Best For	Not For
Blob Storage	HTTPS / REST / SDK	App files, media, backups, any objects	SMB drive mapping, VM disks
Azure Files	SMB / NFS (mapped drive)	Shared drives, lift-and-shift file shares	Massive analytics data
Managed Disks	Attached to a VM	VM OS & data volumes	Sharing across many clients
Data Lake (ADLS Gen2)	Blob + hierarchical namespace	Big-data analytics storage	Simple app file storage

💡 'Mapped drive / SMB / lift-and-shift share' → Files. 'Analytics on petabytes' → Data Lake. 'VM volume' → Disk. Everything else → Blob.

Redundancy: What Disaster Does Each Survive?

Option	Copies & Location	Survives	Doesn't Survive
LRS	3× in one datacenter	Disk/rack failure	Datacenter or region loss
ZRS	3× across 3 zones, one region	Entire datacenter (zone) loss	Regional disaster
GRS	LRS + 3× in paired region	Regional disaster	— (secondary readable only after failover)
RA-GRS	GRS + readable secondary	Regional disaster + read access anytime	—
GZRS / RA-GZRS	ZRS + paired region copies	Zone AND regional disasters (max protection)	—

💡 Match the requirement's disaster: 'datacenter failure' → ZRS. 'regional outage' → GRS+. 'must READ during outage' → RA-GRS/RA-GZRS.
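
The matching rule can be sketched as a function of the disasters the requirement names. This is an illustrative simplification with invented names; it ignores cost and the fact that not every region or account type offers every option.

```python
def pick_redundancy(zone_failure: bool, regional_outage: bool,
                    read_during_outage: bool = False) -> str:
    """Map the disasters a design must survive to a storage redundancy option."""
    # Reading from a secondary only makes sense with a geo-replicated copy.
    needs_geo = regional_outage or read_during_outage
    if not needs_geo:
        return "ZRS" if zone_failure else "LRS"
    base = "GZRS" if zone_failure else "GRS"
    return f"RA-{base}" if read_during_outage else base
```

For example, `pick_redundancy(zone_failure=True, regional_outage=True, read_during_outage=True)` returns "RA-GZRS", the maximum-protection row of the table.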

Databases: Azure SQL vs Cosmos DB

Aspect	Azure SQL Database	Cosmos DB
Data model	Relational (tables, joins, ACID)	NoSQL multi-model (JSON docs, key-value, graph)
Schema	Fixed, enforced	Flexible, evolves freely
Global writes	Single write region (readable replicas)	Multi-region writes, active-active
Latency promise	No fixed SLA on latency	<10 ms reads/writes (SLA-backed)
Pick when	Transactions, reporting, relational integrity	Global scale, low latency everywhere, flexible schema

💡 'Financial transactions / complex joins / reporting' → SQL. 'Globally distributed / millisecond / flexible schema / massive scale' → Cosmos.

Identity: B2B vs B2C · Managed Identity vs Service Principal

Option	Identity Belongs To	Use Case	Key Trait
Entra B2B	Partner's own organization	Partners/vendors collaborating in YOUR tenant	Guest users, their org keeps the credentials
Azure AD B2C	The customer (social/email)	Consumer apps: sign-up/sign-in at scale	Separate tenant, branded journeys, millions of users
Managed Identity	The Azure resource itself	Azure resource → Azure service auth	NO secrets to store or rotate — always prefer it
Service Principal	An app registration	CI/CD, external apps, automation	Has a secret/cert you must protect & rotate

💡 'Partner organization' → B2B. 'Customers/consumers' → B2C. 'Runs in Azure, needs to call Azure' → Managed Identity. 'Runs OUTSIDE Azure' → Service Principal.

Hybrid Connectivity: VPN vs ExpressRoute

Aspect	Site-to-Site VPN	ExpressRoute
Path	Encrypted tunnel over public internet	Private dedicated circuit (no internet)
Bandwidth	Up to ~10 Gbps aggregate	50 Mbps – 100 Gbps
Latency	Variable (internet-dependent)	Predictable, deterministic
Setup	Hours, low cost	Weeks (provider involved), higher cost
Pick when	Quick start, dev/test, backup path	Production hybrid, compliance, performance SLAs

💡 Gold-standard design: ExpressRoute primary + S2S VPN as failover. 'Traffic must not traverse the public internet' → ExpressRoute, always.

Protection: Backup vs Site Recovery

Aspect	Azure Backup	Azure Site Recovery
Protects against	Deletion, corruption, ransomware	Region/site outage — business continuity
What you get back	DATA, from a point in time	RUNNING workloads, in another region
RPO	Hours (schedule-based)	Seconds–minutes (continuous replication)
RTO	Hours (restore time)	Minutes (failover)
Cost model	Storage of recovery points	Replication + standby infrastructure

💡 'Recover deleted/corrupted data' → Backup. 'Keep running if the region fails / RTO in minutes' → ASR. Mature designs need BOTH.

Migration Tools: Matching Tool to Job

Job	Tool	Mode	Why
Discover + assess servers	Azure Migrate	Appliance-based	Dependency maps, sizing, cost estimates
Check DB compatibility	Data Migration Assistant (DMA)	Assessment	Finds breaking changes before you move
Move databases	Database Migration Service (DMS)	Online / Offline	Online = minimal downtime cutover
Bulk file copy (good bandwidth)	AzCopy	Online	Scriptable CLI, restartable
Managed share migration	Azure Storage Mover	Online	Central tracking of many NFS/SMB shares
Gradual file server move	Azure File Sync	Online (sync)	On-prem stays as cache during transition
10s of TB, weak bandwidth	Data Box Disk / Data Box / Heavy	Offline	~40 TB / ~100 TB / ~1 PB by truck

💡 Decide online vs offline first: data size ÷ bandwidth = transfer days. Months over the wire? → Data Box. 'No/limited connectivity' → always offline.

Monitoring: Metrics vs Log Analytics vs Data Explorer

Tool	Data Shape	Best At	Pick When
Azure Monitor Metrics	Numeric time-series	Near-real-time charts & fast alerts	CPU %, request count, autoscale triggers
Log Analytics	Structured logs (KQL)	Correlation & root-cause across resources	Monitoring YOUR Azure estate & apps
Azure Data Explorer	Massive telemetry (KQL)	Billions of rows, interactive speed	YOUR PRODUCT's custom telemetry platform

💡 Same KQL everywhere. Azure resource monitoring → Log Analytics. Building your own telemetry analytics product → ADX.

🎯 Exam Prep

Enterprise Use-Cases — Thinking Like the Exam

AZ-305 doesn't test definitions — it gives you a company, requirements, and constraints, then asks you to choose. Below are three full enterprises. For each: read the requirements FIRST, try answering module by module, then compare with the architect's decisions. The 'why' column is the exam mindset.

Retail / E-Commerce

Case 1 · GlobalKart — Indian E-Commerce Going Global

GlobalKart, a successful Indian online marketplace, is expanding to Europe and the US. Festival sales bring 20× traffic spikes. The board demands 99.99% availability, fast page loads on three continents, and customer sign-in with Google/Facebook. Orders are sacred — none may ever be lost. The dev team is small and wants minimum infrastructure management.

📋 Requirements (read these like an examiner)

99.99% availability for the storefront
Low latency for users in India, EU, and US
Survive 20× festival traffic spikes automatically
Customers self-register with social accounts
Zero order loss, even during failures
Small team — minimize operational overhead

Module	Architect's Decision	The Exam Mindset — WHY
Governance	Management groups (Prod / Non-Prod), subscription per environment, Azure Policy enforcing allowed regions + mandatory CostCenter tags	Multi-region expansion without governance = untraceable costs and accidental deployments in wrong regions. Structure first.
Compute	App Service (Premium) in 3 regions with autoscale rules	'Small team + minimize overhead' kills VMs and AKS. App Service autoscale absorbs 20× spikes; Premium gives zone redundancy and VNet integration.
Delivery & Protection	Azure Front Door with WAF + edge caching in front of all regions	Global + HTTP → Front Door (the 2×2 matrix). One global entry: nearest-region routing, static content cached at edge, OWASP attacks blocked before reaching origin.
Databases	Azure SQL with Failover Groups (orders) + Cosmos DB multi-region (catalog, cart) + Redis (sessions)	Orders need ACID + 99.99% → SQL Business Critical with failover groups. Catalog/cart need global millisecond reads + flexible schema → Cosmos. Sessions are temporary hot data → Redis. Polyglot persistence in action.
Messaging	Service Bus queues for order processing; Event Grid for reactive automation; Event Hubs for clickstream	'Zero order loss' → Service Bus (guaranteed delivery, dead-lettering). Doorbell-style reactions → Event Grid. High-volume behavioral telemetry → Event Hubs. The trio, each in its lane.
Identity	Azure AD B2C for customers; Managed Identities + Key Vault for all app-to-service auth	'Customers + social sign-in' → B2C (never B2B). Apps authenticate with Managed Identity — zero credentials in code.
Monitoring	Application Insights (availability tests from 3 continents) + central Log Analytics + autoscale alerts	99.99% must be MEASURED. Availability tests prove the SLA from the user's side; App Map finds the failing component during festival chaos.
Backup	SQL PITR + LTR, GZRS storage redundancy, Cosmos continuous backup	A bad deployment that corrupts data needs point-in-time rewind. GZRS = survives zone AND regional disasters.

🧠 Mindset Takeaway

Pattern to internalize: requirements are filters. 'Small team' eliminated half the compute options instantly. 'Zero order loss' chose the message broker. 'Global + social customers' chose B2C. Every AZ-305 answer is hiding inside a requirement sentence.

Banking / Financial Services (Regulated)

Case 2 · SecureBank — Regulated Core Banking on Azure

SecureBank is moving its loan-processing platform to Azure. Regulators mandate: no service may be reachable from the public internet, all data stays in India, connectivity from branches must never traverse the public internet, and even database administrators must not be able to read customers' card numbers. DR requirement: RPO 5 minutes, RTO 1 hour. Every administrative action must be audited, and admin rights must not be permanent.

📋 Requirements (read these like an examiner)

Zero public endpoints on any service
Branch connectivity with deterministic latency, never over public internet
Data residency: India regions only
DBAs must not see cardholder data
DR: RPO ≤ 5 minutes, RTO ≤ 1 hour
No standing admin privileges; full audit trail

Module	Architect's Decision	The Exam Mindset — WHY
Networking	Hub-spoke VNets; ExpressRoute primary + S2S VPN failover; Azure Firewall in hub; Private Endpoints + Private DNS for ALL PaaS; Bastion for admin access	'Never over public internet' → ExpressRoute (VPN still rides the internet — only acceptable as backup). 'Zero public endpoints' → Private Endpoints everywhere + Bastion instead of public RDP. Hub firewall inspects everything.
Governance	Azure Policy: deny public IPs, deny non-India regions, require Private Endpoints; initiative assigned at management group	Compliance must be ENFORCED, not requested. Policy 'Deny' effect makes non-compliant resources impossible to create — auditors love 'cannot' more than 'should not'.
Data Security	Azure SQL Business Critical (zone-redundant) + Always Encrypted on PAN columns + TDE with customer-managed keys in Key Vault (HSM)	TDE protects at rest, TLS in transit — but 'DBA must not see card numbers' is the IN USE state → Always Encrypted (keys live with the app, never the DB). Regulators requiring key ownership → CMK in HSM-backed Key Vault.
DR	SQL Failover Group to second India region (Central India ↔ South India) + ASR for application VMs + Recovery Plan; quarterly test failovers	RPO 5 min kills backup-based DR (geo-restore RPO ≈ 1 h). Continuous replication only: Failover Groups (RPO seconds) + ASR (RPO minutes, RTO minutes). Residency keeps both regions in India.
Identity	Conditional Access (MFA + compliant device for admins), PIM for just-in-time role activation, quarterly Access Reviews, Identity Protection risk policies	'No standing privileges' is literally PIM's definition — roles activated just-in-time with approval + audit. Access Reviews prove periodic recertification to auditors.
Monitoring	Central Log Analytics (730-day interactive retention + archive), Activity Log + Entra audit logs captured, diagnostic settings on everything, stream to SIEM via Event Hubs	'Every action audited' → capture the control-plane (Activity Log) AND identity-plane (Entra logs) AND data-plane (resource diagnostics). Event Hubs export feeds the bank's existing SIEM.
Backup	Azure Backup with immutable vault + Multi-User Authorization + soft delete; SQL LTR 7 years	Banks are ransomware's favorite target — immutability means even a compromised admin can't delete recovery points. LTR satisfies the regulator's retention mandate.
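
The Governance row's 'deny non-India regions' decision can be sketched as a policy rule. This is a minimal illustration, assuming the bank allows only `centralindia` and `southindia`; the rule follows the general if/then shape of an Azure Policy definition, while `would_be_denied` is a hypothetical local helper and not the Azure Policy engine. A real assignment would likely need extra handling, for example for resources whose location is `global`.

```python
import json

# Assumption: these two India region identifiers are the only ones the bank allows.
ALLOWED_LOCATIONS = ["centralindia", "southindia"]

# General shape of an Azure Policy rule: an "if" condition and a "then" effect.
deny_non_india_rule = {
    "if": {
        "not": {"field": "location", "in": ALLOWED_LOCATIONS}
    },
    "then": {"effect": "deny"},
}


def would_be_denied(resource: dict, rule: dict = deny_non_india_rule) -> bool:
    """Hypothetical local check that mirrors this one rule only."""
    allowed = rule["if"]["not"]["in"]
    outside_allowed = resource["location"].lower() not in allowed
    return outside_allowed and rule["then"]["effect"] == "deny"


print(json.dumps(deny_non_india_rule, indent=2))
print(would_be_denied({"name": "sql-loans", "location": "centralindia"}))  # False
print(would_be_denied({"name": "vm-test", "location": "westeurope"}))      # True
```

Assigning such a rule at the management group, as the table suggests, means every subscription underneath inherits it without per-subscription setup.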

🧠 Mindset Takeaway

Regulated-industry reflex: every requirement maps to a SPECIFIC control — 'not public internet' = ExpressRoute, 'no public endpoint' = Private Endpoint, 'DBA can't read' = Always Encrypted, 'no standing access' = PIM. Vague answers ('use security best practices') score zero; named controls score full marks.
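
The reflex above is essentially a lookup table. The sketch below is only a study aid: the phrase-to-control pairs come from the takeaway, while the trigger phrases, the `named_controls` helper, and the naive substring matching are assumptions made for illustration.

```python
# Trigger phrase -> named control, taken from the takeaway above.
# The lowercase trigger phrases are illustrative assumptions.
CONTROLS = {
    "public internet": "ExpressRoute",
    "public endpoint": "Private Endpoint",
    "dba": "Always Encrypted",
    "standing": "PIM",
}


def named_controls(requirement: str) -> list[str]:
    """Return every control whose trigger phrase appears in the requirement."""
    text = requirement.lower()
    return [control for phrase, control in CONTROLS.items() if phrase in text]


requirements = [
    "Zero public endpoints on any service",
    "Branch connectivity with deterministic latency, never over public internet",
    "DBAs must not see cardholder data",
    "No standing admin privileges; full audit trail",
]

for req in requirements:
    print(req, "->", named_controls(req))
```

Requirements that match nothing in the table (residency, RPO/RTO) are the ones that still need their own named control, which is a useful self-check when reading a scenario.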

Manufacturing

Case 3 · MegaFab — Datacenter Exit + IoT Modernization

MegaFab's datacenter lease ends in 9 months: 300 VMware VMs, a 60 TB file server, and SQL Server 2014 (heavy SQL Agent jobs + cross-database queries) must move to Azure. The factory's internet link is only 200 Mbps. Leadership also wants a new IoT capability: 2,000 machines streaming sensor data, with real-time overheating alerts and 5 years of historical analysis. Factory floors sometimes lose connectivity for hours.

📋 Requirements (read these like an examiner)

Vacate datacenter within 9 months (hard deadline)
Minimal changes to existing apps (no time to rewrite)
60 TB file data over a 200 Mbps link
SQL 2014 with Agent jobs + cross-DB queries
Real-time alerts + 5-year history for 2,000 machines
Factory must operate during connectivity loss

Module	Architect's Decision	The Exam Mindset — WHY
Strategy (CAF)	CAF: Strategy → Plan → Ready (deploy Landing Zone FIRST) → Adopt-Migrate in waves; rationalization = mostly Rehost	'9 months + minimal changes' → Rehost (lift-and-shift) for almost everything; modernize later. Landing zone before the first VM moves — governance retrofitted onto 300 VMs is pain.
Assessment	Azure Migrate appliance: 4 weeks of discovery, performance-based sizing, dependency mapping → migration waves	Dependency maps decide the waves — app servers move WITH their databases. Performance-based sizing typically cuts the compute bill 30–40% vs as-is sizing.
Server Migration	Azure Migrate: Server Migration — agentless VMware replication, test migrations per wave, weekend cutovers	Built for exactly this: replicate in background, TEST in an isolated VNet (no production impact), cut over wave by wave. Low-risk apps first, the tangled core last.
Database	DMA assessment → Azure SQL Managed Instance via DMS online migration	SQL Agent jobs + cross-DB queries → Managed Instance (near-100% compatibility); Azure SQL Database would break both. DMS online keeps downtime to a brief cutover. 2014's end-of-support adds urgency.
File Data (60 TB)	Data Box (offline) for the 60 TB bulk → Azure Files; File Sync afterwards for the delta + branch cache	Maths first: 60 TB over 200 Mbps ≈ 28 days with the link fully saturated, and 30+ days once realistic utilization is assumed. That is unacceptable on a link the factory also needs for daily work. Data Box moves the bulk by truck in days; File Sync handles changes made during transit and keeps a local cache for the factory.
IoT Pipeline	Machines → IoT Hub/Event Hubs → Stream Analytics (tumbling-window alerts) → Azure Data Explorer (5-year history); Power BI dashboards	Firehose ingestion → Event Hubs. 'Alert when temperature crosses X' → Stream Analytics windowed queries in real time. 'Billions of readings, 5 years, interactive analysis' → ADX, purpose-built time-series engine.
Edge Resilience	Azure SQL Edge on factory-floor gateways: local capture + processing, sync to cloud when link restores	'Operates during connectivity loss' → process at the edge. SQL Edge stores and analyzes locally, then syncs — the factory never depends on the WAN to run.
Protect (Day 1)	Azure Backup on all migrated VMs + SQL MI PITR/LTR; ASR region-to-region for the critical production line systems	Migration's Secure & Manage stage is part of migration, not an afterthought. The old datacenter's tape backups are gone — protection must exist before the trucks leave.
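
The tumbling-window alert in the IoT Pipeline row can be illustrated locally. This is a plain-Python sketch of the windowing idea only, not Stream Analytics query syntax; the 60-second window, the 90 °C threshold, the choice of averaging, and the machine names are all assumptions.

```python
from collections import defaultdict


def tumbling_window_alerts(readings, window_seconds=60, threshold_c=90.0):
    """Group readings into fixed, non-overlapping windows per machine and
    flag every window whose average temperature exceeds the threshold.

    readings: iterable of (timestamp_seconds, machine_id, temperature_c).
    """
    windows = defaultdict(list)
    for ts, machine, temp in readings:
        # Each reading belongs to exactly one window: the defining
        # property of a tumbling window.
        window_start = int(ts // window_seconds) * window_seconds
        windows[(window_start, machine)].append(temp)

    alerts = []
    for (window_start, machine), temps in sorted(windows.items()):
        average = sum(temps) / len(temps)
        if average > threshold_c:
            alerts.append((window_start, machine, round(average, 1)))
    return alerts


sample = [
    (0, "press-01", 85.0), (30, "press-01", 88.0),   # window 0: average 86.5
    (65, "press-01", 95.0), (90, "press-01", 97.0),  # window 60: average 96.0
    (70, "lathe-07", 60.0),
]
print(tumbling_window_alerts(sample))  # [(60, 'press-01', 96.0)]
```

Averaging over a window rather than alerting on single readings is one way to avoid firing on a momentary sensor spike; whether that trade-off suits a given machine is a design decision.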

🧠 Mindset Takeaway

Migration questions are arithmetic + matching: bandwidth math chooses online vs offline; compatibility features (SQL Agent, cross-DB) choose the database target; 'minimal changes + deadline' chooses Rehost. And modern exams love hybrid twists — the edge requirement (SQL Edge) is how AZ-305 checks you read EVERY requirement.
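
The bandwidth arithmetic is worth doing explicitly. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second); the `transfer_days` helper and the 80% utilization figure are illustrative assumptions.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to push data_tb terabytes over a link_mbps link.

    utilization is the fraction of the link actually available for the copy.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


print(round(transfer_days(60, 200), 1))       # 27.8 days at full line rate
print(round(transfer_days(60, 200, 0.8), 1))  # 34.7 days at 80% utilization
```

Even the best case monopolizes the factory's only link for about four weeks, which is why the offline option wins for MegaFab's 60 TB.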

📚 Final Section

Summary & Reference Library

The handbook in one breath: Govern first, then choose compute by management appetite, storage by data shape, databases by consistency needs, integration by data journey, messaging by message value, identity as the new perimeter, monitoring as your eyes, networks by requirements, backup by RPO/RTO, and migration by math. Below — the official documentation for every module.

Governance

Hierarchy (MG → Subscription → RG) decides inheritance. Policy controls WHAT, RBAC controls WHO. Landing Zones make it repeatable.

Compute

Choose by management appetite: full control (VM) → managed web (App Service) → event-driven (Functions) → orchestrated containers (AKS) → visual workflows (Logic Apps).

Storage

Match storage to data shape: objects→Blob, shares→Files, VM volumes→Disks. Redundancy = which disaster you must survive.

Databases

Relational integrity → SQL. Global millisecond scale → Cosmos. Protect data at rest (TDE), in transit (TLS), in use (Always Encrypted).

Data Integration

The journey: ADF collects → Data Lake stores raw → Databricks transforms → Synapse analyzes → Stream Analytics watches live.

App Architecture

Message = valuable parcel (Service Bus). Event = doorbell (Event Grid). Stream = conveyor (Event Hubs). APIM is the one front door; cache what you read often; automate every deployment.

Identity

Identity is the new perimeter. B2B = partners, B2C = customers. Managed Identity > Service Principal. Conditional Access + PIM + Access Reviews = Zero Trust in practice.

Monitoring

Metrics for speed, Logs for depth, KQL everywhere. App Insights for apps, Workbooks for stories, ADX for your own telemetry at scale.

Networking

Hub-spoke is home. ExpressRoute when internet won't do. The 2×2 matrix (HTTP? Global?) answers every load-balancing question. Defense in depth: DDoS → WAF → Firewall → NSG → Private Endpoints.
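
The 2×2 matrix can be written as a two-question function. The service placed in each quadrant below follows commonly cited Azure load-balancing guidance and is an assumption here, since the matrix itself is not reproduced in this summary; `pick_load_balancer` is a hypothetical helper.

```python
def pick_load_balancer(http: bool, global_scope: bool) -> str:
    """Answer the two questions (HTTP? Global?) and return one service."""
    if http and global_scope:
        return "Azure Front Door"
    if http:
        return "Application Gateway"
    if global_scope:
        return "Traffic Manager"
    return "Azure Load Balancer"


for http in (True, False):
    for global_scope in (True, False):
        print(f"HTTP={http}, Global={global_scope} -> {pick_load_balancer(http, global_scope)}")
```

Real designs often combine quadrants, for example a global HTTP entry point in front of regional gateways, so treat the function as a first filter rather than a final answer.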

Backup & Recovery

RPO/RTO are the answer key. Backup = recover DATA (hours). ASR = keep RUNNING (minutes). Immutable vaults beat ransomware.

Migration

CAF stages give the vocabulary. Assess before you move; dependency maps make the waves. Bandwidth math picks online vs offline; compatibility features pick the database target.

Official Documentation & Resources

Exam & Learning Paths

AZ-305 Exam Page (skills measured)
AZ-305 Official Course (AZ-305T00)
Azure Architecture Center
Cloud Adoption Framework (CAF)
Well-Architected Framework

Governance & Identity

Management Groups
Azure Policy
Azure RBAC
Azure Landing Zones
Microsoft Entra ID
Entra B2B
Azure AD B2C
Conditional Access
Azure Key Vault

Compute & App Architecture

Virtual Machines
App Service
Azure Functions
AKS
Logic Apps
Service Bus
Event Grid
Event Hubs
API Management
Compute decision tree (must-read!)

Storage, Databases & Analytics

Storage Accounts
Storage Redundancy
Azure SQL Database
Cosmos DB
Data Factory
Synapse Analytics
Databricks
Stream Analytics

Networking & Security

Virtual Network
ExpressRoute
VPN Gateway
Front Door
Application Gateway
Azure Firewall
Private Link / Endpoints
Load-balancing decision guide

BCDR, Monitoring & Migration

Azure Backup
Site Recovery
Azure Monitor
Log Analytics / KQL
Azure Data Explorer
Azure Migrate
Database Migration Service
Data Box

Trainer's Picks — RR Skillverse

RR Skillverse: courses, blogs & ExamPrep
YouTube @rrskillverse