Imagine a big housing society. Before anyone builds a single flat, someone decides the rules: who owns which plot, how bills are split, what colors are allowed, who can enter which building. Azure governance is exactly that — the rules and structure that keep hundreds of cloud resources from becoming chaos.
Think of the management committee of a housing society: it doesn't build flats, it makes sure everything built follows the rules.
Governance is the set of structures (management groups, subscriptions, resource groups) and rules (policies, RBAC, tags) that control HOW your organization uses Azure.
Without governance, every team creates resources anywhere, bills become untraceable, security gaps appear, and nobody knows who owns what. Governance prevents the chaos before it starts.
Apply governance from day one. Retrofitting it after 500 resources exist is far harder than designing it before the first resource is created.
As an architect, governance is usually your FIRST design decision in any AZ-305 scenario — the hierarchy you choose decides how policies, billing, and access flow down to everything else.
The head office of a company with many branches. A rule made at head office automatically applies to every branch below it.
Containers that sit ABOVE subscriptions. You group multiple subscriptions under a management group and apply policies/access at that level — everything below inherits automatically.
Large organizations have many subscriptions (Finance, HR, Dev, Prod). Applying the same security policy to each one manually is error-prone. Set it once at the management group, and all subscriptions inherit it.
Use management groups when you have more than a handful of subscriptions, or need to enforce company-wide rules (e.g., 'no resources outside India regions') across all of them.
You can nest management groups up to 6 levels deep (not counting the root or the subscription level). A typical design: Root → Departments → Environment (Prod/Non-Prod) → Subscriptions. A policy assigned at the top flows down to everything below it.
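The inheritance described above can be sketched in a few lines of Python. This is a toy model, not an Azure SDK call: the `Scope` class, the scope names, and the policy strings are all hypothetical, chosen only to show how an assignment made high in the tree reaches every subscription below it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scope:
    """A node in the hierarchy: management group or subscription (toy model)."""
    name: str
    parent: Optional["Scope"] = None
    policies: List[str] = field(default_factory=list)

    def effective_policies(self) -> List[str]:
        """Policies assigned here plus everything inherited from ancestors."""
        inherited = self.parent.effective_policies() if self.parent else []
        return inherited + self.policies


# Hypothetical hierarchy: Root -> Finance -> Prod -> subscription
root = Scope("Root", policies=["allowed-locations"])
finance = Scope("Finance", parent=root)
prod = Scope("Prod", parent=finance, policies=["require-costcenter-tag"])
sub = Scope("sub-finance-prod", parent=prod)

print(sub.effective_policies())
# ['allowed-locations', 'require-costcenter-tag']
```

The subscription has no policies of its own, yet it ends up with both: one from the root and one from the Prod group.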
Separate electricity meters for each flat in a building. Each meter gets its own bill, its own limit, and tripping one doesn't affect the others.
A subscription is a billing boundary and an administrative boundary. Every resource you create lives inside exactly one subscription, and that subscription gets the bill.
Organizations need to separate billing (whose budget pays?), limits (Azure quotas apply per subscription), and blast radius (a compromised Dev subscription shouldn't touch Production).
Create separate subscriptions to isolate: environments (Dev/Test/Prod), departments, or large projects with their own budgets.
Architect's rule of thumb: subscriptions are SECURITY and BILLING boundaries. If two workloads need totally different access rules or separate invoices — separate subscriptions.
When you move house, you pack one carton per room — 'Kitchen', 'Bedroom'. Everything for one purpose travels together and can be unpacked (or thrown away) together.
A logical folder inside a subscription that holds related resources — typically everything belonging to one application or workload (its web app, database, storage).
Resources that live and die together should be managed together. Delete the resource group → everything inside is deleted in one shot. Apply access on the group → applies to all resources in it.
One resource group per application per environment is the common pattern (e.g., rg-shop-prod, rg-shop-dev).
Golden rule: resources sharing the SAME LIFECYCLE go in the same group. A resource group can hold resources from different regions, but the group itself has a 'home' region (stores metadata only).
Sticky labels on office equipment: 'Dept: Finance', 'Owner: Priya', 'Project: Diwali-Sale'. The label doesn't change the item — it tells you everything about it at a glance.
Tags are key-value labels you attach to resources, resource groups, or subscriptions — e.g., CostCenter=Marketing, Environment=Prod, Owner=raushan@company.com.
When the monthly Azure bill arrives, finance asks: 'Which department spent this ₹4 lakh?' Without tags, nobody knows. With tags, you slice the bill by department, project, or owner in one click.
Tag always, and enforce it with Azure Policy ('every resource MUST have a CostCenter tag') so people can't skip labeling.
Standard tag set every architect designs: Environment, CostCenter, Owner, Application, Criticality. Tags power cost reports, automation scripts, and cleanup of orphaned resources.
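To see why tags matter for the bill, here is a minimal sketch that slices cost by a tag and lists resources missing required tags. The resource names, costs, and the `REQUIRED_TAGS` set are made-up sample data, not output from any Azure API.

```python
from collections import defaultdict

# Hypothetical billing rows: (resource name, monthly cost, tags)
resources = [
    ("vm-web-01", 120.0, {"CostCenter": "Marketing", "Environment": "Prod"}),
    ("sql-shop", 300.0, {"CostCenter": "Finance", "Environment": "Prod"}),
    ("vm-test-07", 40.0, {"Environment": "Dev"}),  # CostCenter tag missing
]

REQUIRED_TAGS = {"CostCenter", "Environment"}  # assumed tagging standard


def cost_by_tag(rows, tag):
    """Sum cost per tag value; untagged resources land in 'untagged'."""
    totals = defaultdict(float)
    for _name, cost, tags in rows:
        totals[tags.get(tag, "untagged")] += cost
    return dict(totals)


def missing_tags(rows, required):
    """Return {resource: sorted missing tag names} for non-compliant rows."""
    report = {}
    for name, _cost, tags in rows:
        missing = required - set(tags)
        if missing:
            report[name] = sorted(missing)
    return report


print(cost_by_tag(resources, "CostCenter"))
# {'Marketing': 120.0, 'Finance': 300.0, 'untagged': 40.0}
print(missing_tags(resources, REQUIRED_TAGS))
# {'vm-test-07': ['CostCenter']}
```

The 'untagged' bucket is exactly the spend nobody can explain, which is why the tag requirement is worth enforcing with policy rather than leaving to habit.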
Building bylaws vs door access cards. Bylaws say WHAT can be built ('no building above 4 floors'). Access cards decide WHO can enter which room. Two different controls, both needed.
Azure Policy controls WHAT can exist and how it must be configured ('only allow VMs in Central India', 'storage must use encryption'). RBAC controls WHO can perform actions ('Priya can read, Amit can manage').
RBAC alone isn't enough — an authorized person can still create a non-compliant resource. Policy alone isn't enough — compliant resources still need access control. Together they cover both axes.
Policy: enforcing standards, compliance, allowed regions/SKUs, mandatory tags. RBAC: granting teams the minimum access needed for their role.
Policy effects to know: Deny (block creation), Audit (allow but flag), Append/Modify (auto-fix), DeployIfNotExists (auto-deploy missing pieces). Group policies into Initiatives for compliance standards like ISO 27001.
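The two axes (WHO via RBAC, WHAT via Policy) can be illustrated with a small sketch. The rule dictionary follows the general "if"/"then" shape of an Azure Policy definition, but the evaluator is a toy that only understands `field` plus `notIn`, and the users, roles, and helper names are hypothetical.

```python
# Rule shape mirrors an Azure Policy rule ("if"/"then"); the evaluator
# below is a simplified illustration, not the real policy engine.
ALLOWED_LOCATIONS_RULE = {
    "if": {"field": "location", "notIn": ["centralindia", "southindia"]},
    "then": {"effect": "deny"},
}

# Hypothetical RBAC assignments: user -> set of permitted actions.
ROLE_ASSIGNMENTS = {
    "priya": {"read"},
    "amit": {"read", "write"},
}


def policy_effect(rule, resource):
    """Return the rule's effect if the resource matches the condition, else None."""
    condition = rule["if"]
    value = resource.get(condition["field"])
    if value not in condition["notIn"]:
        return rule["then"]["effect"]
    return None


def can_create(user, resource):
    """A create request must pass RBAC (who) and then Policy (what)."""
    if "write" not in ROLE_ASSIGNMENTS.get(user, set()):
        return False, "RBAC: no write permission"
    if policy_effect(ALLOWED_LOCATIONS_RULE, resource) == "deny":
        return False, "Policy: location not allowed"
    return True, "allowed"


print(can_create("priya", {"location": "centralindia"}))
# (False, 'RBAC: no write permission')
print(can_create("amit", {"location": "westeurope"}))
# (False, 'Policy: location not allowed')
print(can_create("amit", {"location": "centralindia"}))
# (True, 'allowed')
```

Amit is authorized, yet his West Europe request is still blocked: that is the gap RBAC alone leaves open.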
A ready-to-move-in apartment: wiring done, plumbing done, security installed, society rules registered. You just bring your furniture (the workload) and start living.
A pre-designed, pre-deployed Azure environment with governance, networking, identity, and security ALREADY configured — following Microsoft's Cloud Adoption Framework best practices.
Most enterprises rebuild the same foundation (hub network, policies, monitoring, identity) for every project — inconsistently. Landing Zones make the foundation repeatable and correct from the start.
Any organization adopting Azure seriously, especially at enterprise scale or before migrating major workloads from on-premises.
A landing zone typically includes: management group hierarchy, platform subscriptions (connectivity, identity, management), hub-spoke networking, baseline policies, and logging — deployed via templates (Bicep/Terraform) so it's identical every time.
All compute services answer one question: 'I have code — where should it run?' The answer depends on how much control you want vs how much management you're willing to do. Think of it as housing options: own a house (VM), rent a serviced apartment (App Service), or pay per night in a hotel (Functions).
Renting an empty flat. The building (hardware) belongs to the landlord, but inside — furniture, cleaning, repairs, security — everything is YOUR responsibility.
A full computer in the cloud — you choose the OS (Windows/Linux), size (CPU/RAM), and disks. You get complete control of everything from the OS upward. This is IaaS (Infrastructure as a Service).
Some applications simply can't run on managed platforms — legacy software, custom OS configurations, special licensing, or anything needing admin/root access.
Lift-and-shift migrations from on-premises, legacy apps, software requiring specific OS setups, or workloads where you need full control.
Architect decisions: VM size family (B-series burstable for dev, D-series general, E-series memory-heavy), disk type (Premium SSD for prod), Availability Zones for uptime, auto-shutdown for dev/test cost savings.
Hiring 1,000 temporary workers for one massive day of work — they arrive, finish the job in parallel, and leave. You don't keep them on payroll.
A service for running large-scale parallel jobs — it automatically creates a pool of VMs, splits your job across them, runs everything, and tears the VMs down when done.
Some jobs (rendering a film, risk simulations, scientific modeling, processing 1 million images) would take weeks on one machine but hours on 500. Managing 500 VMs manually is impossible — Batch automates it.
High-Performance Computing (HPC): media rendering, financial risk modeling, genetic research, large-scale image/data processing — anything 'embarrassingly parallel'.
You define a pool (VM size + count), jobs, and tasks. Batch handles scheduling, retries, and scaling. Use low-priority/spot VMs in the pool for up to 80% cost savings on interruptible work.
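The pool/job/task idea can be imitated locally to make it concrete. This sketch uses Python's standard thread pool as a stand-in for a pool of VMs; it does not call the Azure Batch SDK, and `process_image` is a hypothetical placeholder for one independent unit of work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(image_id: int) -> str:
    """Stand-in for one independent task (e.g. rendering or resizing)."""
    return f"image-{image_id}-done"


def run_job(task_inputs, pool_size=4):
    """Run every task on a fixed-size worker pool and collect results in order."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(process_image, task_inputs))


results = run_job(range(10), pool_size=4)
print(len(results), results[0])
# 10 image-0-done
```

Because no task depends on another, adding workers shortens the job without changing the code, which is what 'embarrassingly parallel' means in practice.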
A serviced apartment. You bring your luggage (code); cleaning, maintenance, security, and electricity are all handled by the building management.
A fully managed platform (PaaS) for hosting web applications and REST APIs. Microsoft manages the OS, patching, and infrastructure — you just deploy code.
Most teams want to ship features, not babysit servers. App Service gives auto-scaling, custom domains, SSL, CI/CD integration, and staging slots without any infrastructure work.
Web apps, REST APIs, company portals — the default choice for most web workloads, especially with small teams.
Key designs: App Service Plan tier (Basic/Standard/Premium decides features + price), deployment slots (test in staging, swap to production with zero downtime), autoscale rules (scale out on CPU or schedule), VNet integration for private connectivity.
A pop-up food stall. No restaurant lease, no staff hiring — set up the stall in minutes, serve, pack up. Perfect for short, simple gigs.
The fastest, simplest way to run a container in Azure — no cluster, no orchestrator, no VMs to manage. Give it a container image, it runs in seconds, billed per second.
Sometimes you just need ONE container running NOW — a quick task, a build job, a burst worker. Setting up Kubernetes for that is like building a food court for one stall.
Short-lived tasks, dev/test containers, burst capacity for AKS (virtual nodes), simple background processors that don't need orchestration.
Know its limits for the exam: no auto-scaling, no load balancing across instances, no rolling deployments. The moment you need those → you've outgrown ACI, move to AKS or Container Apps.
A food court manager. Dozens of stalls (containers) operate at once — the manager assigns spots, replaces a stall that shuts down, adds counters when the crowd grows, and balances customers across them.
Managed Kubernetes, the industry-standard orchestrator for running many containers: scheduling, self-healing, scaling, rolling updates. Azure manages the control plane (at no charge in the Free tier; the Standard tier adds a paid uptime SLA); you pay for the worker nodes.
Microservices architectures run dozens or hundreds of containers. Someone must restart crashed ones, distribute traffic, roll out updates without downtime. That 'someone' is Kubernetes.
Microservices at scale, teams with DevOps maturity, workloads needing fine-grained control over deployment, networking, and scaling.
Architect decisions: node pools (separate pools for system vs workload, GPU pools for ML), cluster autoscaler + horizontal pod autoscaler, ingress controller for routing, Azure CNI vs kubenet networking, integration with ACR (container registry) and Entra ID.
A taxi vs owning a car. You don't pay for the car sitting idle in the garage — you pay only for the rides you actually take.
Serverless compute — small pieces of code that run in response to a trigger (HTTP request, timer, new queue message, new blob uploaded). No servers to manage; scales from zero to thousands automatically.
Many workloads are bursty: nothing for hours, then 10,000 events in a minute. Paying for an always-on server for that is waste. Functions bill per execution — idle costs nothing (Consumption plan).
Event-driven processing (resize image when uploaded), scheduled jobs (nightly cleanup), lightweight APIs, glue code connecting services.
Plans matter for AZ-305: Consumption (true serverless, cold starts), Premium (pre-warmed, VNet access, no cold start), Dedicated (run in App Service Plan). Durable Functions extend it for long-running, stateful workflows (chained steps, fan-out/fan-in).
A super-efficient office clerk following a flowchart: 'WHEN an invoice email arrives → save attachment to the shared drive → notify the finance Teams channel → add a row in the tracker.' No coding — just the flowchart.
A low-code/no-code workflow automation service with 1,000+ ready-made connectors (Outlook, SAP, Salesforce, SQL, Teams...). You design workflows visually: trigger → actions → conditions.
Most business automation is integration plumbing — 'when X happens in system A, do Y in system B'. Writing custom code for every such flow is expensive. Logic Apps makes it drag-and-drop.
Business process automation, system integration (connect SaaS apps + on-prem systems), approval workflows, scheduled data syncs.
Functions vs Logic Apps — the classic exam comparison: Functions = code-first, developer writes logic. Logic Apps = designer-first, visual workflow with connectors. They combine beautifully: Logic App orchestrates the flow, calls a Function for custom logic.
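As a revision aid, the 'when to use' guidance for the compute services in this section can be condensed into one decision function. This is a deliberate simplification: the flag names are hypothetical, the order of the checks is an assumption, and real designs weigh cost, team skills, and many other factors.

```python
def choose_compute(
    needs_os_control: bool = False,
    massively_parallel_batch: bool = False,
    containerized: bool = False,
    needs_orchestration: bool = False,
    event_driven: bool = False,
    connector_workflow: bool = False,
) -> str:
    """Map workload traits to a first-guess compute service (study aid only)."""
    if massively_parallel_batch:
        return "Azure Batch"
    if needs_os_control:
        return "Virtual Machines"
    if containerized:
        return "AKS" if needs_orchestration else "Container Instances"
    if connector_workflow:
        return "Logic Apps"
    if event_driven:
        return "Azure Functions"
    return "App Service"  # default for ordinary web apps and APIs


print(choose_compute(containerized=True, needs_orchestration=True))  # AKS
print(choose_compute(event_driven=True))                             # Azure Functions
print(choose_compute())                                              # App Service
```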
Not all data is equal. A product photo, a shared office drive, and a VM's hard disk are completely different animals — and Azure has a purpose-built home for each. The umbrella over most of them is the Storage Account.
A bank locker facility. One building (the account), but inside there are different locker types — document lockers (blobs), shared family lockers (files), small quick-access boxes (queues, tables).
The top-level container for Azure's core storage services. One storage account can hold: Blobs (objects/files), Files (network shares), Queues (messages), and Tables (simple NoSQL).
It gives one place to manage settings that apply to all data inside: redundancy level, security, networking, and access keys.
You'll create one for almost every workload. Separate accounts when workloads need different redundancy, security, or billing separation.
Key account-level decisions: performance tier (Standard HDD-backed vs Premium SSD-backed), redundancy (next card), access tier defaults, and network rules (public vs private endpoint).
Photocopies of an important certificate: 3 copies in the same cupboard (LRS), copies in 3 different rooms of the house (ZRS), copies also at your relative's house in another city (GRS), and the relative is allowed to show them to you anytime (RA-GRS).
Azure always keeps multiple copies of your data. Redundancy options decide WHERE those copies live: LRS = 3 copies in one datacenter. ZRS = 3 copies across 3 zones in one region. GRS = LRS + 3 more copies in a paired region. RA-GRS = GRS + read access to the secondary copy.
Hardware fails, datacenters flood, regions go down. The question is: which disaster do you need to survive — a disk failure, a datacenter failure, or an entire regional outage?
LRS: dev/test, easily recreatable data. ZRS: production needing zone resilience. GRS/RA-GRS: business-critical data that must survive a regional disaster.
Exam mindset: match redundancy to business impact + budget. GRS costs ~2x LRS. RA-GRS lets apps READ from the secondary region even while primary is healthy — useful for read-heavy global apps.
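The 'which disaster must you survive' question maps directly onto the four options. A minimal sketch of that mapping follows; the function name and the three scope strings are assumptions made for illustration.

```python
def choose_redundancy(survive: str, read_from_secondary: bool = False) -> str:
    """Pick a redundancy option from the largest failure the data must survive.

    survive: 'disk', 'datacenter', or 'region'.
    """
    if survive == "disk":
        return "LRS"        # 3 copies in one datacenter
    if survive == "datacenter":
        return "ZRS"        # 3 copies across 3 zones in one region
    if survive == "region":
        # GRS adds 3 more copies in the paired region; RA-GRS also allows reads there.
        return "RA-GRS" if read_from_secondary else "GRS"
    raise ValueError(f"unknown failure scope: {survive}")


print(choose_redundancy("disk"))                              # LRS
print(choose_redundancy("datacenter"))                        # ZRS
print(choose_redundancy("region", read_from_secondary=True))  # RA-GRS
```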
An unlimited digital godown (warehouse). Throw in anything — photos, videos, backups, logs — pay only for the shelf space you use, and pay less for shelves you rarely visit.
Object storage for unstructured data — any file type, any size, massive scale. Organized as Account → Containers → Blobs. Accessed over HTTPS via URLs, SDKs, or REST API.
Databases are a poor fit for big binary files (slow and expensive to store there). Blob storage is built exactly for this: cheap, durable (at least 11 nines), and able to scale to petabytes.
User uploads, images/videos, backups, log archives, static website hosting, the raw zone of a data lake.
Access tiers are the key design lever: Hot (frequent access), Cool (30+ days, cheaper storage, costlier reads), Cold (90+ days), Archive (180+ days, cheapest, hours to retrieve). Lifecycle policies move data between tiers automatically — huge cost saver.
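A lifecycle rule is essentially an age check. The sketch below reuses the day thresholds from the tier descriptions above as cut-offs; treating those figures as the rule thresholds is an assumption for illustration, and a real lifecycle policy is configured on the storage account rather than written as application code.

```python
def suggest_tier(days_since_last_access: int) -> str:
    """Suggest an access tier from how long a blob has gone untouched."""
    if days_since_last_access >= 180:
        return "Archive"
    if days_since_last_access >= 90:
        return "Cold"
    if days_since_last_access >= 30:
        return "Cool"
    return "Hot"


for days in (3, 45, 120, 400):
    print(days, suggest_tier(days))
# 3 Hot
# 45 Cool
# 120 Cold
# 400 Archive
```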
The office shared drive (the famous 'Z: drive'). Everyone in the team maps it on their computer and sees the same folders — except now it lives in the cloud instead of a server under someone's desk.
Fully managed file shares in the cloud using SMB/NFS protocols — the same protocols traditional file servers use. Mount it on Windows, Linux, or macOS like a normal network drive.
Thousands of legacy apps and teams depend on shared network drives. Azure Files lets you move them to the cloud WITHOUT changing how the apps work — same drive letter, same paths.
Lift-and-shift of apps using file shares, centralized config files, shared team storage, replacing aging on-prem file servers.
Know Azure File Sync for the exam: keeps an on-prem Windows server as a fast local cache while the full data lives in Azure — branch offices get local speed with cloud capacity.
The SSD/hard drive inside your rented computer. The flat (VM) needs storage inside it — you choose how fast and how big that drive should be.
Managed block storage volumes attached to VMs — the OS disk and data disks. Azure handles the underlying storage infrastructure ('managed disks').
Every VM needs disks, and disk performance often decides application performance. A slow disk under a fast database = a slow database.
Disks come automatically with every VM. The design decision is the TYPE: Standard HDD (dev/test), Standard SSD (light prod), Premium SSD (production workloads), Ultra Disk (extreme IOPS for heavy databases).
Architect levers: disk tier (IOPS/throughput), size, caching settings (ReadOnly cache for data disks of databases), snapshots for backup, and disk encryption (platform-managed vs customer-managed keys).
Locker keys with rules: a guest key that expires Sunday 6 PM and opens only locker #12 (SAS token), CCTV + entry register (logging), and a locker room with no street entrance at all (private endpoint).
The layered controls protecting storage: encryption at rest (on by default), access keys vs Entra ID auth, SAS tokens (time-limited, scoped access links), network rules/firewall, and private endpoints.
Storage accounts hold the crown jewels — backups, customer files, data lakes. Misconfigured public storage is one of the most common cloud breaches worldwide.
Use these in every storage design. The exam loves asking for the MOST SECURE option in a scenario.
Security ladder (least → most secure): Account keys (avoid — full access, no expiry) → SAS tokens (scoped + time-limited) → Entra ID + RBAC (identity-based, auditable) → plus Private Endpoints to remove public network exposure entirely. Prefer Managed Identity + RBAC for app access.
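The idea behind a SAS token (a signature that locks in scope and expiry) can be shown with the standard library. This is a conceptual sketch only: the token layout, the `ACCOUNT_KEY` value, and both function names are invented for illustration and do not match the real SAS format. In practice, tokens come from the Azure SDK or portal.

```python
import hashlib
import hmac

ACCOUNT_KEY = b"demo-key-not-a-real-secret"  # hypothetical signing key


def make_token(resource: str, permissions: str, expires_at: int) -> str:
    """Sign 'resource|permissions|expiry' so the fields cannot be altered."""
    payload = f"{resource}|{permissions}|{expires_at}"
    sig = hmac.new(ACCOUNT_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def check_token(token: str, resource: str, permission: str, now: int) -> bool:
    """Accept only an untampered, unexpired token scoped to this resource."""
    try:
        res, perms, expiry, sig = token.split("|")
    except ValueError:
        return False
    expected = hmac.new(
        ACCOUNT_KEY, f"{res}|{perms}|{expiry}".encode(), hashlib.sha256
    ).hexdigest()
    return (
        hmac.compare_digest(sig, expected)
        and res == resource
        and permission in perms
        and now < int(expiry)
    )


token = make_token("locker-12", "r", expires_at=1000)
print(check_token(token, "locker-12", "r", now=999))   # True: in scope, not expired
print(check_token(token, "locker-12", "r", now=1000))  # False: expired
print(check_token(token, "locker-13", "r", now=999))   # False: wrong resource
```

Note that whoever holds the signing key can mint any token, which is why the ladder above ranks identity-based access (Entra ID + RBAC) higher than key-derived tokens.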
Storage holds files; databases hold structured, queryable, transactional data — orders, customers, payments. Here the architecture questions become: how does it scale, how does it survive failures, and how is the data protected at every moment of its life?
A meticulous accountant with a strict ledger. Every entry follows rules, totals always balance, and a transaction either fully happens or doesn't happen at all — never half.
A fully managed relational database (PaaS) based on SQL Server. Microsoft handles patching, backups, high availability. You get tables, relationships, T-SQL, and ACID transactions.
Business-critical data — orders, payments, inventory — needs guaranteed consistency and relationships between tables. Relational databases have been the answer for 40 years, and Azure SQL removes the server-management pain.
Transactional applications: e-commerce orders, banking records, ERP/CRM data — anywhere correctness matters more than raw flexibility.
Deployment options to know: Single Database (one isolated DB), Elastic Pool (many DBs share resources — perfect for SaaS with many customer DBs), Managed Instance (near-100% SQL Server compatibility for migrations). Purchase models: DTU (simple bundle) vs vCore (granular, supports reserved pricing).
A restaurant getting crowded: you can buy a bigger kitchen (scale UP), open more branches (scale OUT), or set up self-service counters just for viewing the menu (read replicas).
Strategies for handling growth: Vertical scaling (bigger tier — more CPU/RAM), Read scale-out (replicas serve read queries), Sharding (split data across multiple databases), Elastic Pools (shared capacity across many DBs), and Hyperscale (storage grows to 100+ TB with fast scaling).
Databases are usually the first bottleneck as apps grow. Choosing the wrong scaling strategy means either overspending or hitting a wall during peak business.
Vertical: quick fix, has a ceiling. Read replicas: read-heavy apps (reports, catalogs). Sharding: multi-tenant or massive datasets. Hyperscale: very large single databases with unpredictable growth.
Exam pattern: 'reports are slowing down the production DB' → route reports to a read replica. 'SaaS with 500 customer databases with different peak times' → Elastic Pool. 'DB will grow beyond 4 TB' → Hyperscale.
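The Elastic Pool answer rests on simple arithmetic: databases that peak at different times need far less shared capacity than the sum of their individual peaks. The load figures below are invented sample numbers in arbitrary units, used only to show the calculation.

```python
# Hypothetical hourly resource demand (arbitrary units) for three tenant DBs.
tenant_load = {
    "tenant-a": [10, 80, 10, 10],  # peaks in hour 1
    "tenant-b": [10, 10, 70, 10],  # peaks in hour 2
    "tenant-c": [60, 10, 10, 10],  # peaks in hour 0
}

# Sizing each database alone means paying for every individual peak.
sum_of_peaks = sum(max(load) for load in tenant_load.values())

# A pool only has to cover the busiest combined hour.
combined = [sum(hour) for hour in zip(*tenant_load.values())]
pool_peak = max(combined)

print(sum_of_peaks, pool_peak)
# 210 100
```

With these sample numbers the pool needs less than half the capacity of three separately sized databases. If all tenants peaked in the same hour, the two figures would be equal and pooling would save nothing.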
A hospital with a backup generator (zone redundancy) AND a sister hospital in another city ready to take patients if the whole city loses power (geo-replication + failover).
Designs that keep the database alive through failures: built-in HA (every Azure SQL tier), Zone redundancy (replicas across availability zones), Active Geo-Replication (readable secondary in another region), and Failover Groups (group of DBs failing over together with one connection string).
Database downtime = business downtime. The design question: which failure must you survive (server? datacenter? region?) and how much downtime/data loss is acceptable (RTO/RPO)?
Zone redundancy: production within one region. Geo-replication/Failover groups: business-critical apps that must survive a full regional outage.
Two numbers drive every availability design: RTO (how fast must we recover) and RPO (how much data can we lose). Failover Groups are the exam favorite — automatic failover, and the listener endpoint means applications don't change connection strings.
Protecting cash at all three moments: locked in the vault (at rest), in an armored van while moving (in transit), and counted behind a privacy screen so even staff nearby can't see (in use).
Three states of data, three protections: At rest → Transparent Data Encryption (TDE) encrypts stored files automatically. In transit → TLS encrypts data moving over the network. In use → Always Encrypted / confidential computing keeps data encrypted even while being processed, so even DBAs can't read it.
Attackers target all three states — stolen disks, intercepted traffic, and insider threats. Compliance frameworks (banking, healthcare) explicitly require protection of all three.
TDE and TLS: always (mostly on by default). Always Encrypted: highly sensitive columns — card numbers, national IDs, medical records — where even administrators must not see plaintext.
Additional layers to mention in designs: Dynamic Data Masking (hide data from non-privileged users in results), Row-Level Security (users see only their rows), Auditing + Microsoft Defender for SQL (threat detection).
A pocket-sized accountant living inside the factory machine itself — recording everything locally even when the internet is down, syncing the books to head office when the line comes back.
A small-footprint version of the SQL Server engine (under 500 MB) that runs in containers on IoT/edge devices, with built-in data streaming and time-series support.
Factories, ships, and remote sites can't depend on constant connectivity, and sending every sensor reading to the cloud is slow and costly. Processing data locally at the 'edge' solves both.
IoT scenarios needing local storage + processing on the device: manufacturing lines, connected vehicles, retail stores, offshore equipment.
Architecture pattern: SQL Edge processes/filters data on-device → only meaningful aggregates sync to Azure (IoT Hub → cloud database). Same T-SQL skills work at the edge and in the cloud.
A chain of notebooks kept in every city of the world, all magically synced. A customer in Tokyo and one in Berlin each write to their LOCAL notebook — and reads are lightning fast everywhere.
A globally distributed NoSQL database with guaranteed single-digit-millisecond reads and writes at the 99th percentile. Multi-model: document (JSON), key-value (Table API), graph (Gremlin), plus MongoDB- and Cassandra-compatible APIs.
Global apps can't serve the world from one region — physics (latency) forbids it. Cosmos replicates data to any regions you pick, supports multi-region writes, and scales practically without limit.
Global low-latency apps, flexible/evolving schemas, massive scale (IoT, gaming, retail catalogs). Table API: a premium upgrade path for Azure Table Storage apps.
The two design decisions that make or break Cosmos: (1) Partition key: choose a property with high cardinality and even access spread, or you'll create hot partitions. (2) Consistency level: five options, from strongest to weakest: Strong (always latest, slower), Bounded Staleness, Session, Consistent Prefix, and Eventual (fastest, may briefly read stale). Session is the practical default. Cost is measured in Request Units (RU/s).
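A hot partition is easy to demonstrate. The sketch below hashes partition key values onto four partitions and measures how much traffic lands on the busiest one. The SHA-256 hashing, the partition count, and the sample orders are assumptions for illustration; Cosmos DB uses its own internal hashing and partition management.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int = 4) -> int:
    """Stable hash of a partition key value onto a fixed number of partitions."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def busiest_share(keys, partitions: int = 4) -> float:
    """Fraction of all requests that hit the single busiest partition."""
    keys = list(keys)
    counts = Counter(partition_for(k, partitions) for k in keys)
    return max(counts.values()) / len(keys)


# Hypothetical workload: 1,000 orders, all placed from the same country.
orders = [{"orderId": f"order-{i}", "country": "IN"} for i in range(1000)]

by_country = busiest_share(o["country"] for o in orders)
by_order_id = busiest_share(o["orderId"] for o in orders)

print(f"country as key: {by_country:.0%} of requests on one partition")
print(f"orderId as key: {by_order_id:.0%} of requests on one partition")
```

With `country` as the key, every request lands on one partition. With `orderId`, the load spreads roughly evenly, which is the property the high-cardinality rule is after.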
Companies have data scattered across 20 systems. The analytics story is a journey: COLLECT it (Data Factory), STORE it raw (Data Lake), PROCESS it (Databricks), ANALYZE it at scale (Synapse), and watch it LIVE (Stream Analytics). Think of it as a city water system: pipelines, reservoir, treatment plant, distribution HQ, and live quality sensors.
A courier and sorting company for data. Every night at 2 AM it picks up parcels (data) from 15 different offices (systems), sorts and re-labels them (transforms), and delivers them to the warehouse — fully automated, on schedule.
A cloud ETL/ELT orchestration service. You visually build pipelines that copy data from 100+ sources (SQL, SAP, Oracle, files, APIs...), transform it (mapping data flows), and load it into destinations — on schedule or on trigger.
Analytics needs data from everywhere in one place. Hand-writing and maintaining 50 copy scripts is fragile. ADF makes the movement visual, monitored, retryable, and scheduled.
Building data pipelines, nightly data warehouse loads, migrating on-prem data to the cloud, orchestrating multi-step data workflows.
Key concepts: Pipeline (the workflow), Activities (steps), Linked Services (connections), Integration Runtime (the compute that moves data — including Self-Hosted IR to reach on-premises systems behind firewalls — a classic exam point).
A massive reservoir that stores raw water from every source — rivers, rain, borewells — before any treatment. Store everything first; decide how to use it later.
Blob Storage + hierarchical namespace (real folders/directories) + big-data optimizations. The standard place to store enormous amounts of raw and processed data cheaply.
Traditional warehouses force you to structure data BEFORE storing (expensive, slow, and you discard things you might need later). A lake stores everything raw and cheap — structure it when you actually need it.
The central storage layer of any analytics platform; the landing zone for ADF pipelines; the data source for Databricks and Synapse.
Design the lake in zones — Raw (as-received) → Curated/Enriched (cleaned) → Consumption (ready for reports). Hierarchical namespace enables folder-level ACLs and fast renames that analytics engines depend on.
A high-end research laboratory next to the reservoir. Scientists (data engineers/data scientists) take raw water and run serious experiments — purification, analysis, prediction — with industrial-grade equipment.
A managed Apache Spark platform for heavy data engineering, advanced analytics, and machine learning. Collaborative notebooks (Python/SQL/Scala/R) running on auto-scaling Spark clusters.
Transforming terabytes and training ML models needs distributed computing power that a single machine (or plain SQL) can't deliver. Spark distributes the work across a cluster; Databricks removes the cluster-management pain.
Large-scale data transformation, data science and ML workloads, streaming + batch processing on the same platform — when data engineers/scientists need code-first power.
Pattern to remember: ADF orchestrates → Databricks transforms (reads/writes the Data Lake) → results land in Synapse/SQL for reporting. Clusters auto-terminate when idle to control cost.
The corporate analytics headquarters — reservoir access, treatment units, and the boardroom dashboard all in ONE building, so teams stop running between offices.
A unified analytics platform combining: data warehousing (dedicated SQL pools), on-demand querying of lake files (serverless SQL), Spark processing, and built-in pipelines (ADF engine) — one workspace, one studio.
Enterprises traditionally stitched together a warehouse + lake + ETL + Spark from separate tools. Synapse unifies them — query the lake and the warehouse together in the same place.
Enterprise data warehousing, BI/reporting at scale (Power BI integration), and when one team needs SQL + Spark + pipelines without managing separate services.
Exam lever — choose the right pool: Dedicated SQL pool (reserved, predictable heavy warehouse workloads), Serverless SQL pool (pay-per-query exploration of lake files — no infrastructure at all), Spark pool (big data processing).
A security guard watching the CCTV feed LIVE and raising the alarm the moment something looks wrong — instead of reviewing yesterday's recording tomorrow morning.
A real-time analytics engine that runs continuous SQL-like queries on streaming data (from Event Hubs / IoT Hub) — computing aggregates, detecting patterns, and pushing results out within seconds.
Some insights expire in seconds: a machine overheating, a fraud pattern, a traffic spike. Batch analytics that runs tonight is too late — you need answers while the data is still flowing.
Real-time dashboards, IoT alerting (temperature crosses threshold → alert), live fraud/anomaly detection, clickstream analysis.
The pipeline shape: Input (Event Hubs/IoT Hub/Blob) → Query (SQL with time windows — Tumbling, Hopping, Sliding) → Output (Power BI live dashboard, Functions, SQL, Cosmos). Windowing functions are the exam favorite — 'average temperature every 5 minutes' = Tumbling window.
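A tumbling window is a series of fixed, non-overlapping time buckets. The plain-Python sketch below computes 'average temperature every 5 minutes' over a handful of made-up readings. It illustrates the windowing logic only, is not Stream Analytics query syntax, and labels each window by its start time purely for readability.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds=300):
    """Average value per fixed, non-overlapping window.

    events: iterable of (timestamp_seconds, value).
    Returns {window_start_seconds: average}.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical temperature readings: (seconds since start, degrees C)
readings = [(0, 20.0), (120, 22.0), (299, 24.0), (300, 30.0), (450, 32.0)]
print(tumbling_average(readings))
# {0: 22.0, 300: 31.0}
```

Each reading belongs to exactly one window. A hopping window would let windows overlap, so a single reading could count toward more than one result.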
Modern applications are not one big block — they're many small parts that must talk to each other WITHOUT depending on each other. Today's first theme: how parts of a system communicate (messages vs events), and the supporting plumbing — APIs, caching, configuration, and automated deployment.
A message is a registered parcel — it contains the actual goods and the sender expects it to be processed. An event is the doorbell — a lightweight 'something happened!' notification; whoever cares can react.
Two communication styles: A MESSAGE carries the data itself (an order, a job) — the producer expects someone to process it. An EVENT announces that something happened (a file was uploaded) — the producer doesn't know or care who reacts.
Choosing the wrong one is a classic architecture mistake. Order processing through an 'event' notification system can lose business data; sending heavy payloads as messages where a light ping suffices wastes resources.
Message → commands and valuable data that MUST be processed (orders, payments). Event → notifications and reactions (file uploaded → trigger thumbnail generator).
This single distinction decides the service: Messages → Service Bus / Storage Queues. Discrete events → Event Grid. Massive event streams → Event Hubs. Every AZ-305 messaging question starts here.
Registered post with delivery guarantee: parcels wait safely in order, delivery is confirmed, failed deliveries go to a special shelf for investigation, and one parcel can be photocopied to multiple subscribed recipients.
Enterprise-grade message broker. Queues (one sender → one receiver processes each message) and Topics with Subscriptions (one message → many subscribers, each with filters). Supports ordering (FIFO via sessions), transactions, duplicate detection, and dead-letter queues.
When messages represent money or commitments (orders, bookings, payments), 'mostly delivered' isn't acceptable. Service Bus guarantees delivery semantics that business-critical systems require.
Order processing, financial transactions, decoupling microservices where every message matters, publish-subscribe within enterprise systems.
Features that answer exam scenarios: Dead-letter queue (poison messages parked for inspection), Sessions (strict FIFO ordering), Duplicate detection, Topics+filters (each subscriber gets only relevant messages). 'Guaranteed, ordered, transactional' → Service Bus.
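To make the dead-letter idea concrete, here is a minimal in-memory sketch. It does not use the Service Bus SDK; `drain`, `handle_order`, and the threshold of 3 attempts are assumptions chosen for illustration. A message that keeps failing is parked instead of blocking the queue forever.

```python
from collections import deque

MAX_DELIVERY_COUNT = 3  # assumed threshold for this sketch


def drain(queue, handler, max_deliveries=MAX_DELIVERY_COUNT):
    """Process every message; park repeatedly failing ones in a dead-letter list."""
    processed, dead_letter = [], []
    pending = deque((msg, 0) for msg in queue)  # (message, failed attempts so far)
    while pending:
        msg, attempts = pending.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_deliveries:
                dead_letter.append(msg)          # poison message: park for inspection
            else:
                pending.append((msg, attempts))  # simplified redelivery at the back
    return processed, dead_letter


def handle_order(order):
    # Hypothetical handler: rejects obviously invalid orders.
    if order["amount"] < 0:
        raise ValueError("negative amount")


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": -5}, {"id": 3, "amount": 20}]
done, parked = drain(orders, handle_order)
print([o["id"] for o in done], [o["id"] for o in parked])
```

Re-queuing at the back is a simplification that ignores ordering; the point is only that healthy messages keep flowing while the poison message ends up somewhere a human can inspect it.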
A simple token system at a government counter — take a token, wait your turn. No frills, costs almost nothing, handles a huge crowd.
Azure Storage Queues: basic, very cheap message queuing built into Storage Accounts. Simple semantics: put message, get message, delete message. Scales to millions of messages.
Not every queue needs enterprise features. For simple background-job handoffs, Service Bus is overkill — Storage Queues do the job at a fraction of the cost.
Simple work distribution (web app drops jobs, workers pick them up), buffering bursts, queues over 80 GB (Service Bus max), cost-sensitive designs.
The comparison the exam loves: need ordering/transactions/topics/dead-lettering → Service Bus. Need simple + cheap + giant volume → Storage Queue. Both pair perfectly with Azure Functions queue triggers.
The society's notification system: when the water tanker arrives, the system instantly pings exactly the flats that subscribed to 'tanker updates' — not everyone, and the tanker driver doesn't maintain anyone's phone numbers.
Azure Event Grid: a fully managed event ROUTING service. Sources (Storage, Resource Groups, custom apps) publish events; Event Grid pushes them in near real time to subscribed handlers (Functions, Logic Apps, webhooks) with filtering.
Without it, services must constantly poll 'anything new? anything new?' — wasteful and slow. Event Grid inverts it: react instantly, pay per event, near-real-time push.
Reactive automation: blob uploaded → process it; VM created → tag it; custom app event → notify systems. The glue of serverless architectures.
Remember the trio: Event Grid = lightweight ROUTER of discrete events (reactions). Event Hubs = heavy PIPELINE for streaming millions of telemetry events (analytics). Service Bus = TRUCK for valuable message data (commands).
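The "lightweight router" role can be sketched in a few lines of plain Python. This is a toy model, not the Event Grid API: the class `EventRouter`, the event type name `"BlobCreated"`, and the subject-prefix filter are illustrative assumptions. What it shows is the inversion described in this card: the publisher pushes once and never learns who is listening.

```python
class EventRouter:
    """Toy push-style router: publishers never learn who the subscribers are."""

    def __init__(self):
        self._subscriptions = {}  # event type -> list of (subject prefix, handler)

    def subscribe(self, event_type, handler, subject_prefix=""):
        self._subscriptions.setdefault(event_type, []).append((subject_prefix, handler))

    def publish(self, event_type, subject, data=None):
        """Push the event to every matching subscriber; return how many received it."""
        delivered = 0
        for prefix, handler in self._subscriptions.get(event_type, []):
            if subject.startswith(prefix):  # subscriber-side filter
                handler({"type": event_type, "subject": subject, "data": data})
                delivered += 1
        return delivered


thumbnails = []
router = EventRouter()
router.subscribe("BlobCreated", lambda e: thumbnails.append(e["subject"]),
                 subject_prefix="/images/")
router.publish("BlobCreated", "/images/cat.png")  # matches the filter
router.publish("BlobCreated", "/logs/app.log")    # filtered out
print(thumbnails)
```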
The baggage conveyor system of a giant airport — millions of bags per hour flow through continuously; downstream teams (analytics, security scan) each read the stream at their own pace.
Azure Event Hubs: a big-data event STREAMING platform ingesting millions of events per second (telemetry, logs, clickstreams). Consumers read the stream independently; events are retained for a period (rewindable).
IoT fleets and high-traffic apps generate firehoses of data that normal queues can't ingest. Event Hubs is purpose-built to swallow that firehose and feed it to analytics.
IoT telemetry ingestion, application log/clickstream pipelines, feeding Stream Analytics or Databricks with live data.
Key design knobs: Partitions (parallelism — set carefully, hard to change later), Consumer Groups (independent readers), Capture (auto-archive the stream to Data Lake — zero code), retention period. Pairs with Stream Analytics for the classic real-time pipeline.
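Partitions and consumer groups are easier to reason about with a toy model. The sketch below is not the Event Hubs SDK: `EventStream` and its methods are hypothetical, and the CRC32 hash is an assumption standing in for whatever hashing the real service uses. It shows two properties from this card: the same partition key always lands in the same partition, and each consumer group reads at its own pace without removing events.

```python
import zlib


class EventStream:
    """Toy partitioned, append-only stream with per-consumer-group offsets."""

    def __init__(self, partition_count):
        self.partitions = [[] for _ in range(partition_count)]
        self._offsets = {}  # (consumer group, partition index) -> next position

    def send(self, partition_key, event):
        # Stable hash, so one key always maps to one partition.
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(event)
        return index

    def read(self, consumer_group, partition_index):
        # Reading advances only this group's offset; events stay in the stream.
        start = self._offsets.get((consumer_group, partition_index), 0)
        events = self.partitions[partition_index][start:]
        self._offsets[(consumer_group, partition_index)] = start + len(events)
        return events


stream = EventStream(partition_count=4)
p = stream.send("device-42", {"temp": 21.5})
stream.send("device-42", {"temp": 21.7})
analytics = stream.read("analytics", p)  # first reader
security = stream.read("security", p)    # second reader sees the same events
print(len(analytics), len(security))
```

Because the key-to-partition mapping depends on the partition count, changing that count later reshuffles keys, which is one way to see why the card calls it hard to change.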
A 5-star hotel receptionist: every visitor goes through the front desk — identity checked, visit logged, VIPs prioritized, troublemakers limited — and guests never wander the corridors knocking on random doors.
Azure API Management (APIM): a gateway that sits in front of ALL your APIs, acting as one front door. It handles security (keys, JWT, OAuth), rate limiting/throttling, request/response transformation, caching, analytics, and a developer portal with documentation.
Exposing 20 backend APIs directly means implementing security, throttling, and versioning 20 times — inconsistently. APIM centralizes it once, and backends can change without breaking consumers.
Publishing APIs to partners/public, microservices needing one secured entry point, monetizing APIs, managing API versions and revisions.
Power lives in Policies — XML rules applied to requests/responses: validate-jwt (auth), rate-limit (throttle per subscriber), set-header / rewrite, response caching. Tiers range from Consumption (serverless, pay-per-call) to Premium (VNet, multi-region).
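APIM policies themselves are written in XML, so the following Python is only a conceptual sketch of what a per-subscriber rate limit does, not how APIM implements it. The class name `FixedWindowRateLimiter` and the fixed-window algorithm are assumptions chosen for simplicity; time is passed in explicitly so the behaviour is easy to follow.

```python
class FixedWindowRateLimiter:
    """Allow at most `calls` requests per subscriber in each window of `period_seconds`."""

    def __init__(self, calls, period_seconds):
        self.calls = calls
        self.period = period_seconds
        self._windows = {}  # subscriber -> (window start time, calls used)

    def allow(self, subscriber, now):
        start, used = self._windows.get(subscriber, (now, 0))
        if now - start >= self.period:  # window elapsed: start a fresh one
            start, used = now, 0
        if used >= self.calls:
            self._windows[subscriber] = (start, used)
            return False                # over quota: the gateway would reject the call
        self._windows[subscriber] = (start, used + 1)
        return True


limiter = FixedWindowRateLimiter(calls=2, period_seconds=60)
results = [limiter.allow("partner-a", t) for t in (0, 1, 2, 61)]
print(results)
```

Each subscriber gets its own counter, so one noisy consumer cannot exhaust the quota of another.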
Keeping a water bottle on your desk instead of walking to the cooler for every sip. The cooler (database) still exists — you just stop visiting it 50 times an hour.
Storing frequently read data in fast in-memory storage close to the app. Azure Cache for Redis: managed in-memory store with sub-millisecond reads. Azure CDN / Front Door caching: static content cached at edge locations near users.
The fastest database query is the one you never run. Caching slashes latency, cuts database load (and cost), and absorbs traffic spikes that would otherwise crush the database.
Redis: session state, hot product/catalog data, API response caching, leaderboards/counters. CDN: images, scripts, videos — anything static served globally.
Patterns to name in designs: Cache-aside (app checks cache → on miss, reads DB and fills cache) with TTL expiry. Redis tiers: Basic (dev), Standard (replicated), Premium (persistence, clustering, VNet). Decide upfront what staleness is acceptable.
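Cache-aside with TTL expiry fits in a dozen lines. This sketch uses an in-process dictionary rather than Redis, and the names `CacheAside`, `fake_db`, and the injectable `clock` are illustrative assumptions; the control flow (check cache, on miss read the database and fill the cache) is the pattern named in this card.

```python
import time


class CacheAside:
    """Check the cache first; on a miss or expiry, load from the database and refill."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit: no database call
        value = self._load(key)        # miss or expired: go to the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value


db_reads = []


def fake_db(key):
    # Stand-in for a real query; records every call so hits and misses are visible.
    db_reads.append(key)
    return f"product:{key}"


now = [0.0]  # fake clock so the example does not need to sleep
cache = CacheAside(fake_db, ttl_seconds=30, clock=lambda: now[0])
cache.get("sku-1")  # miss: reads the database
cache.get("sku-1")  # hit: served from the cache
now[0] = 31.0
cache.get("sku-1")  # TTL expired: reads the database again
print(len(db_reads))
```

The TTL is where the staleness decision lives: a reader may see data up to `ttl_seconds` old.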
One master control room for a chain of 50 stores. Change the discount banner once at HQ — every store updates instantly. No staff member edits posters by hand at each branch.
Azure App Configuration: a central store for application settings and feature flags, separate from code. Pairs with Key Vault, which holds the SECRETS (passwords, keys) while App Config holds regular settings.
Settings scattered across config files in every deployment = inconsistency, redeployments for tiny changes, and secrets accidentally committed to Git (a top real-world breach cause).
Apps deployed to multiple environments/instances, teams wanting feature flags (turn features on/off live, gradual rollouts), centralized settings governance.
Design rule: settings → App Configuration; secrets → Key Vault; app authenticates to both with Managed Identity (zero credentials in code). Feature flags enable testing in production safely — release the code dark, switch on for 5% of users first.
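The "switch on for 5% of users first" idea can be sketched with a deterministic hash. This is not the App Configuration feature-management API; `is_enabled` and the bucketing scheme are assumptions for illustration. Hashing the flag name together with the user ID gives each user a stable bucket, so the same user keeps the same experience while the percentage is raised.

```python
import hashlib


def is_enabled(flag_name, user_id, rollout_percent):
    """Place the user in a stable bucket from 0 to 99 and compare with the rollout."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
enabled_at_5 = [u for u in users if is_enabled("new-checkout", u, 5)]
print(f"{len(enabled_at_5)} of {len(users)} users see the feature at 5%")
```

Raising the percentage only ever adds users: anyone in a bucket below 5 is also below 50.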
An assembly-line robot that builds the car identically every single time — vs a craftsman who's brilliant but occasionally forgets a bolt on Fridays.
Infrastructure as Code (IaC) + CI/CD pipelines. IaC (ARM templates / Bicep / Terraform) defines infrastructure in files. CI/CD (GitHub Actions / Azure DevOps Pipelines) automatically builds, tests, and deploys code on every change.
Manual deployments are slow, unrepeatable, and error-prone ('it worked in dev!'). Automation makes every environment identical, every release auditable, and rollbacks instant.
Every serious project. The exam expects IaC + pipelines as the default answer for 'consistent, repeatable deployments'.
Names to map: Bicep (Azure-native IaC, cleaner than ARM JSON), Terraform (multi-cloud IaC), GitHub Actions / Azure Pipelines (CI/CD), Deployment Slots on App Service (deploy to staging → warm up → swap to production with zero downtime, instant swap-back rollback).
In the cloud there is no office gate or guard — IDENTITY is the new security perimeter. Every design starts with: who is this (authentication), what may they do (authorization), and how do we keep checking (zero trust)? One security office runs it all: Microsoft Entra ID.
The company's security office: it issues every ID card, keeps the employee register, checks cards at every door, and instantly invalidates a card when someone leaves.
Microsoft's cloud identity and access management service. It stores identities (users, groups, applications, devices), authenticates sign-ins, issues tokens, and enables Single Sign-On across thousands of apps.
Without central identity, every app maintains its own usernames/passwords — users juggle 20 passwords, IT can't disable a leaver everywhere, and attackers feast on the weakest app.
Always — it's the identity foundation of Azure, Microsoft 365, and any modern app you build.
Core objects to know: Users, Groups (assign access to groups, never individuals), App Registrations (your apps), Service Principals & Managed Identities (non-human identities). Design mantra: authenticate with Entra ID, authorize with RBAC, eliminate stored passwords with Managed Identity.
A visitor pass for a partner company's employee — they enter your office using their OWN company ID card. You decide which rooms the pass opens; their company still owns the card.
B2B (business-to-business) guest collaboration: invite external users (partners, vendors, consultants) into your tenant. They sign in with THEIR OWN organization's credentials, so you never create or manage a password for them.
Creating internal accounts for externals is a security nightmare: orphaned accounts linger after projects end, and you carry their password risk. B2B keeps their identity at home while you control access.
Partner access to Teams/SharePoint/apps, vendor portals, consultants working in your Azure subscription.
Guests appear as 'Guest' user type — apply Conditional Access and Access Reviews to them (review quarterly: does this partner still need access?). Cross-tenant access settings control which external orgs you trust.
The membership system of a shopping mall app — customers sign up themselves with Google or Facebook or email. Lakhs of customers, but they're members, NOT employees with office ID cards.
B2C (business-to-consumer): a SEPARATE identity service for your customer-facing apps. Customers self-register and sign in with social accounts (Google, Facebook) or email, on fully customizable, branded sign-in pages. Scales to millions of users.
Customer identities must never mix with your corporate directory (different scale, different risk, different experience). And building your own secure login system from scratch is how breaches are born.
Any consumer-facing application — retail apps, citizen portals, customer self-service — needing sign-up/sign-in at scale.
Exam keyword mapping: 'partners/vendors collaborating' → B2B (guests in YOUR tenant). 'customers/consumers signing up with social accounts' → B2C (separate tenant, branded journeys via user flows/custom policies).
An intelligent bouncer: 'Known face, office laptop, usual city — go in. Same person from an unknown café Wi-Fi at 3 AM — show second ID (MFA). Sign-in attempt from an impossible location — blocked.'
Conditional Access is Entra ID's policy engine: IF (user/group + app + location + device state + risk level) THEN (require MFA / require compliant device / block / allow). Evaluated on every sign-in.
A correct password is no longer proof of identity — passwords get phished daily. Context (where, what device, how risky) must shape the decision. This is the heart of Zero Trust.
Every organization. Baseline policies: require MFA for admins, block legacy authentication, require compliant devices for sensitive apps.
Design tips that score: use Report-only mode to test policies before enforcing; never lock yourself out (exclude a break-glass account); combine with Identity Protection risk signals ('high sign-in risk → block'). Requires Entra ID P1 (P2 for risk-based).
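The IF/THEN shape of a policy engine like this can be sketched in plain Python. Everything here is hypothetical: the `SignIn` fields, the group name `"admins"`, the app name `"payroll"`, and the specific rules are made up for illustration and are not real Entra ID configuration. The sketch collects every control that a matching rule demands, with a block overriding everything else.

```python
from dataclasses import dataclass


@dataclass
class SignIn:
    user_groups: set
    app: str
    trusted_location: bool
    compliant_device: bool
    risk: str  # "low", "medium" or "high"


def evaluate(sign_in):
    """Return the set of controls this sign-in must satisfy."""
    if sign_in.risk == "high":
        return {"block"}                          # block wins over everything
    controls = set()
    if "admins" in sign_in.user_groups:
        controls.add("require_mfa")               # baseline: MFA for admins
    if sign_in.risk == "medium" or not sign_in.trusted_location:
        controls.add("require_mfa")               # risky context: second factor
    if sign_in.app == "payroll" and not sign_in.compliant_device:
        controls.add("require_compliant_device")  # sensitive app: managed device
    return controls or {"allow"}


usual = SignIn({"staff"}, "wiki", trusted_location=True, compliant_device=True, risk="low")
cafe = SignIn({"staff"}, "wiki", trusted_location=False, compliant_device=True, risk="low")
print(evaluate(usual), evaluate(cafe))
```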
The bank's fraud department: it doesn't check your signature — it notices your card was suddenly swiped in two countries within an hour, or your card number appeared on a leaked list, and acts automatically.
Identity Protection: an Entra ID P2 service using Microsoft's threat intelligence + ML to detect identity risks: leaked credentials found on the dark web, impossible travel, sign-ins from anonymized IPs/malware-linked sources. Classifies User risk and Sign-in risk (low/medium/high).
Humans can't watch millions of sign-ins for subtle attack patterns. Automated risk detection catches compromised accounts BEFORE damage spreads.
Organizations with Entra ID P2 wanting automated identity threat response — typically combined with Conditional Access.
The power move: risk-based Conditional Access — 'sign-in risk medium → force MFA; user risk high → force secure password change.' The system self-heals: a phished account gets challenged and remediated automatically.
The yearly key audit: 'Here's the list of everyone holding a server-room key. Does each one STILL need it?' — because people change teams, projects end, but keys are rarely returned.
Access Reviews: scheduled, automated recertification campaigns that periodically ask managers/owners (or users themselves) to confirm whether each person still needs a group membership, app access, or privileged role. Non-confirmed access is removed automatically.
Access only ever accumulates — 'permission creep'. After 3 years, employees hold access from every old project. Each unused permission is attack surface. Auditors demand proof of periodic review.
Privileged roles (review monthly/quarterly), guest B2B users (do partners still need access?), sensitive groups and apps. Compliance-driven environments especially.
Part of Entra ID Governance (P2). Configure: scope (which group/role), reviewers (managers/self/owners), frequency, and auto-action (remove access if denied or not answered). Pairs with PIM — Privileged Identity Management — where admin roles are activated just-in-time instead of held permanently.
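The auto-action setting ("remove access if denied or not answered") amounts to a small piece of logic. The sketch below is a plain-Python illustration with hypothetical names (`apply_review`, the decision strings); it does not call any Entra ID API.

```python
def apply_review(members, decisions, remove_if_unanswered=True):
    """Return (kept, removed) after one review round.

    decisions maps member -> "approve" or "deny"; a missing entry means
    the reviewer never answered.
    """
    kept, removed = [], []
    for member in members:
        decision = decisions.get(member)
        if decision == "approve":
            kept.append(member)
        elif decision == "deny" or (decision is None and remove_if_unanswered):
            removed.append(member)
        else:
            kept.append(member)  # unanswered, and the auto-action is switched off
    return kept, removed


members = ["asha", "ben", "chen"]
decisions = {"asha": "approve", "ben": "deny"}  # nobody answered for chen
print(apply_review(members, decisions))
```

With the auto-action on, silence counts as "no": access that nobody vouches for goes away, which is what stops permission creep.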
An ID card issued to a ROBOT, not a person. The robot does its nightly job with its own card and its own limited permissions — it never borrows an employee's card.
Service principal: an identity for an application or automation (not a human). Created via App Registration in Entra ID; the app authenticates with a client secret or certificate, and you grant it RBAC roles like any user.
Pipelines, scripts, and integrations need to access Azure. Using a human's account breaks when they leave and violates least-privilege. Apps deserve their own identity with exactly the permissions they need.
CI/CD pipelines deploying to Azure, third-party tools accessing your resources, multi-tenant apps, any non-human automation.
The exam hierarchy of preference: Managed Identity (best — Azure-managed, NO secret to store or rotate, but only for resources running IN Azure) → Service Principal with certificate → Service Principal with client secret (must rotate; store it in Key Vault, never in code).
The office safe with a strict guard: passwords, master keys, and stamped certificates live inside. Every opening is logged — who, when, which item — and even most managers can't peek inside.
Azure Key Vault: a managed service (with HSM-protected keys in the Premium tier) for storing three things: Secrets (connection strings, API keys, passwords), Keys (encryption keys), and Certificates (TLS certs with auto-renewal).
Hardcoded credentials in code/config are a leading cause of breaches — one leaked repo exposes everything. Key Vault centralizes secrets with access control, audit logs, and rotation.
Every application that touches a credential, key, or certificate. There is no scenario where hardcoding wins.
The golden pattern (memorize it): App uses Managed Identity → authenticates to Key Vault → retrieves secret at runtime → zero credentials anywhere in code or config. Design extras: RBAC permission model (preferred over access policies), soft-delete + purge protection (compliance), private endpoint for network isolation, customer-managed keys (CMK) when regulators require you to own encryption keys.
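A minimal sketch of the golden pattern in Python, assuming the `azure-identity` and `azure-keyvault-secrets` packages are installed and that the app runs somewhere a managed identity is available. The vault URL and the secret name `sql-connection-string` are placeholders, and the helper function names are hypothetical.

```python
def build_secret_client(vault_url):
    """Create a Key Vault client that authenticates without stored credentials.

    Imports are local so this module still loads where the Azure SDK
    packages are not installed.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # DefaultAzureCredential can use the managed identity when running in Azure.
    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def get_connection_string(client, secret_name="sql-connection-string"):
    """Fetch the secret at runtime; nothing sensitive lives in code or config."""
    return client.get_secret(secret_name).value


# Usage inside Azure (placeholder vault name, not executed here):
#   client = build_secret_client("https://<your-vault>.vault.azure.net")
#   connection_string = get_connection_string(client)
```

The only configuration the app carries is the vault URL, which is not a secret; the identity must still be granted permission to read secrets on the vault.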
A system you can't observe is a system you can't run. Think of a hospital ICU: sensors on the patient (data sources), a central lab analyzing reports (Log Analytics), the doctor's visual charts (Workbooks & Insights), and a research facility crunching years of records in seconds (Data Explorer).
The ICU sensor network: heart-rate monitor, BP cuff, oxygen sensor — each device watches one thing, all readings flow to one central station where nurses see the full patient picture.
Azure Monitor is the umbrella observability platform collecting two data types — Metrics (numeric time-series: CPU %, request count) and Logs (detailed records: errors, sign-ins, traces) — from every layer: applications, Azure resources, OS (via agents), subscription activity, and tenant (Entra ID) logs.
Failures rarely announce themselves politely. Without telemetry you discover outages from angry customer calls. With it, you detect, diagnose, and often auto-respond before users notice.
Every production workload from day one. The design question is never IF you monitor — it's which sources, where data goes, and how long you keep it.
Architect's checklist: enable Diagnostic Settings on every resource (routes platform logs to a destination), install Azure Monitor Agent on VMs (guest OS telemetry), instrument apps with Application Insights, capture the Activity Log (who did what at subscription level). Routing options: Log Analytics (analyze), Storage (cheap archive), Event Hubs (stream to external SIEM like Splunk).
The hospital's central laboratory: every sample from every ward arrives here, and specialists run precise tests — 'show all patients whose fever spiked twice within six hours' — across the whole hospital's records at once.
Log Analytics: the central workspace where logs from all sources land, plus the query engine on top, KQL (Kusto Query Language), to search, correlate, and analyze them. The foundation for log-based alerts.
Logs scattered across 50 resources answer nothing. Centralized + queryable means you can correlate ('show app errors AND the VM's CPU at that exact minute') — that correlation is how root causes are found.
The default destination for almost all diagnostic logs. Most organizations use ONE central workspace (or few, when regions/compliance demand separation).
Design levers the exam tests: workspace strategy (centralized vs per-region), retention (30 days by default, configurable up to 730 days; archive longer to Storage for compliance), table-level retention to control cost, RBAC on workspace data. KQL basics worth showing students: Table | where <condition> | summarize count() by bin(TimeGenerated, 1h).
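For students who know Python better than KQL, it can help to see what the `summarize count() by bin(TimeGenerated, 1h)` step computes. The sketch below reproduces that one step over in-memory records; the function name `count_by_hour` and the sample records are made up, and this is an analogy rather than how Log Analytics executes queries.

```python
from collections import Counter
from datetime import datetime


def count_by_hour(records):
    """Count records per one-hour bin of their TimeGenerated timestamp."""
    bins = Counter()
    for record in records:
        ts = record["TimeGenerated"]
        # bin(TimeGenerated, 1h): round the timestamp down to the start of its hour
        bins[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(bins.items()))


logs = [
    {"TimeGenerated": datetime(2024, 1, 1, 9, 5), "Message": "timeout"},
    {"TimeGenerated": datetime(2024, 1, 1, 9, 59), "Message": "timeout"},
    {"TimeGenerated": datetime(2024, 1, 1, 10, 0), "Message": "disk full"},
]
for hour, count in count_by_hour(logs).items():
    print(hour, count)
```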
Workbooks = the doctor's chart combining temperature graph + BP trend + medication notes in one visual page. Insights = ready-made specialist dashboards — the cardiologist's standard heart-monitoring panel, pre-designed by experts.
Workbooks: interactive, customizable report canvases mixing KQL queries, metrics, text, and parameters into shareable dashboards. Insights: PRE-BUILT monitoring experiences for specific services — Application Insights (apps: requests, failures, dependency map, user analytics), VM Insights (performance + process map), Container Insights (AKS health), Network Insights.
Raw query results don't communicate. Visual, curated views turn telemetry into understanding — and pre-built Insights mean you don't design dashboards from scratch for common scenarios.
Workbooks: custom operational reports, incident postmortems, management-facing views. Insights: switch on for every app (App Insights), VM fleet, and AKS cluster you run.
Application Insights is the star — know its features: live metrics, distributed tracing across microservices, Application Map (visual dependency graph showing which component is failing), availability tests (ping your app from worldwide locations), smart detection of anomalies.
A super-librarian who has indexed every book in a national library — ask 'find every mention of this phrase across 10 billion pages' and get the answer in two seconds, not two weeks.
Azure Data Explorer (ADX): a standalone, massively scalable analytics engine optimized for huge volumes of telemetry, logs, and time-series data, offering interactive queries over billions of rows in seconds, using the same KQL. (Log Analytics actually runs on ADX technology underneath.)
Azure Monitor/Log Analytics is curated for Azure resource monitoring. But when YOUR PRODUCT generates terabytes of custom telemetry daily (IoT fleet, game events, app analytics), you need the raw engine — with full control over ingestion, retention, and cost.
Custom telemetry platforms at massive scale: IoT sensor analytics, clickstream analysis, security log exploration, any 'billions of rows, interactive speed' requirement.
The decision line for the exam: monitoring AZURE resources and apps → Azure Monitor + Log Analytics (managed experience). Building YOUR OWN large-scale telemetry/time-series analytics → Azure Data Explorer (you control clusters, databases, ingestion from Event Hubs/IoT Hub). Same KQL skills work in both — learn once, use everywhere.
Think of your Azure estate as a private township: internal roads (VNets), highways to your old office campus (VPN/ExpressRoute), smart toll plazas routing visitors to the right gate (delivery services), and security walls + guards at every entrance (protection services).
A town planner doesn't start by drawing roads — they ask: how many people, what traffic, which areas are restricted? The network design FOLLOWS the workload's needs, never the other way round.
The discipline of deriving network design from workload requirements: who connects (internet users? employees? partners?), what traffic patterns (north-south vs east-west), what isolation/compliance is needed, and what performance/latency targets exist.
Networks designed without requirements become either over-engineered (cost) or under-secured (breach). AZ-305 scenarios always hide the answer inside the requirements.
First step of EVERY infrastructure design — before choosing any networking service.
The standard enterprise answer is Hub-Spoke topology: a hub VNet holds shared services (firewall, gateways, DNS), spoke VNets hold workloads, connected via peering. Read requirements for keywords: 'private only' → Private Endpoints, 'deterministic latency to on-prem' → ExpressRoute, 'global users' → Front Door.
The township's road system: internal colony roads (VNet), connecting bridges between colonies (peering), a managed highway network when colonies multiply (Virtual WAN), and the address directory telling everyone where each house is (DNS).
The building blocks connecting things INSIDE Azure: Virtual Network (your private IP space with subnets), VNet Peering (private connection between VNets — even cross-region), Virtual WAN (Microsoft-managed global hub network for large estates), Azure DNS + Private DNS Zones (name resolution, including for private endpoints).
Resources must communicate privately and predictably. Peering keeps traffic on Microsoft's backbone (never the public internet); Private DNS makes private endpoints resolvable by name.
VNet: always. Peering: connecting 2–20 VNets. Virtual WAN: dozens of VNets + many branch offices globally. Private DNS zones: mandatory whenever you use Private Endpoints.
Design facts that score: peering is non-transitive (A↔B and B↔C does NOT give A↔C — the hub-spoke pattern with a firewall/NVA in the hub solves this), peered traffic stays on the Microsoft backbone, and each private endpoint type needs its matching privatelink DNS zone.
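Non-transitivity is easy to check with a toy model. The sketch below treats peerings as undirected links and deliberately does not follow chains of links; the function names and VNet names are hypothetical. The `forwarding_hub` parameter stands in for a hub VNet whose firewall/NVA forwards spoke-to-spoke traffic, a simplification that ignores the routing configuration a real deployment would also need.

```python
def directly_peered(peerings, a, b):
    """True only if a peering link exists between exactly these two VNets."""
    return any({a, b} == set(link) for link in peerings)


def can_reach(peerings, source, target, forwarding_hub=None):
    """Peering is non-transitive: traffic needs a direct link or a forwarding hub."""
    if directly_peered(peerings, source, target):
        return True
    if forwarding_hub is None:
        return False  # A-B and B-C alone do not give A-C
    return (directly_peered(peerings, source, forwarding_hub)
            and directly_peered(peerings, forwarding_hub, target))


peerings = [("spoke-a", "hub"), ("hub", "spoke-b")]
print(can_reach(peerings, "spoke-a", "spoke-b"))                        # no direct link
print(can_reach(peerings, "spoke-a", "spoke-b", forwarding_hub="hub"))  # via the hub
```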
Three ways to reach your old office campus: a secure tunnel through public roads (VPN), your own private dedicated highway (ExpressRoute), or a personal secure tunnel for one traveling employee's laptop (Point-to-Site).
Options to connect your datacenter/offices to Azure: Site-to-Site VPN (IPsec tunnel over the internet via VPN Gateway — up to ~10 Gbps aggregate), Point-to-Site VPN (individual devices connect in), and ExpressRoute (private dedicated circuit through a connectivity provider — 50 Mbps to 100 Gbps, traffic never touches the internet).
Hybrid is reality: apps in Azure need to reach databases, AD, and systems still on-premises — securely and reliably.
S2S VPN: quick start, dev/test, smaller orgs, backup path. ExpressRoute: production hybrid, predictable latency, compliance ('traffic must not traverse public internet'), high bandwidth. P2S: remote developers/admins.
The exam pattern: 'mission-critical + deterministic latency / private connection' → ExpressRoute; add a Site-to-Site VPN as FAILOVER for the ExpressRoute circuit (the classic resilient hybrid design). Gateway SKU choices control throughput; ExpressRoute needs a Gateway too.
Traffic management for the township: a simple traffic constable distributing cars between parallel lanes (Load Balancer), a smart toll plaza reading destination boards and routing by address (Application Gateway), and a national highway authority directing travelers to the nearest city (Front Door / Traffic Manager).
Services that distribute user traffic: Azure Load Balancer (Layer 4, TCP/UDP, regional), Application Gateway (Layer 7, HTTP-aware: URL-path routing, SSL termination, cookie affinity, regional — with optional WAF), Azure Front Door (Layer 7, GLOBAL: edge routing, CDN caching, WAF), Traffic Manager (DNS-based global routing — no traffic flows through it).
One server can't serve everyone, and global users can't all be served well from one region. Delivery services spread load, route intelligently, and survive backend failures.
Memorize the 2×2: Regional + non-HTTP → Load Balancer. Regional + HTTP → Application Gateway. Global + HTTP → Front Door. Global + DNS-level/any protocol → Traffic Manager.
Combinations are the real-world answer: Front Door (global entry, caching, WAF) → Application Gateway (regional L7, fine routing) → backend pools. 'URL path /images goes to server pool B' → App Gateway path-based routing. 'Route users to nearest region + failover' → Front Door (HTTP) or Traffic Manager (non-HTTP).
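The 2×2 from this card reduces to two yes/no questions. A minimal sketch, with a hypothetical function name and parameters:

```python
def pick_load_balancer(global_scope: bool, http: bool) -> str:
    """Scope (regional or global) and protocol (HTTP or not) pick the service."""
    if global_scope:
        return "Front Door" if http else "Traffic Manager"
    return "Application Gateway" if http else "Load Balancer"


for global_scope in (False, True):
    for http in (False, True):
        scope = "global" if global_scope else "regional"
        protocol = "HTTP" if http else "non-HTTP"
        print(f"{scope:8} + {protocol:8} -> {pick_load_balancer(global_scope, http)}")
```

Real designs often chain two of these answers, as the combination examples in this card show.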
Layered security of a bank branch: boundary walls (DDoS Protection), the main gate guard checking every visitor (Firewall), frisking at the hall entrance for known attack patterns (WAF), room-level door locks (NSGs), no public entrance at all for the vault (Private Endpoints), and a secure manager's corridor for staff (Bastion).
The defense-in-depth toolkit: DDoS Network Protection (absorbs volumetric attacks), Azure Firewall (managed stateful firewall, FQDN/network rules, threat intelligence — sits in the hub), WAF (on App Gateway/Front Door — blocks SQL injection, XSS, OWASP Top 10), NSGs (subnet/NIC-level allow-deny rules), Private Endpoints (remove public exposure of PaaS), Azure Bastion (browser-based RDP/SSH to VMs without public IPs).
Internet-facing applications are attacked constantly and automatically. One layer always eventually fails — defense in depth means the next layer holds.
Public web app → WAF always. Hub-spoke enterprise → Azure Firewall in hub inspecting east-west + egress. Every VM admin access → Bastion (never public RDP). Every production PaaS → Private Endpoint.
Know the layering order for design answers: DDoS at the edge → Front Door/App GW WAF for HTTP attacks → Azure Firewall for network/egress control → NSG for micro-segmentation → Private Endpoints to shrink the attack surface → Bastion for admin access. NSG vs Firewall: NSG = simple L3/L4 rules per subnet; Firewall = centralized, stateful, FQDN-aware, logs everything.
Two different bad days, two different answers: 'we deleted/corrupted data' → BACKUP (go back in time), and 'our whole region/datacenter is down' → SITE RECOVERY (run from somewhere else). Every design starts with two numbers: RPO — how much data can we afford to lose, and RTO — how fast must we be back.
Insurance planning: how much loss can you absorb (RPO) and how quickly must life return to normal (RTO)? A street vendor and a hospital answer these very differently — so their 'insurance' costs differ too.
The framework for protection design: RPO (Recovery Point Objective — maximum acceptable data loss, determined by backup/replication frequency) and RTO (Recovery Time Objective — maximum acceptable downtime, determined by your restore/failover method).
Backup design without RPO/RTO is guesswork. These two numbers — set by the BUSINESS, not IT — decide every technical choice and its cost.
The first question in every BCDR design. AZ-305 scenarios state them explicitly ('RPO of 15 minutes, RTO of 1 hour') — they are the answer key.
Map mentally: daily backup → RPO up to 24h. Continuous replication (ASR) → RPO of minutes/seconds. Restore from backup → RTO of hours. Failover to warm standby → RTO of minutes. Tight RPO/RTO = replication + standby infrastructure = more cost. Loose RPO/RTO = scheduled backups = cheap.
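The mental map above can be sketched as a function of the two numbers. The thresholds are assumptions for illustration only: a 24-hour backup interval (matching the daily-backup example) and a 4-hour restore time, which in practice depends on data size and tooling. The function name `pick_protection` is hypothetical.

```python
def pick_protection(rpo_hours, rto_hours, backup_interval_hours=24.0, restore_hours=4.0):
    """Return (data protection method, recovery method) for the given targets.

    backup_interval_hours and restore_hours are assumed values for this sketch.
    """
    # Worst-case data loss with scheduled backups equals the backup interval.
    if rpo_hours >= backup_interval_hours:
        data = "scheduled backup"
    else:
        data = "continuous replication"
    # A restore only meets the RTO if it finishes within the allowed downtime.
    if rto_hours >= restore_hours:
        recovery = "restore from backup"
    else:
        recovery = "failover to warm standby"
    return data, recovery


print(pick_protection(rpo_hours=24, rto_hours=8))    # loose targets: cheap
print(pick_protection(rpo_hours=0.25, rto_hours=1))  # RPO 15 min, RTO 1 h: costly
```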
A disciplined night watchman who photographs every room at a fixed time, stores photos in a fireproof archive in another building, keeps them as long as the policy says, and can recreate any room exactly as it was on any date.
Azure Backup: Azure's centralized managed backup service. Backs up Azure VMs, SQL/SAP in VMs, Azure Files, Blobs, Disks, and even on-premises machines (via MARS agent / Azure Backup Server). Data goes to a Recovery Services Vault (or Backup Vault for newer workloads) with policies controlling schedule + retention.
Ransomware, accidental deletion, corruption — backup is the last line of defense. A managed service means no backup servers, no tape rotation, no scripts to babysit.
Every production workload, full stop. The design questions are scope, frequency, retention, and vault protection.
Design levers for the exam: vault redundancy (LRS/ZRS/GRS — GRS + cross-region restore for regional disaster), soft delete (deleted backups retained 14+ days), immutability + Multi-User Authorization (ransomware can't delete your backups), and policy tiers (instant restore snapshots for fast recovery + vault tier for retention).
A diary written in pen with carbon paper: every change creates a copy of the previous page (versioning), torn pages sit in a recoverable dustbin for some days (soft delete), and you can reconstruct the whole diary exactly as it looked on any past date (point-in-time restore).
Blob protection has two modes: Operational backup — versioning + soft delete + change feed + point-in-time restore, all IN the same storage account (no copy elsewhere). Vaulted backup — data actually copied into a Backup Vault (survives even the storage account's deletion).
Blobs hold user uploads and data lakes — overwritten or deleted blobs are business losses. Most blob 'disasters' are logical (wrong delete, bad code), which operational backup reverses instantly.
Operational backup: the default for accidental deletion/corruption protection (fast, cheap, local). Vaulted backup: compliance demands isolated copies, or protection against the whole account being compromised/deleted.
Layer the features: soft delete (blob + container level), blob versioning (every overwrite keeps the old version), point-in-time restore (roll a container back to a timestamp). Combine with immutable storage (WORM legal hold) when regulators require unchangeable data.
The office shared drive gets photographed each night — and any employee can themselves walk to the archive and pull yesterday's version of one file, without raising an IT ticket.
Azure Backup protects file shares using share snapshots — point-in-time, incremental copies of the entire share. Managed by vault policies (schedule + retention up to 10 years); restore a whole share or a single file/folder. Vaulted backup option copies data into the vault for true isolation.
Shared drives suffer constant human error — overwritten spreadsheets, deleted folders. Snapshot-based protection makes recovery a minutes-long self-service task.
Every Azure Files share used by humans or lift-and-shift apps. Snapshot-only for convenience recovery; vaulted when the copy must survive account-level disasters.
Useful design detail: end users on Windows can restore via 'Previous Versions' directly from the snapshot — zero IT involvement. Soft delete on file shares protects against the share itself being deleted.
Photographing the entire flat — furniture, wiring, contents — so you can rebuild an identical flat anywhere, or just retrieve one document from the photo without rebuilding anything.
Azure Backup takes application-consistent snapshots of entire VMs (all disks) on schedule into a Recovery Services Vault. Restore options: create a new VM, restore disks only, replace existing disks, or file-level recovery (mount the backup and copy individual files).
VMs hold OS + app + config + data together. A corrupted VM rebuilt manually takes days; restored from backup, it takes minutes to hours.
Every production VM. Frequency/retention by criticality; combine with ASR when the requirement is regional DR, not just data protection.
Exam details: Instant Restore keeps snapshots locally for 1–5 days (very fast restores), application-consistent backups via VSS on Windows (no shutdown needed), cross-region restore from GRS vaults, and Enhanced policy for multiple backups per day (tighter RPO).
A bank ledger photocopied automatically every few minutes — you can reopen the books exactly as they were at 2:47 PM last Tuesday (point-in-time), and yearly closing ledgers are preserved for a decade for the auditors (long-term retention).
Azure SQL Database backups are automatic and built-in: a weekly full backup, a differential every 12 or 24 hours, and a transaction log backup every 5–10 minutes. Point-in-Time Restore (PITR) works to any second within the 1–35 day retention window. Long-Term Retention (LTR) keeps weekly/monthly/yearly fulls for up to 10 years. Backup storage redundancy is configurable (LRS/ZRS/GRS); GRS enables geo-restore.
Database data is the most unforgiving — a bad UPDATE at 2:47 PM must be reversible to 2:46 PM. The 5–10 minute log backups are what make RPO that tight.
PITR: the everyday answer to corruption/accidental changes. LTR: compliance ('keep yearly backups 7 years'). Geo-restore: budget DR (restore in another region from geo-replicated backups — RTO hours).
The exam distinction: Geo-restore (from backups — cheap, RTO in hours, RPO up to 1h) vs Failover Groups (live replica — RTO minutes, RPO seconds, costs a running secondary). The requirement's RTO/RPO numbers tell you which one they want.
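The geo-restore versus Failover Groups choice can be sketched as a small lookup. This is a minimal illustration, not an official sizing tool: the `DrOption` type, the `pick_sql_dr` function, and the concrete numbers (12 hours, 30 minutes, 5 seconds) are assumptions standing in for the text's "RTO hours / RPO up to 1h" and "RTO minutes / RPO seconds".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrOption:
    name: str
    rpo_seconds: int  # worst-case data loss window
    rto_seconds: int  # rough time until the database is usable again


# Ordered cheapest first. The figures are illustrative assumptions,
# not published SLA values.
OPTIONS = [
    DrOption("Geo-restore", rpo_seconds=3600, rto_seconds=12 * 3600),
    DrOption("Failover Groups", rpo_seconds=5, rto_seconds=30 * 60),
]


def pick_sql_dr(required_rpo_seconds: int,
                required_rto_seconds: int) -> Optional[DrOption]:
    """Return the cheapest option that meets both targets, or None."""
    for option in OPTIONS:
        if (option.rpo_seconds <= required_rpo_seconds
                and option.rto_seconds <= required_rto_seconds):
            return option
    return None


# RPO 5 minutes, RTO 1 hour: geo-restore misses both targets.
print(pick_sql_dr(5 * 60, 60 * 60).name)      # Failover Groups
# RPO 4 hours, RTO 24 hours: restoring from backups is enough.
print(pick_sql_dr(4 * 3600, 24 * 3600).name)  # Geo-restore
```

Ordering the list cheapest-first mirrors the exam habit of picking the least expensive option that still meets the stated numbers.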
A fully furnished standby office in another city, continuously mirrored — desks, files, phone lines. Disaster strikes the main office, staff walk into the standby and resume within minutes. And twice a year you rehearse the move without disturbing real work (test failover).
Azure's Disaster Recovery service: continuously REPLICATES running VMs (Azure region→region, VMware/Hyper-V/physical→Azure) to a secondary location. On disaster: failover (boot replicas in the recovery region), later failback. Recovery Plans script the ordered, one-click failover of entire applications.
Backup answers 'data lost'; it does NOT answer 'region down, business must keep RUNNING'. ASR exists for continuity — minutes of downtime instead of days of rebuilding.
Regional DR requirements, DR for on-prem servers (Azure as the DR site — no second datacenter needed), and datacenter migrations (replicate, then 'fail over' permanently).
The decision rule to teach: Backup = RPO in hours, RTO in hours, protects DATA (and history). ASR = RPO in seconds–minutes, RTO in minutes, protects the RUNNING WORKLOAD. Mature designs use BOTH. Bonus: test failovers run in an isolated network — DR drills with zero production impact.
Migration is house-shifting at enterprise scale: decide WHY you're moving (strategy), survey everything you own (assess), choose transport for each item — courier the documents online, send the heavy furniture by truck (online vs offline tools) — and unpack into a house that's already wired and secured (landing zone).
A complete house-shifting playbook written by people who've moved a thousand families: why move, what to pack, prepare the new house, move in phases, set house rules, maintain it well.
Microsoft's end-to-end methodology for cloud adoption, in stages: Strategy (business justification) → Plan (digital estate, skills, adoption plan) → Ready (landing zones) → Adopt (Migrate / Innovate), plus the continuous disciplines Govern and Manage, with Secure across everything.
Most failed cloud projects fail on process, not technology — no business case, no governance, lift-everything-blindly. CAF is the tested path around those failures.
Any organization's cloud journey — and the vocabulary AZ-305 expects you to use when a scenario spans strategy-to-operations.
Map scenario language to stages: 'build the business case' → Strategy. 'inventory and prioritize workloads' → Plan. 'prepare subscriptions/networking/identity' → Ready (Landing Zones — Section 1 connects here!). 'move workloads' → Adopt-Migrate. 'enforce standards' → Govern. 'monitor and operate' → Manage.
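The phrase-to-stage mapping above can be written down as a lookup table. This is only a study aid sketched under assumptions: the keyword lists and the `caf_stage` helper are hypothetical, chosen to mirror the phrases in the mapping.

```python
from typing import Optional

# Scenario phrases from the mapping above, lower-cased.
# The exact keyword choices are an assumption made for this sketch.
CAF_STAGE_KEYWORDS = [
    ("Strategy", ["business case"]),
    ("Plan", ["inventory", "prioritize workloads"]),
    ("Ready", ["prepare subscriptions", "prepare networking",
               "prepare identity", "landing zone"]),
    ("Adopt-Migrate", ["move workloads"]),
    ("Govern", ["enforce standards"]),
    ("Manage", ["monitor and operate"]),
]


def caf_stage(scenario: str) -> Optional[str]:
    """Return the first CAF stage whose keywords appear in the scenario."""
    text = scenario.lower()
    for stage, keywords in CAF_STAGE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return stage
    return None


print(caf_stage("The CFO asks you to build the business case"))  # Strategy
print(caf_stage("Enforce standards across all subscriptions"))   # Govern
```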
Moving day discipline: survey every room and label boxes (Assess), transport in planned trips — fragile items first (Migrate), arrange furniture properly in the new home, return the extra truck (Optimize), then set up locks and smoke alarms (Secure & Manage).
The four-stage execution loop inside CAF's Adopt phase: Assess (discover servers/apps/databases, map dependencies, estimate costs) → Migrate (move in waves) → Optimize (right-size, reserved instances, remove waste) → Secure & Manage (backup, monitoring, security baseline).
Migrating 300 servers in one big-bang weekend is how outages happen. Waves, dependency awareness, and post-move optimization make it boring — which is the goal.
Every migration project. Wave planning: start with low-risk, low-dependency workloads; save the tangled core systems for later waves.
Know the 5 Rs of rationalization (decided per workload during Assess): Rehost (lift-and-shift to VMs — fastest), Refactor (minor changes, e.g., to App Service/containers), Rearchitect (redesign for cloud), Rebuild (rewrite cloud-native), Replace (drop it for SaaS). Exam scenarios hint which R: 'minimal changes, tight deadline' → Rehost.
The surveyor's visit before shifting: photographing every room, noting which appliance connects to which socket, and estimating the new home's rent and electricity bill before signing anything.
Azure Migrate is the central hub for discovery and assessment: deploy an appliance that discovers on-prem servers (VMware, Hyper-V, physical), collects performance data, maps dependencies (which server talks to which — agentless), checks Azure readiness, recommends VM sizes, and estimates monthly cost.
You can't migrate what you don't understand. Dependency mapping prevents the classic disaster: moving an app while its database stays behind, connected by a now-slow, now-broken link.
Always the first technical step. Run discovery for 2–4 weeks so performance-based sizing reflects real peaks, not one quiet afternoon.
Assessment outputs that drive design: readiness (ready / conditionally ready / not ready), right-sized SKU recommendations (performance-based beats as-is — often 30–40% cheaper), dependency groups (these servers move TOGETHER as one wave), and TCO comparison for the business case.
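Dependency groups are, in graph terms, connected components: servers that talk to each other directly or through a chain of connections move together. The sketch below illustrates that idea only. It is not how Azure Migrate is implemented, and the server names and the `dependency_groups` function are hypothetical.

```python
from collections import defaultdict


def dependency_groups(servers, connections):
    """Group servers that are linked directly or transitively."""
    neighbours = defaultdict(set)
    for a, b in connections:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for server in servers:
        if server in seen:
            continue
        # Depth-first walk: collect everything reachable from this server.
        stack, group = [server], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    # Smallest, least entangled groups first: candidates for early waves.
    return sorted(groups, key=lambda g: (len(g), g))


servers = ["web1", "app1", "sql1", "intranet", "hr-app", "hr-db"]
connections = [("web1", "app1"), ("app1", "sql1"), ("hr-app", "hr-db")]
for wave, group in enumerate(dependency_groups(servers, connections), start=1):
    print(wave, group)
# 1 ['intranet']
# 2 ['hr-app', 'hr-db']
# 3 ['app1', 'sql1', 'web1']
```

Sorting by group size is one simple way to put low-dependency workloads into the first waves; a real plan would also weigh business criticality.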
Different movers for different items: the document courier (database tools), the van service for regular luggage (server replication), and the sealed container truck for the entire godown (Data Box).
The toolbox, by target: Servers → Azure Migrate: Server Migration (agentless/agent-based replication, test migrations). Databases → DMA (Data Migration Assistant: assess compatibility) + DMS (Database Migration Service: execute the move). Storage/files → AzCopy, Azure Storage Mover, Azure File Sync (online) or the Data Box family (offline). Web apps → App Service Migration Assistant.
One tool doesn't fit all — a 500 GB database, 300 VMs, and 60 TB of file shares each need a different vehicle. Exam questions are exactly this matching game.
Choose by: what's moving, how big, how much bandwidth exists, and how much downtime is allowed.
Matching cheat-sheet: 'Will my SQL Server work in Azure?' → DMA (assessment). 'Move it with minimal downtime' → DMS online. 'Move 300 VMware VMs' → Azure Migrate Server Migration. 'Move 60 TB over a slow link' → Data Box. 'Continuously sync file server to Azure Files' → File Sync.
Shifting the accounts department: first an expert checks whether the old ledger format works in the new office (DMA), then the actual move — either over a long weekend with books closed (offline) or by keeping a live synced copy and switching one quiet night with five minutes of pause (online).
The database journey: DMA assesses compatibility (deprecated features, breaking changes) → choose the target (Azure SQL Database / SQL Managed Instance / SQL on VM) → DMS executes: offline mode (downtime during the move) or online mode (continuous sync, brief cutover).
Databases are the riskiest, most downtime-sensitive migration component. Assessment-first avoids discovering an incompatibility AFTER the weekend cutover starts.
Offline DMS: dev/test, small DBs, generous maintenance windows. Online DMS: production systems where minutes — not hours — of downtime is the limit.
Target selection is half the exam answer: SQL on Azure VM (full control, legacy features — Rehost), SQL Managed Instance (near-100% SQL Server compatibility: SQL Agent, cross-DB queries — the sweet spot for lift-and-shift), Azure SQL Database (PaaS-first for modern single apps). 'Uses SQL Agent jobs + cross-database queries, minimal changes' → Managed Instance.
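The target-selection logic reads naturally as a short decision function. This is a sketch of the rule as stated above, with hypothetical parameter names; real assessments consider many more compatibility checks than these three flags.

```python
def choose_sql_target(needs_full_control: bool = False,
                      uses_sql_agent: bool = False,
                      uses_cross_db_queries: bool = False) -> str:
    """Pick a migration target from the three features named in the text."""
    if needs_full_control:
        # OS-level access or legacy features: stay on a VM (Rehost).
        return "SQL Server on Azure VM"
    if uses_sql_agent or uses_cross_db_queries:
        # Instance-scoped features with minimal changes.
        return "Azure SQL Managed Instance"
    return "Azure SQL Database"


print(choose_sql_target(uses_sql_agent=True, uses_cross_db_queries=True))
# Azure SQL Managed Instance
```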
Sending your files over the internet courier: instant pickup for small parcels (AzCopy), a managed relocation service that moves entire godowns shelf-by-shelf while you watch a dashboard (Storage Mover), and a magic cupboard that keeps the old office and new office in sync during the transition (File Sync).
Network-based transfer options: AzCopy (command-line bulk copy to Blob/Files — scriptable, restartable), Azure Storage Mover (managed service migrating on-prem NFS/SMB shares to Azure at scale, with central tracking), Azure File Sync (sync + tiering between Windows file servers and Azure Files — also a gradual migration path), and Azure Data Factory (when data needs transformation en route).
When bandwidth is sufficient, online migration is simpler, continuous, and has no hardware logistics — data starts arriving today.
Rule of thumb: data size ÷ available bandwidth = transfer time. If a 10 TB share would take about 3 days over your link, go online. If 60 TB would take 3 months, go offline (next card).
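The bandwidth arithmetic is easy to get wrong by a factor of 8 (bytes versus bits), so a small helper is worth having. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second); the `utilisation` and `max_days` parameters are hypothetical knobs for how much of the link and how much calendar time you can spare.

```python
def transfer_days(data_tb: float, link_mbps: float,
                  utilisation: float = 1.0) -> float:
    """Days needed to push data_tb terabytes through a link_mbps link."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_mbps * 1e6 * utilisation)
    return seconds / 86_400                        # seconds -> days


def transfer_mode(data_tb: float, link_mbps: float, max_days: float,
                  utilisation: float = 1.0) -> str:
    """'online' if the network finishes within max_days, else 'offline'."""
    days = transfer_days(data_tb, link_mbps, utilisation)
    return "online" if days <= max_days else "offline"


print(round(transfer_days(100, 100), 1))  # 92.6
print(transfer_mode(10, 1000, max_days=7))   # online
print(transfer_mode(100, 100, max_days=30))  # offline
```

Lowering `utilisation` below 1.0 models a link that is shared with production traffic, which is usually the realistic case.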
Scenario mapping: one-time bulk blob upload → AzCopy. Many on-prem shares, managed + monitored → Storage Mover. Keep on-prem server as cache during gradual move → File Sync. Migrate + reshape data → Data Factory.
When the internet courier would take months, Microsoft ships you an armored, tamper-proof container truck: load your data locally at full speed, the truck drives to the Azure datacenter, and they unload it for you.
Physical transfer appliances: Data Box Disk (up to five 8 TB SSDs, ~35 TB usable of 40 TB, for small batches), Data Box (~80 TB usable of 100 TB, rugged appliance), Data Box Heavy (~800 TB usable of 1 PB, on wheels). All AES-encrypted; data is uploaded into your storage account, then devices are wiped to NIST standards.
Physics: 100 TB over a 100 Mbps line ≈ 93 days at full line rate, and 100+ days in practice. A truck is genuinely faster than the network at these scales: 'never underestimate the bandwidth of a truck full of disks.'
Tens of TBs and beyond, limited/expensive bandwidth, remote sites, or deadline-driven datacenter exits. Also works in reverse (export FROM Azure).
Exam selection by size: a few TB up to ~40 TB → Data Box Disk. ~50–100 TB → Data Box. Hundreds of TB up to 1 PB → Data Box Heavy. The keywords 'limited bandwidth' or 'no internet connectivity' in a scenario point to the Data Box family, regardless of size.
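The size bands above translate directly into a picker. This is a sketch only: the function name is hypothetical, and the gaps between bands (40–50 TB, and anything above 100 TB) are closed by rounding up to the next larger device, which is an assumption rather than official guidance.

```python
def pick_data_box(data_tb: float) -> str:
    """Map a data size in TB to a Data Box family member (exam bands)."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    if data_tb <= 40:
        return "Data Box Disk"
    if data_tb <= 100:
        return "Data Box"
    # Assumption: everything larger goes to Heavy; sizes beyond about
    # 1 PB would need more than one device.
    return "Data Box Heavy"


for size in (30, 60, 500):
    print(size, "TB ->", pick_data_box(size))
# 30 TB -> Data Box Disk
# 60 TB -> Data Box
# 500 TB -> Data Box Heavy
```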
Azure's hardest part isn't learning services — it's choosing between similar-sounding ones. Each table below is a decision tool: find your scenario's keywords in the right column, and the left column is your answer.
| Service | You Manage | Best For | Avoid When |
|---|---|---|---|
| Virtual Machines | OS + everything above | Legacy apps, full control, lift-and-shift | Team wants zero infra management |
| Azure Batch | Job definitions only | Massive parallel jobs (rendering, simulations) | Long-running services / web apps |
| App Service | Just your code | Web apps & APIs — the default choice | Non-HTTP workloads, OS-level needs |
| Container Instances | Container image | Quick single containers, burst tasks | Need orchestration, scaling, LB |
| AKS | Apps + cluster config | Microservices at scale, DevOps-mature teams | Small team, simple app (overkill) |
| Functions | Just functions | Event-driven, bursty, pay-per-run | Long-running, steady heavy load |
| Logic Apps | Visual workflow | Integration & business process automation | Complex custom logic (use Functions) |
| Service | Carries | Superpower | Pick When You Hear |
|---|---|---|---|
| Service Bus | Messages (valuable data) | Ordering, transactions, dead-letter, topics | 'orders', 'guaranteed', 'FIFO', 'enterprise' |
| Storage Queues | Messages (simple) | Dirt cheap, >80 GB queues | 'simple', 'cost-effective', 'basic queue' |
| Event Grid | Discrete events (notifications) | Instant push routing + filtering, pay-per-event | 'react when X happens', 'serverless glue' |
| Event Hubs | Event streams (telemetry) | Millions of events/sec, replayable stream | 'telemetry', 'IoT', 'streaming', 'ingest' |
| Service | OSI Layer | Scope | Pick When |
|---|---|---|---|
| Azure Load Balancer | L4 (TCP/UDP) | Regional | Non-HTTP traffic within a region |
| Application Gateway | L7 (HTTP) | Regional | Path-based routing, SSL offload, WAF — regional web apps |
| Azure Front Door | L7 (HTTP) | Global | Global web apps: edge routing + CDN + WAF |
| Traffic Manager | DNS | Global | Global routing for ANY protocol, region failover |
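The load-balancing table collapses to two yes/no questions: is the traffic HTTP(S), and is the scope global? A minimal sketch of that 2×2 lookup, using a hypothetical function name:

```python
def choose_load_balancer(is_http: bool, is_global: bool) -> str:
    """Answer the two questions from the table: HTTP? Global?"""
    matrix = {
        (True, True): "Azure Front Door",
        (True, False): "Application Gateway",
        (False, True): "Traffic Manager",
        (False, False): "Azure Load Balancer",
    }
    return matrix[(is_http, is_global)]


print(choose_load_balancer(is_http=True, is_global=True))    # Azure Front Door
print(choose_load_balancer(is_http=False, is_global=False))  # Azure Load Balancer
```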
| Service | Access Style | Best For | Not For |
|---|---|---|---|
| Blob Storage | HTTPS / REST / SDK | App files, media, backups, any objects | SMB drive mapping, VM disks |
| Azure Files | SMB / NFS (mapped drive) | Shared drives, lift-and-shift file shares | Massive analytics data |
| Managed Disks | Attached to a VM | VM OS & data volumes | Sharing across many clients |
| Data Lake (ADLS Gen2) | Blob + hierarchical namespace | Big-data analytics storage | Simple app file storage |
| Option | Copies & Location | Survives | Doesn't Survive |
|---|---|---|---|
| LRS | 3× in one datacenter | Disk/rack failure | Datacenter or region loss |
| ZRS | 3× across 3 zones, one region | Entire datacenter (zone) loss | Regional disaster |
| GRS | LRS + 3× in paired region | Regional disaster | — (secondary readable only after failover) |
| RA-GRS | GRS + readable secondary | Regional disaster + read access anytime | — |
| GZRS / RA-GZRS | ZRS + paired region copies | Zone AND regional disasters (max protection) | — |
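Reading the redundancy table backwards, from "which disaster must I survive" to an option, gives a compact rule. The function and parameter names below are hypothetical, and the sketch covers only the options in the table.

```python
def choose_redundancy(survive_zone_loss: bool,
                      survive_region_loss: bool,
                      read_secondary_anytime: bool = False) -> str:
    """Pick the cheapest option in the table that survives the stated failures."""
    if survive_region_loss:
        base = "GZRS" if survive_zone_loss else "GRS"
        # The RA- prefix adds read access to the secondary region.
        return f"RA-{base}" if read_secondary_anytime else base
    # No regional requirement: the read-access flag does not apply.
    return "ZRS" if survive_zone_loss else "LRS"


print(choose_redundancy(False, False))                             # LRS
print(choose_redundancy(True, False))                              # ZRS
print(choose_redundancy(False, True, read_secondary_anytime=True)) # RA-GRS
print(choose_redundancy(True, True))                               # GZRS
```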
| Aspect | Azure SQL Database | Cosmos DB |
|---|---|---|
| Data model | Relational (tables, joins, ACID) | NoSQL multi-model (JSON docs, key-value, graph) |
| Schema | Fixed, enforced | Flexible, evolves freely |
| Global writes | Single write region (readable replicas) | Multi-region writes, active-active |
| Latency promise | No fixed SLA on latency | <10 ms reads/writes (SLA-backed) |
| Pick when | Transactions, reporting, relational integrity | Global scale, low latency everywhere, flexible schema |
| Option | Identity Belongs To | Use Case | Key Trait |
|---|---|---|---|
| Entra B2B | Partner's own organization | Partners/vendors collaborating in YOUR tenant | Guest users, their org keeps the credentials |
| Azure AD B2C | The customer (social/email) | Consumer apps: sign-up/sign-in at scale | Separate tenant, branded journeys, millions of users |
| Managed Identity | The Azure resource itself | Azure resource → Azure service auth | NO secrets to store or rotate — always prefer it |
| Service Principal | An app registration | CI/CD, external apps, automation | Has a secret/cert you must protect & rotate |
| Aspect | Site-to-Site VPN | ExpressRoute |
|---|---|---|
| Path | Encrypted tunnel over public internet | Private dedicated circuit (no internet) |
| Bandwidth | Up to ~10 Gbps aggregate | 50 Mbps – 100 Gbps |
| Latency | Variable (internet-dependent) | Predictable, deterministic |
| Setup | Hours, low cost | Weeks (provider involved), higher cost |
| Pick when | Quick start, dev/test, backup path | Production hybrid, compliance, performance SLAs |
| Aspect | Azure Backup | Azure Site Recovery |
|---|---|---|
| Protects against | Deletion, corruption, ransomware | Region/site outage — business continuity |
| What you get back | DATA, from a point in time | RUNNING workloads, in another region |
| RPO | Hours (schedule-based) | Seconds–minutes (continuous replication) |
| RTO | Hours (restore time) | Minutes (failover) |
| Cost model | Storage of recovery points | Replication + standby infrastructure |
| Job | Tool | Mode | Why |
|---|---|---|---|
| Discover + assess servers | Azure Migrate | Appliance-based | Dependency maps, sizing, cost estimates |
| Check DB compatibility | Data Migration Assistant (DMA) | Assessment | Finds breaking changes before you move |
| Move databases | Database Migration Service (DMS) | Online / Offline | Online = minimal downtime cutover |
| Bulk file copy (good bandwidth) | AzCopy | Online | Scriptable CLI, restartable |
| Managed share migration | Azure Storage Mover | Online | Central tracking of many NFS/SMB shares |
| Gradual file server move | Azure File Sync | Online (sync) | On-prem stays as cache during transition |
| 10s of TB, weak bandwidth | Data Box Disk / Data Box / Heavy | Offline | ~40 TB / ~100 TB / ~1 PB by truck |
| Tool | Data Shape | Best At | Pick When |
|---|---|---|---|
| Azure Monitor Metrics | Numeric time-series | Near-real-time charts & fast alerts | CPU %, request count, autoscale triggers |
| Log Analytics | Structured logs (KQL) | Correlation & root-cause across resources | Monitoring YOUR Azure estate & apps |
| Azure Data Explorer | Massive telemetry (KQL) | Billions of rows, interactive speed | YOUR PRODUCT's custom telemetry platform |
AZ-305 doesn't test definitions — it gives you a company, requirements, and constraints, then asks you to choose. Below are three full enterprises. For each: read the requirements FIRST, try answering module by module, then compare with the architect's decisions. The 'why' column is the exam mindset.
GlobalKart, a successful Indian online marketplace, is expanding to Europe and the US. Festival sales bring 20× traffic spikes. The board demands 99.99% availability, fast page loads on three continents, and customer sign-in with Google/Facebook. Orders are sacred — none may ever be lost. The dev team is small and wants minimum infrastructure management.
| Module | Architect's Decision | The Exam Mindset — WHY |
|---|---|---|
| Governance | Management groups (Prod / Non-Prod), subscription per environment, Azure Policy enforcing allowed regions + mandatory CostCenter tags | Multi-region expansion without governance = untraceable costs and accidental deployments in wrong regions. Structure first. |
| Compute | App Service (Premium) in 3 regions with autoscale rules | 'Small team + minimize overhead' kills VMs and AKS. App Service autoscale absorbs 20× spikes; Premium gives zone redundancy and VNet integration. |
| Delivery & Protection | Azure Front Door with WAF + edge caching in front of all regions | Global + HTTP → Front Door (the 2×2 matrix). One global entry: nearest-region routing, static content cached at edge, OWASP attacks blocked before reaching origin. |
| Databases | Azure SQL with Failover Groups (orders) + Cosmos DB multi-region (catalog, cart) + Redis (sessions) | Orders need ACID + 99.99% → SQL Business Critical with failover groups. Catalog/cart need global millisecond reads + flexible schema → Cosmos. Sessions are temporary hot data → Redis. Polyglot persistence in action. |
| Messaging | Service Bus queues for order processing; Event Grid for reactive automation; Event Hubs for clickstream | 'Zero order loss' → Service Bus (guaranteed delivery, dead-lettering). Doorbell-style reactions → Event Grid. High-volume behavioral telemetry → Event Hubs. The trio, each in its lane. |
| Identity | Azure AD B2C for customers; Managed Identities + Key Vault for all app-to-service auth | 'Customers + social sign-in' → B2C (never B2B). Apps authenticate with Managed Identity — zero credentials in code. |
| Monitoring | Application Insights (availability tests from 3 continents) + central Log Analytics + autoscale alerts | 99.99% must be MEASURED. Availability tests prove the SLA from the user's side; App Map finds the failing component during festival chaos. |
| Backup | SQL PITR + LTR, GZRS storage redundancy, Cosmos continuous backup | A bad deployment that corrupts data needs point-in-time rewind. GZRS = survives zone AND regional disasters. |
Pattern to internalize: requirements are filters. 'Small team' eliminated half the compute options instantly. 'Zero order loss' chose the message broker. 'Global + social customers' chose B2C. Every AZ-305 answer is hiding inside a requirement sentence.
SecureBank is moving its loan-processing platform to Azure. Regulators mandate: no service may be reachable from the public internet, all data stays in India, connectivity from branches must never traverse the public internet, and even database administrators must not be able to read customers' card numbers. DR requirement: RPO 5 minutes, RTO 1 hour. Every administrative action must be audited, and admin rights must not be permanent.
| Module | Architect's Decision | The Exam Mindset — WHY |
|---|---|---|
| Networking | Hub-spoke VNets; ExpressRoute primary + S2S VPN failover; Azure Firewall in hub; Private Endpoints + Private DNS for ALL PaaS; Bastion for admin access | 'Never over public internet' → ExpressRoute (VPN still rides the internet — only acceptable as backup). 'Zero public endpoints' → Private Endpoints everywhere + Bastion instead of public RDP. Hub firewall inspects everything. |
| Governance | Azure Policy: deny public IPs, deny non-India regions, require Private Endpoints; initiative assigned at management group | Compliance must be ENFORCED, not requested. Policy 'Deny' effect makes non-compliant resources impossible to create — auditors love 'cannot' more than 'should not'. |
| Data Security | Azure SQL Business Critical (zone-redundant) + Always Encrypted on PAN columns + TDE with customer-managed keys in Key Vault (HSM) | TDE protects at rest, TLS in transit — but 'DBA must not see card numbers' is the IN USE state → Always Encrypted (keys live with the app, never the DB). Regulators requiring key ownership → CMK in HSM-backed Key Vault. |
| DR | SQL Failover Group to second India region (Central India ↔ South India) + ASR for application VMs + Recovery Plan; quarterly test failovers | RPO 5 min kills backup-based DR (geo-restore RPO ≈ 1 h). Continuous replication only: Failover Groups (RPO seconds) + ASR (RPO minutes, RTO minutes). Residency keeps both regions in India. |
| Identity | Conditional Access (MFA + compliant device for admins), PIM for just-in-time role activation, quarterly Access Reviews, Identity Protection risk policies | 'No standing privileges' is literally PIM's definition — roles activated just-in-time with approval + audit. Access Reviews prove periodic recertification to auditors. |
| Monitoring | Central Log Analytics (730-day interactive + archive), Activity Log + Entra audit logs captured, diagnostic settings on everything, stream to SIEM via Event Hubs | 'Every action audited' → capture the control-plane (Activity Log) AND identity-plane (Entra logs) AND data-plane (resource diagnostics). Event Hubs export feeds the bank's existing SIEM. |
| Backup | Azure Backup with immutable vault + Multi-User Authorization + soft delete; SQL LTR 7 years | Banks are ransomware's favorite target — immutability means even a compromised admin can't delete recovery points. LTR satisfies the regulator's retention mandate. |
Regulated-industry reflex: every requirement maps to a SPECIFIC control — 'not public internet' = ExpressRoute, 'no public endpoint' = Private Endpoint, 'DBA can't read' = Always Encrypted, 'no standing access' = PIM. Vague answers ('use security best practices') score zero; named controls score full marks.
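To make "deny non-India regions" concrete, here is roughly what such a policy rule looks like, built as a Python dictionary and printed as JSON. Treat it as a sketch: the `if`/`then` structure with `field`, `in`, and the `deny` effect follows the Azure Policy definition format, but the display name is hypothetical, the region list is taken from this scenario, and a production policy would normally use parameters and handle global resources. The `is_denied` helper is only a local stand-in for the rule's logic, since Azure evaluates the real rule itself.

```python
import json

ALLOWED_REGIONS = ["centralindia", "southindia"]

policy_definition = {
    "properties": {
        "displayName": "Deny resources outside India regions",  # hypothetical
        "mode": "Indexed",
        "policyRule": {
            # Match any resource whose location is NOT in the allowed list...
            "if": {"not": {"field": "location", "in": ALLOWED_REGIONS}},
            # ...and block its creation outright.
            "then": {"effect": "deny"},
        },
    }
}


def is_denied(resource_location: str) -> bool:
    """Local illustration of the rule above, not an Azure API call."""
    return resource_location.lower() not in ALLOWED_REGIONS


print(json.dumps(policy_definition, indent=2))
print(is_denied("westeurope"))    # True
print(is_denied("centralindia"))  # False
```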
MegaFab's datacenter lease ends in 9 months: 300 VMware VMs, a 60 TB file server, and SQL Server 2014 (heavy SQL Agent jobs + cross-database queries) must move to Azure. The factory's internet link is only 200 Mbps. Leadership also wants a new IoT capability: 2,000 machines streaming sensor data, with real-time overheating alerts and 5 years of historical analysis. Factory floors sometimes lose connectivity for hours.
| Module | Architect's Decision | The Exam Mindset — WHY |
|---|---|---|
| Strategy (CAF) | CAF: Strategy → Plan → Ready (deploy Landing Zone FIRST) → Adopt-Migrate in waves; rationalization = mostly Rehost | '9 months + minimal changes' → Rehost (lift-and-shift) for almost everything; modernize later. Landing zone before the first VM moves — governance retrofitted onto 300 VMs is pain. |
| Assessment | Azure Migrate appliance: 4 weeks of discovery, performance-based sizing, dependency mapping → migration waves | Dependency maps decide the waves — app servers move WITH their databases. Performance-based sizing typically cuts the compute bill 30–40% vs as-is sizing. |
| Server Migration | Azure Migrate: Server Migration — agentless VMware replication, test migrations per wave, weekend cutovers | Built for exactly this: replicate in background, TEST in an isolated VNet (no production impact), cut over wave by wave. Low-risk apps first, the tangled core last. |
| Database | DMA assessment → Azure SQL Managed Instance via DMS online migration | SQL Agent jobs + cross-DB queries → Managed Instance (near-100% compatibility); Azure SQL Database would break both. DMS online keeps downtime to a brief cutover. SQL Server 2014's end of support adds urgency. |
| File Data (60 TB) | Data Box (offline) for the 60 TB bulk → Azure Files; File Sync afterwards for the delta + branch cache | Maths first: 60 TB over 200 Mbps ≈ 28 days with the link fully saturated (longer in practice), which is unacceptable. Data Box moves the bulk by truck in days; File Sync handles changes made during transit and keeps a local cache for the factory. |
| IoT Pipeline | Machines → IoT Hub/Event Hubs → Stream Analytics (tumbling-window alerts) → Azure Data Explorer (5-year history); Power BI dashboards | Firehose ingestion → Event Hubs. 'Alert when temperature crosses X' → Stream Analytics windowed queries in real time. 'Billions of readings, 5 years, interactive analysis' → ADX, purpose-built time-series engine. |
| Edge Resilience | Azure SQL Edge on factory-floor gateways: local capture + processing, sync to cloud when link restores | 'Operates during connectivity loss' → process at the edge. SQL Edge stores and analyzes locally, then syncs — the factory never depends on the WAN to run. |
| Protect (Day 1) | Azure Backup on all migrated VMs + SQL MI PITR/LTR; ASR region-to-region for the critical production line systems | Migration's Secure & Manage stage is part of migration, not an afterthought. The old datacenter's tape backups are gone — protection must exist before the trucks leave. |
Migration questions are arithmetic + matching: bandwidth math chooses online vs offline; compatibility features (SQL Agent, cross-DB) choose the database target; 'minimal changes + deadline' chooses Rehost. And modern exams love hybrid twists — the edge requirement (SQL Edge) is how AZ-305 checks you read EVERY requirement.
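The "tumbling-window alerts" in the IoT pipeline row can be illustrated in plain Python. A tumbling window slices time into fixed, non-overlapping buckets, and the alert fires when a machine's average temperature within one bucket crosses a threshold. This is a conceptual sketch, not Stream Analytics query syntax; the machine names, the 60-second window, and the 85 °C threshold are all hypothetical.

```python
from collections import defaultdict


def tumbling_window_alerts(readings, window_seconds, threshold_c):
    """readings: iterable of (timestamp_seconds, machine_id, temperature_c).

    Returns (window_start, machine_id, average_temperature) for every
    machine whose average exceeds threshold_c in a window.
    """
    buckets = defaultdict(list)
    for timestamp, machine, temperature in readings:
        # Each reading falls into exactly one fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[(window_start, machine)].append(temperature)

    alerts = []
    for (window_start, machine), temps in sorted(buckets.items()):
        average = sum(temps) / len(temps)
        if average > threshold_c:
            alerts.append((window_start, machine, round(average, 1)))
    return alerts


readings = [
    (0, "press-01", 70.0),
    (20, "press-01", 74.0),
    (65, "press-01", 91.0),
    (90, "press-01", 95.0),
    (70, "lathe-07", 60.0),
]
print(tumbling_window_alerts(readings, window_seconds=60, threshold_c=85.0))
# [(60, 'press-01', 93.0)]
```

Averaging over a window rather than alerting on single readings is one way to avoid paging someone for a momentary sensor spike.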
The handbook in one breath: Govern first, then choose compute by management appetite, storage by data shape, databases by consistency needs, integration by data journey, messaging by message value, identity as the new perimeter, monitoring as your eyes, networks by requirements, backup by RPO/RTO, and migration by math. Below — the official documentation for every module.
Hierarchy (MG → Subscription → RG) decides inheritance. Policy controls WHAT, RBAC controls WHO. Landing Zones make it repeatable.
Choose by management appetite: full control (VM) → managed web (App Service) → event-driven (Functions) → orchestrated containers (AKS) → visual workflows (Logic Apps).
Match storage to data shape: objects→Blob, shares→Files, VM volumes→Disks. Redundancy = which disaster you must survive.
Relational integrity → SQL. Global millisecond scale → Cosmos. Protect data at rest (TDE), in transit (TLS), in use (Always Encrypted).
The journey: ADF collects → Data Lake stores raw → Databricks transforms → Synapse analyzes → Stream Analytics watches live.
Message = valuable parcel (Service Bus). Event = doorbell (Event Grid). Stream = conveyor (Event Hubs). APIM is the one front door; cache what you read often; automate every deployment.
Identity is the new perimeter. B2B = partners, B2C = customers. Managed Identity > Service Principal. Conditional Access + PIM + Access Reviews = Zero Trust in practice.
Metrics for speed, Logs for depth, KQL everywhere. App Insights for apps, Workbooks for stories, ADX for your own telemetry at scale.
Hub-spoke is home. ExpressRoute when internet won't do. The 2×2 matrix (HTTP? Global?) answers every load-balancing question. Defense in depth: DDoS → WAF → Firewall → NSG → Private Endpoints.
RPO/RTO are the answer key. Backup = recover DATA (hours). ASR = keep RUNNING (minutes). Immutable vaults beat ransomware.
CAF stages give the vocabulary. Assess before you move; dependency maps make the waves. Bandwidth math picks online vs offline; compatibility features pick the database target.