Two-Site Stretched Azure Local Cluster
Designed and built a two-site stretched Azure Local (Azure Stack HCI) reference environment, combining Storage Replica site-to-site replication with Azure Arc management, to demonstrate edge infrastructure that survives the loss of an entire site.
- Role: Cloud Engineer
- Org: Dell Technologies
- Period: 2021–2022
Context
Enterprises running operational-technology workloads at the edge often can’t go cloud-only: latency, data gravity, and continuity-of-operations requirements mean the workloads have to run on site. This was a demonstration and reference environment built to prove out a pattern for exactly that case: infrastructure that can lose an entire physical location and keep running.
This is a reference design, not a production customer system. All addressing, host naming, and identifiers are illustrative; the architecture is described in generic terms.
Approach
The design is a stretched Azure Local cluster spanning two sites (a primary site and a DR site that receives asynchronous replication), fronted by redundant top-of-rack switching and managed through Azure Arc.
- Two active sites, each with its own Storage Spaces Direct pool whose volumes use two-way mirror resiliency, so each location is independently fault-tolerant.
- Storage Replica site-to-site replication between the pools, giving the second site a recoverable copy of the data that does not depend on the primary.
- Segmented networking — separate management, storage, and replication paths over redundant NICs and TOR switches — so storage and replication traffic never competes with management.
- Azure Arc projection to bring the on-prem cluster under cloud governance: Azure Monitor for telemetry, Recovery Services vaults for backup, and a single control plane for security posture.
- A cloud witness for quorum, removing the need for a third physical site to arbitrate failover.
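The two-way mirror and replication bullets imply some simple capacity arithmetic. The Python sketch below is a back-of-the-envelope model under stated assumptions, not a sizing tool: it ignores reserve capacity, the Storage Replica log volume, and metadata overhead, and the drive count and size in the usage lines are hypothetical. It shows that two-way mirror halves raw capacity at each site, that a replicated write ends up as four physical copies (two per site), and that the smaller site caps the size of a replicated volume.

```python
# Back-of-the-envelope capacity model (assumption-laden sketch): each site's
# pool holds two-way mirror volumes, and replication keeps a copy of each
# replicated volume at the other site. Reserve capacity, the replication log
# volume, and metadata overhead are deliberately ignored.

MIRROR_COPIES_PER_SITE = 2  # two-way mirror keeps two copies of all data
SITES = 2                   # primary + DR, each holding a full replica


def usable_tb_per_site(drives_per_site: int, drive_tb: float) -> float:
    """Usable capacity of one site's pool under two-way mirror (50% efficiency)."""
    raw_tb = drives_per_site * drive_tb
    return raw_tb / MIRROR_COPIES_PER_SITE


def copies_of_each_write() -> int:
    """Physical copies of a replicated write once it has reached both sites."""
    return MIRROR_COPIES_PER_SITE * SITES


def max_replicated_volume_tb(site_a_usable: float, site_b_usable: float) -> float:
    """A replicated volume has to fit at both ends, so the smaller site bounds it."""
    return min(site_a_usable, site_b_usable)


if __name__ == "__main__":
    # Hypothetical footprint: 8 drives of 1.92 TB per site (illustrative only).
    per_site = usable_tb_per_site(drives_per_site=8, drive_tb=1.92)
    print(f"usable per site: {per_site:.2f} TB")
    print(f"copies of each write: {copies_of_each_write()}")
    print(f"largest replicated volume: {max_replicated_volume_tb(per_site, per_site):.2f} TB")
```

The four-copy figure is the cost of surviving both a local drive or node failure and a full site loss: the mirror handles the first, replication the second.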
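Because the DR site is fed asynchronously, its recovery point is whatever had been shipped when the primary failed. The toy model below uses a hypothetical `AsyncReplicatedVolume` class that does not reflect Storage Replica's internal log format; it only illustrates the trade-off. Writes are acknowledged once they land locally, so anything still in the pending log is lost if the whole primary site goes down.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncReplicatedVolume:
    """Hypothetical toy model of asynchronous replication between two sites."""

    primary: list[str] = field(default_factory=list)  # data at the primary site
    pending: list[str] = field(default_factory=list)  # acknowledged, not yet shipped
    dr_copy: list[str] = field(default_factory=list)  # data already at the DR site

    def write(self, block: str) -> None:
        # Asynchronous: the write is acknowledged once it is local,
        # before the DR site has received it.
        self.primary.append(block)
        self.pending.append(block)

    def ship(self, max_blocks: int) -> None:
        # Background replication drains the pending log in write order.
        batch = self.pending[:max_blocks]
        self.pending = self.pending[max_blocks:]
        self.dr_copy.extend(batch)

    def lose_primary_site(self) -> list[str]:
        """Simulate losing the primary; return acknowledged writes that never
        reached the DR site (the data-loss window)."""
        lost = self.pending
        self.primary, self.pending = [], []
        return lost


if __name__ == "__main__":
    vol = AsyncReplicatedVolume()
    for block in ["a", "b", "c", "d"]:
        vol.write(block)
    vol.ship(max_blocks=2)                   # replication has caught up on a and b only
    print("lost:", vol.lose_primary_site())  # lost: ['c', 'd']
    print("DR copy:", vol.dr_copy)           # DR copy: ['a', 'b']
```

Synchronous replication would move the `ship` step inside `write`, so no acknowledged write could be lost, at the cost of adding the inter-site round trip to every write's latency.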
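The segmentation rule in the networking bullet can be expressed as a checkable property: every traffic path gets at least two NICs, and no NIC carries more than one path. The sketch below uses hypothetical adapter names and a plain dictionary rather than any Network ATC intent schema; it illustrates the design rule only, not how the cluster was actually configured.

```python
# Hypothetical adapter names and layout; a plain dict, not a Network ATC schema.
PATHS: dict[str, list[str]] = {
    "management": ["nic1", "nic2"],
    "storage": ["nic3", "nic4"],
    "replication": ["nic5", "nic6"],
}


def validate_paths(paths: dict[str, list[str]], min_nics: int = 2) -> list[str]:
    """Return rule violations; an empty list means the layout is segmented."""
    problems: list[str] = []
    owner: dict[str, str] = {}  # NIC -> first path that claimed it
    for path, nics in paths.items():
        # Redundancy: each path needs at least min_nics distinct adapters.
        if len(set(nics)) < min_nics:
            problems.append(f"{path}: needs at least {min_nics} distinct NICs")
        # Segmentation: a NIC may carry only one path.
        for nic in nics:
            if nic in owner and owner[nic] != path:
                problems.append(f"{nic}: shared by {owner[nic]} and {path}")
            owner.setdefault(nic, path)
    return problems


if __name__ == "__main__":
    print(validate_paths(PATHS))  # []
    shared = {"management": ["nic1", "nic2"], "storage": ["nic2", "nic3"]}
    print(validate_paths(shared))  # ['nic2: shared by management and storage']
```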
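The cloud-witness bullet comes down to vote arithmetic. The sketch below is a simplified majority-vote model (one vote per node plus one for the witness) that assumes a simultaneous loss of an entire site and assumes the surviving site can still reach the witness. It deliberately ignores the dynamic quorum and dynamic witness adjustments that Windows Server failover clustering makes after sequential failures. With two nodes per site, the witness is what turns a 2-of-4 tie into a 3-of-5 majority.

```python
# Simplified majority-vote quorum model (an assumption, not the full Windows
# Server algorithm): one vote per node plus one for the witness, no dynamic
# quorum or dynamic witness adjustments, and a simultaneous whole-site loss.


def has_quorum(surviving_votes: int, total_votes: int) -> bool:
    """The cluster keeps running only with a strict majority of all votes."""
    return 2 * surviving_votes > total_votes


def survives_site_loss(nodes_per_site: int, witness: bool) -> bool:
    """Lose every node at one site; the other site and the witness still vote."""
    witness_votes = 1 if witness else 0
    total_votes = 2 * nodes_per_site + witness_votes
    # Assumes the surviving site can still reach the cloud witness.
    surviving_votes = nodes_per_site + witness_votes
    return has_quorum(surviving_votes, total_votes)


if __name__ == "__main__":
    for nodes in (1, 2, 4):
        print(nodes, "node(s)/site:",
              "with witness ->", survives_site_loss(nodes, witness=True),
              "| without ->", survives_site_loss(nodes, witness=False))
```

Without a witness, two equal sites can never outvote each other after a site loss, which is why the design otherwise would need a third physical site to break the tie.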
Outcome
A working, repeatable blueprint for operations-grade hybrid infrastructure that tolerates the loss of a full site while staying centrally observable and governed from Azure. The same patterns — cluster bring-up, Network ATC intents, lifecycle management — are captured as public hands-on labs so other engineers can stand up equivalent environments.