Version: 2604 (Preview)

Machine Learning and AI on Azure Local

Azure Local supports running machine learning inference, AI models, and intelligent edge workloads directly on on-premises infrastructure. By leveraging local GPU and compute resources, organizations can process data in real time at the point of origin — reducing latency, maintaining data sovereignty, and enabling AI in disconnected or bandwidth-constrained environments.

Service Details

What It Enables

ML and AI on Azure Local allows organizations to deploy trained machine learning models to on-premises infrastructure for local inference. Models are trained in the cloud (Azure Machine Learning) or locally, then deployed as containerized scoring endpoints on AKS clusters running on Azure Local. Azure Arc provides unified management across cloud-trained and edge-deployed models.

Key Use Cases

Real-time inference at the edge — Run computer vision, anomaly detection, or predictive maintenance models on factory floors, retail stores, or remote sites where cloud round-trips are too slow
Data sovereignty for AI — Process sensitive data (medical imaging, classified telemetry) through AI models without the data ever leaving the local boundary
Disconnected AI — Run inference in environments with intermittent or no cloud connectivity (military, maritime, mining)
AI-powered IoT — Combine IoT data ingestion with ML inference on the same Azure Local infrastructure for real-time decision-making at the edge

Architecture

Azure Machine Learning + Arc — ML models registered in Azure ML can be deployed to Arc-connected AKS clusters on Azure Local
Inference endpoints — Models run as containerized REST/gRPC endpoints on AKS, served via local network
GPU acceleration — NVIDIA and Intel GPU passthrough or Discrete Device Assignment (DDA) provides hardware acceleration for inference
Data pipeline — Local data sources feed into inference containers; results can be stored locally or synchronized to the cloud when connectivity is available
Model management — Versioning, lineage, and monitoring handled through Azure ML; inference telemetry collected via Azure Arc

Supported Features

Model deployment to Arc-connected Kubernetes clusters
GPU-accelerated inference (NVIDIA CUDA, Intel oneAPI)
ONNX Runtime for cross-framework model execution
Azure Machine Learning integration for model registry and versioning
Batch and real-time inference patterns
Azure Monitor integration for model performance metrics
Support for PyTorch, TensorFlow, scikit-learn, and custom containers

Deployment Notes

Requires AKS on Azure Local with the Azure ML extension installed via Azure Arc
GPU workloads require compatible hardware with DDA or GPU-P configured on cluster nodes
Azure subscription required for Azure ML workspace and Arc integration
Consider CPU-only inference for models that don't require GPU acceleration to simplify hardware requirements

Limitations

Model training at scale typically requires cloud GPU resources; on-premises training is limited to available local hardware
Not all Azure ML features (e.g., AutoML, Designer) are available for Arc-connected deployments
GPU availability and driver compatibility on Azure Local hardware varies by vendor
Hardware and licensing requirements apply

External References

Data verified with Microsoft Azure Local documentation, March 2026.

Service Details​

What It Enables​

Key Use Cases​

Architecture​

Supported Features​

Deployment Notes​

Limitations​

External References​