Most Python code is single-threaded and runs on one machine. ML workloads outgrew single machines years ago. Ray is the framework that makes distributed Python feel as natural as single-machine Python - and it has become the default infrastructure layer for serious ML teams.

What Ray Is

Ray is an open-source distributed computing framework from UC Berkeley’s RISELab (now maintained by Anyscale). The core idea: make a Python function run on any machine in a cluster as easily as it runs locally.

import ray

ray.init()

@ray.remote
def train_model(config: dict) -> float:
    # This runs on any available worker
    model = build_model(config)
    return evaluate(model)

# Launch 100 training jobs in parallel
futures = [train_model.remote(config) for config in configs]
results = ray.get(futures)

The @ray.remote decorator is the entire API surface for basic parallelism. The function runs on a remote worker, .remote() returns a future immediately, and ray.get blocks until results are ready. Local development uses the same code - ray.init() with no arguments starts a single-machine cluster on local cores, so nothing changes between a laptop and a thousand-node deployment.

The ML Stack Built on Ray

Ray’s core distributed primitives power a higher-level ML toolkit:

Ray Train handles distributed model training across multiple GPUs or machines. It integrates with PyTorch, TensorFlow, and Hugging Face Transformers.

from ray.train.torch import TorchTrainer
from ray.train import ScalingConfig

trainer = TorchTrainer(
    train_loop_per_worker=train_func,
    scaling_config=ScalingConfig(
        num_workers=8,
        use_gpu=True,
        resources_per_worker={"GPU": 1}
    )
)
result = trainer.fit()

Eight GPU workers, one configuration object. The distributed training boilerplate - gradient synchronization, process groups, data sharding - is handled by Ray.
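For context, a train_loop_per_worker function might look like the following sketch. The name train_func and the toy model are hypothetical, but the structure - importing inside the function (it executes on remote workers, not the driver), wrapping the model with prepare_model, and reporting metrics with train.report - follows Ray Train's PyTorch integration:

```python
def train_func(config=None):
    # Imports live inside the loop because this function is shipped
    # to and executed on each remote worker process.
    import torch
    from ray import train
    from ray.train.torch import prepare_model

    model = torch.nn.Linear(10, 1)
    # prepare_model wraps the model in DistributedDataParallel and
    # moves it to the worker's assigned device.
    model = prepare_model(model)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        inputs = torch.randn(32, 10)   # stand-in for a real data loader
        targets = torch.randn(32, 1)
        loss = loss_fn(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Metrics reported here surface in the driver's Result object.
        train.report({"epoch": epoch, "loss": loss.item()})
```

The gradient synchronization the text mentions happens inside the DistributedDataParallel wrapper that prepare_model installs; the loop itself reads like single-machine PyTorch.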

Ray Tune is a hyperparameter optimization library. It launches hundreds of trials in parallel, uses early stopping to kill underperforming runs, and integrates with search algorithms from Optuna and HyperOpt, including Bayesian optimization.

from ray import tune
from ray.tune.schedulers import ASHAScheduler

analysis = tune.run(
    train_model,
    config={
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
        "layers": tune.randint(2, 8)
    },
    num_samples=100,
    scheduler=ASHAScheduler()
)

ASHA (Asynchronous Successive Halving) terminates poor performers early. One hundred trials with early stopping often find a better configuration than 20 trials run to completion, for a comparable compute budget.
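The budget savings can be sketched with some arithmetic. The numbers below are illustrative, and the model is the synchronous successive-halving idealization (ASHA is the asynchronous variant, which avoids stalling at rung boundaries): 100 trials, per-trial budgets growing from 1 to 64 epochs, and only the top quarter advancing at each rung.

```python
def successive_halving_epochs(n_trials, min_epochs, max_epochs, eta):
    """Total training epochs spent under successive halving."""
    total, survivors, prev = 0, n_trials, 0
    budget = min_epochs
    while budget <= max_epochs:
        # Every surviving trial trains up to the current rung's budget.
        total += survivors * (budget - prev)
        prev = budget
        # Only the top 1/eta of trials advance to the next rung.
        survivors = max(1, survivors // eta)
        budget *= eta
    return total

asha_cost = successive_halving_epochs(100, 1, 64, 4)  # 295 epochs
full_cost = 100 * 64                                  # 6400 epochs
```

Roughly 295 total epochs instead of 6,400 - about a 20x saving - which is why a wide, aggressively pruned search often beats a narrow exhaustive one.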

Ray Serve deploys ML models as scalable online endpoints, with automatic request batching, composition of multiple models, and request routing across replicas.

Who Uses It and Why

OpenAI used Ray for training infrastructure before building their own systems. Spotify uses Ray for ML platform workloads. Instacart, Shopify, and dozens of other companies run Ray for batch processing and ML pipelines.

The adoption pattern is consistent: teams start using Ray because they need to parallelize something. They discover it handles their ML training workflow. They expand to use Ray Serve for deployment. Eventually Ray becomes the common infrastructure across data engineering and ML.

Comparison With Alternatives

| Use case | Ray | Alternative |
| --- | --- | --- |
| Distributed training | Ray Train | PyTorch DDP, DeepSpeed |
| Hyperparameter search | Ray Tune | Optuna, HyperOpt (single node) |
| Batch data processing | Ray Data | Spark, Dask |
| Model serving | Ray Serve | Triton, BentoML |
| Task parallelism | Ray Core | Celery, Dask |

PyTorch Distributed (DDP) is lower-level and gives more control. For teams that need maximum flexibility in their training loop, DDP is better. For teams that want to parallelize training without becoming distributed systems experts, Ray Train is the right abstraction.

Spark is battle-tested for data processing, but PySpark's API feels unnatural to Python developers, and crossing the JVM boundary adds serialization overhead for Python code. Ray Data is more Pythonic and faster for Python-native workloads.

The Scheduling Model

Ray’s scheduler is fundamentally different from a job queue. Each task declares its resource requirements:

@ray.remote(num_gpus=1, num_cpus=4)
def gpu_training_task():
    ...

@ray.remote(num_cpus=2, memory=4 * 1024 * 1024 * 1024)
def preprocessing_task():
    ...

The scheduler packs tasks onto nodes based on available resources. When a GPU node is free, GPU tasks run. When only CPU nodes are available, CPU tasks fill them. Resource-aware scheduling means you do not overprovision for the worst case.

The Actor Model for Stateful Work

Tasks are stateless. Actors are stateful objects that live on a worker and can receive method calls.

@ray.remote
class ModelServer:
    def __init__(self, model_path: str):
        self.model = load_model(model_path)

    def predict(self, inputs: list) -> list:
        return self.model(inputs)

server = ModelServer.remote("model.pt")
predictions = ray.get([server.predict.remote(batch) for batch in batches])

The actor lives on a worker, holds the loaded model in memory, and receives batches. This is the pattern Ray Serve uses internally for model deployment.

Bottom Line

Ray is the distributed computing layer that makes Python scale from one machine to thousands without rewriting your code. For ML specifically, Ray Train, Tune, Data, and Serve form a complete stack from data preprocessing through training to deployment. The adoption at major ML shops is not accidental - the programming model genuinely reduces the complexity of distributed ML work, and the resource-aware scheduling is more efficient than simple job queues. If you are running serious ML workloads and are not already evaluating Ray, you should be.