Is my workload stateless, and does it have more than one replica?
If so, it’s usually a good candidate for spot instances. K8s was designed with stateless architectures in mind, and several typical workloads run well on spot instances:
- Batch processing jobs – they’re fault-tolerant and instance-flexible.
- Microservices – they’re typically self-contained, highly available, fault-tolerant, and capable of handling interruptions.
- High-performance computing (HPC) – these apps usually need massive compute capacity, large amounts of memory, fast storage, and high network performance. Spot instances can support them via bursting or even serve as the primary compute infrastructure.
- CI/CD operations – whichever tools you use, build and test jobs are short-lived and can be retried after an interruption, so spot instances come in handy in your deployment process.
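- Distributed databases – although they hold state, replicated clusters such as Elasticsearch or MongoDB can handle the loss of a node without losing any data or affecting the service, as long as the remaining replicas stay available.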
Is my workload fault-tolerant?
To be a good spot instance candidate, your workload should complete each unit of work reasonably quickly. Consider whether an interruption would cause undesired effects, such as losing meaningful progress on your work.
If your workload can’t meet these criteria, steer clear of spot instances: when the VM goes down, your workload goes down with it.
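One common way to avoid losing meaningful progress is to checkpoint after each unit of work. The sketch below is a minimal illustration rather than a production pattern: `process_items`, `checkpoint_path`, and the `interrupted` callback are hypothetical names, and it assumes the work splits into independent items and that the checkpoint file lives on storage that survives the instance.

```python
import json
import os


def process_items(items, checkpoint_path, handle, interrupted=lambda: False):
    """Process items in order, resuming from the last saved checkpoint.

    Returns True when every item is done, False if stopped early.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(items)):
        if interrupted():
            # Stop early; everything handled so far is already recorded.
            return False
        handle(items[i])
        # Write to a temp file and rename it, so a kill in the middle of
        # the write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)
    return True
```

If the instance dies after `handle` runs but before the checkpoint is written, that one item is processed again on restart, so `handle` should be safe to repeat.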
Can I move my workload to another instance gracefully before the time runs out?
Cloud providers give only a short interruption notice: 2 minutes on AWS, and just 30 seconds on Azure and Google Cloud.
Is that enough time to find a replacement for your instance? Not for a human.
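Since a human can’t react in that window, detecting the notice has to be automated. Here’s a rough sketch for AWS, which publishes the notice through the instance metadata service; the function names are hypothetical, and the code assumes it runs on the EC2 instance itself with IMDSv2 enabled. Azure and Google Cloud expose their own metadata endpoints, which aren’t covered here.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=2):
    """Return the raw interruption notice, or None if none is pending."""
    # IMDSv2: request a short-lived session token first.
    token_req = urllib.request.Request(
        IMDS + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    token = urllib.request.urlopen(token_req, timeout=timeout).read().decode()
    req = urllib.request.Request(
        IMDS + "/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        return urllib.request.urlopen(req, timeout=timeout).read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def seconds_until_interruption(body, now):
    """Parse the notice document and return the seconds left until reclaim."""
    doc = json.loads(body)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()
```

A small agent could call `fetch_instance_action` every few seconds and start draining the node as soon as it returns a document instead of `None`.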
Let’s say that you’ve already set your sights on an on-demand instance. Creating a new VM takes around 5 minutes on AWS (even more if you use Kubernetes), so even with the 2-minute notice you’re looking at a few minutes of potential downtime.
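The gap is easy to work out from the figures above. This is only back-of-the-envelope arithmetic: `downtime_gap_seconds` is a hypothetical helper, and it assumes the replacement is requested the moment the notice arrives.

```python
def downtime_gap_seconds(notice_seconds, provision_seconds):
    """Seconds without capacity if a replacement is requested at notice time."""
    return max(0, provision_seconds - notice_seconds)


# AWS figures from the text: 2-minute notice, roughly 5 minutes to create a VM.
aws_gap = downtime_gap_seconds(notice_seconds=2 * 60, provision_seconds=5 * 60)
print(aws_gap / 60)  # 3.0 (minutes)
```

With a 30-second notice and the same provisioning time, the gap grows to 4.5 minutes.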