In this paper, we consider an application provider that simultaneously executes periodic long-running jobs and needs to ensure a minimum throughput to guarantee QoS to its users; the application provider uses virtual machine (VM) resources offered by an IaaS provider. The aim of the periodic jobs is to compute measures on data collected over a specific time frame. We assume that the IaaS provider offers a pay-only-for-what-you-use scheme similar to the Amazon EC2 service, comprising on-demand and spot VM instances. The former are sold at a fixed price, while the latter are assigned on the basis of an auction.
We focus on the application provider's bidding decision process and model the bidding problem as a Q-Learning problem, taking into account the workloads, the maximum completion times since job start, the last checkpoint, and the past observed spot prices. In Q-Learning, a form of model-free Reinforcement Learning, the player is repeatedly faced with a choice among N different actions, each of which determines an immediate reward or cost and influences future evolutions. Through numerical experiments, we analyze the resulting bidding strategy under different scenarios. Our results show the application provider's ability to refine its behavior and to determine the best action so as to minimize the average cost per job, while also taking checkpointing issues and QoS constraints into account.
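To illustrate the kind of learning loop described above, the following is a minimal tabular Q-Learning sketch for a bid-selection problem. The state space, bid levels, toy spot-price environment, and hyperparameters are all hypothetical simplifications for illustration, not the formulation or parameters used in the paper.

```python
import random

# Hypothetical discretization: states could encode observed spot price and
# job progress; actions are candidate bid levels. All values are illustrative.
N_STATES = 5
ACTIONS = [0.02, 0.04, 0.06, 0.08]   # hypothetical bid levels ($/hour)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def simulate_step(state, bid):
    """Toy environment (an assumption, not the paper's model): higher states
    mean higher spot prices. Returns (cost, next_state)."""
    spot_price = 0.01 + 0.01 * state
    if bid >= spot_price:
        # Bid accepted: pay the bid, job makes progress.
        return bid, max(state - 1, 0)
    # Bid rejected: penalty representing lost progress / checkpoint rollback.
    return 0.05, min(state + 1, N_STATES - 1)

def q_learning(episodes=2000, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration, written for
    cost minimization (greedy action = argmin of Q over actions)."""
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)
        for _ in range(20):                      # bounded episode length
            if rng.random() < EPSILON:           # explore
                a = rng.randrange(len(ACTIONS))
            else:                                # exploit: cheapest expected cost
                a = min(range(len(ACTIONS)), key=lambda i: Q[state][i])
            cost, nxt = simulate_step(state, ACTIONS[a])
            # Standard Q-Learning update, with min (not max) because Q
            # stores expected discounted cost rather than reward.
            target = cost + GAMMA * min(Q[nxt])
            Q[state][a] += ALPHA * (target - Q[state][a])
            state = nxt
    return Q
```

In this toy setting the learned policy in the cheapest-price state converges toward the lowest winning bid, mirroring the paper's goal of refining bids to minimize the average cost per job.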