How to Enable Autoscaling in Google Cloud?



09 May 2019

Introduction to Google Cloud Autoscaling

Google Cloud Platform offers autoscaling capabilities that automatically add or delete instances in a managed instance group as the load increases or decreases. Google Cloud Autoscaling helps our applications handle increased traffic gracefully and reduces cost when the need for resources is lower. We just define the autoscaling policy, and the autoscaler performs the scaling automatically based on the measured load.

Fundamental concepts for Autoscaling

Autoscaling uses the following fundamental concepts and services.

  1. Instance Templates:

    An instance template is a resource used to create VM instances and managed instance groups. It specifies the machine type, boot disk image or container image, labels, and other instance properties. Instance templates are a convenient way to save an instance’s configuration so we can use it later to create individual instances or groups of instances with identical configurations.

  2. Managed instance groups:

    A managed instance group is a group of homogeneous instances, created from an instance template. An autoscaler adds or removes instances from a managed instance group based on the scaling policy. Although GCP Compute Engine has both managed and unmanaged instance groups, only managed instance groups can be used for Google Cloud Autoscaling.

  3. Autoscaling policy and target utilization:

    To create an autoscaler, we have to specify the autoscaling policy as well as a target utilization level, which the autoscaler uses to determine when to scale the group.

We can choose to scale using the following policies:

  • Average CPU utilization
  • HTTP load balancing serving capacity, which can be based on either utilization or requests per second.
  • Stackdriver Monitoring metrics

The autoscaler collects information based on the policy. Then it compares it to the desired target utilization and determines whether it needs to perform scaling.

Types of Google Cloud Autoscaling

  1. Scaling based on CPU utilization:

    We can autoscale based on the average CPU utilization of a managed instance group. The autoscaler collects the CPU utilization of the instances in the group and determines whether it needs to scale. We set the target CPU utilization, and the autoscaler works to maintain that level.

    The autoscaler treats the target CPU utilization level as a fraction of the average use of all vCPUs over time in the instance group. If the average usage of the group's vCPUs is more than the target utilization, the autoscaler adds more virtual machines. For example, if we set the target utilization to 0.75, the autoscaler tries to maintain an average usage of 75% among all vCPUs in the instance group.
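The arithmetic behind this can be sketched in a few lines of Python. This is a simplified illustration, not the autoscaler's actual algorithm: it assumes a purely proportional rule, and the function name `recommended_size` and its parameters are hypothetical.

```python
import math


def recommended_size(vcpu_utilizations, current_size, target_utilization):
    """Estimate the group size that brings average vCPU usage to the target.

    Hypothetical helper: the real autoscaler also smooths measurements
    over time, which this sketch ignores.
    """
    if not vcpu_utilizations:
        raise ValueError("need at least one utilization sample")
    if not 0 < target_utilization <= 1:
        raise ValueError("target utilization must be in (0, 1]")

    # Average usage across all sampled vCPUs in the group.
    average = sum(vcpu_utilizations) / len(vcpu_utilizations)

    # Assumed proportional rule: total load is current_size * average,
    # so dividing by the target gives the size that meets the target.
    return max(1, math.ceil(current_size * average / target_utilization))


# Four single-vCPU instances averaging 90% usage against a 0.75 target.
print(recommended_size([0.95, 0.90, 0.85, 0.90], current_size=4, target_utilization=0.75))
```

With an average of 90% against a 75% target, the sketch suggests growing the group from four to five instances; with an average well below the target it would suggest a smaller group.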

  2. Scaling based on load balancing serving capacity:

    Compute Engine provides support for load balancing within instance groups. We can use autoscaling in conjunction with load balancing by setting up an autoscaler that scales based on the load of the instances.

    A load balancer spreads load across backend services, which distribute traffic among instance groups. At the backend service, we can define the load balancing serving capacity of the instance groups as maximum CPU utilization, maximum requests per second (RPS) per instance, or maximum requests per second of the group. When an instance group reaches its serving capacity, the backend service starts sending traffic to another instance group.

    When we attach an autoscaler to the load balancer, the autoscaler scales the managed instance group to maintain a fraction of the load balancing serving capacity.

    For example, assume the load balancing serving capacity of a managed instance group is 100 RPS per instance and we have created an autoscaler with the load balancing policy to maintain a target utilization level of 0.8, or 80%. The autoscaler then adds or removes instances from the managed instance group to maintain 80 RPS per instance.
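The example above can be worked through in code. The following is a minimal sketch under the same numbers (100 RPS capacity per instance, 0.8 target); the function `instances_for_load` and the incoming load figure of 410 RPS are assumptions made for illustration.

```python
import math


def instances_for_load(total_rps, capacity_rps_per_instance, target_utilization):
    """Estimate how many instances keep each one at the target share of capacity.

    Hypothetical helper for illustration; not part of any Google Cloud API.
    """
    if total_rps < 0:
        raise ValueError("total_rps must not be negative")

    # Each instance should serve capacity * target, e.g. 100 * 0.8 = 80 RPS.
    target_rps_per_instance = capacity_rps_per_instance * target_utilization

    # Round up so that no instance exceeds its target share; keep at least one.
    return max(1, math.ceil(total_rps / target_rps_per_instance))


# Assumed incoming load of 410 RPS for the whole group.
print(instances_for_load(410, capacity_rps_per_instance=100, target_utilization=0.8))
```

At 410 RPS the group needs six instances, because five instances at 80 RPS each would only cover 400 RPS.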

  3. Scaling based on Stackdriver Monitoring metrics:

    We can set up autoscaling based on Stackdriver Monitoring metrics. The metrics can be either standard metrics provided by the Stackdriver Monitoring service or custom Stackdriver Monitoring metrics that we create. We can define autoscaling using Stackdriver metrics in two ways:

    • Per-instance metrics: These metrics provide data for each instance in the managed instance group separately. The instance group cannot scale below a size of 1 instance because the autoscaler requires metrics from at least one running instance in order to operate.
    • Per-group metrics: These metrics allow autoscaling with a standard or custom metric that does not use per-instance utilization data. Instead, the instance group scales based on a value that applies to the whole group and corresponds to how much work is available for the group or how busy the group is. The group scales based on the fluctuation of that group metric value and the configuration that we define.
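To make the per-group case concrete, here is a minimal sketch that sizes a group from a single group-wide value, such as a backlog of queued tasks. The function name, the `work_per_instance` parameter, and the size limits are all hypothetical and stand in for the configuration we would define; the real autoscaler's behavior may differ.

```python
import math


def size_from_group_metric(metric_value, work_per_instance, min_size=1, max_size=10):
    """Pick a group size from one group-wide metric value.

    Hypothetical helper: work_per_instance is an assumed setting that says
    how much of the metric (e.g. queued tasks) one instance can handle.
    """
    if work_per_instance <= 0:
        raise ValueError("work_per_instance must be positive")

    # Instances needed to cover the available work, rounded up.
    wanted = math.ceil(metric_value / work_per_instance)

    # Clamp to the configured bounds of the instance group.
    return min(max_size, max(min_size, wanted))


# Assumed backlog of 45 tasks, with each instance handling 10 tasks.
print(size_from_group_metric(45, work_per_instance=10))
```

As the backlog grows or shrinks, the suggested size follows it, always staying within the minimum and maximum we configured.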
