Autoscaling using custom CloudWatch metrics

If you have landed on this page via a Google search, you already know what autoscaling is and which default metrics can trigger it on AWS. For the uninitiated, autoscaling lets you scale out your infrastructure automatically, helps you maintain quality during peak loads by adding more compute, and also helps you absorb DDoS attacks. AWS offers enough metrics for scaling up and down in most cases. You can scale by SQS queue depth, average CPU utilization and so on, but what if your case is a little more specific? What if you need to scale based on custom metrics? This article explains autoscaling of Linux machines based on custom metrics.

aws-mon-linux to the rescue. It's a bash script that reports custom metric data about Linux performance to Amazon CloudWatch. The script is tested and works on Amazon Linux, RHEL and Ubuntu.

This script can report the following metrics:

  • load average
  • interrupt
  • context switch
  • cpu (user/system/idle/wait/steal)
  • memory
  • swap
  • disk

If you already have an AWS Identity and Access Management (IAM) role associated with your instance, make sure that it has permissions to perform the following operations: cloudwatch:PutMetricData, cloudwatch:GetMetricStatistics, cloudwatch:ListMetrics, ec2:DescribeTags. Otherwise, you can create a new IAM role with permissions to perform CloudWatch operations and associate that role when you launch a new instance. For more information, see Controlling User Access to Your AWS Account.

If you aren't using an IAM role, create a new IAM user (or use an existing one) with the CloudWatch permissions above, or with full CloudWatch access, and then update the awscreds.template file with the access key and secret key that you downloaded while creating the IAM user. The content of this file should use the following format:

AWSAccessKeyId=YourAccessKeyID  
AWSSecretKey=YourSecretAccessKey  
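
For reference, the four permissions listed above can be written out as an IAM policy document. The following is only a minimal sketch: the file name aws-mon-policy.json is made up for this example, and "Resource": "*" is an assumption that you may want to tighten. Note that the EC2 action is spelled ec2:DescribeTags.

```shell
#!/usr/bin/env bash
# Sketch: write an IAM policy document granting the four actions the script needs.
# The file name is a placeholder chosen for this example.
POLICY_FILE="aws-mon-policy.json"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
```

You can paste the generated JSON into the IAM console when creating the role or user, or attach it from the command line with aws iam put-role-policy (passing your own role name, a policy name and --policy-document file://aws-mon-policy.json).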

Setup

This script requires the AWS Command Line Interface (CLI), so make sure you set it up first. Assuming you have Python and pip, it's a simple command:

pip install awscli  

Next, download aws-mon.sh or clone the repository using the following steps:

git clone https://github.com/moomindani/aws-mon-linux.git  
cd aws-mon-linux  
./aws-mon.sh --help

Perform a simple test run without posting data to CloudWatch. Run the following command to verify the memory utilization:

 ./aws-mon.sh --mem-util --verify --verbose

Collecting various metrics and sending them to CloudWatch:

#For CPU metrics
./aws-mon.sh --cpu-us --cpu-sy --cpu-id --cpu-wa --cpu-st

#For memory metrics
./aws-mon.sh --mem-util --mem-used --mem-avail

#For disk metrics
./aws-mon.sh --disk-space-util --disk-space-used --disk-space-avail --disk-path=/

#For load average metrics
./aws-mon.sh --load-ave1 --load-ave5 --load-ave15

You should now be able to see the metrics in the CloudWatch dashboard.

Next, schedule a cron job to report metrics to CloudWatch:

crontab -e  
#Add the following line to the crontab to report all items to CloudWatch every five minutes
*/5 * * * * ~/aws-mon-linux/aws-mon.sh --all-items --disk-path=/ --from-cron

#To send specific metrics instead of "--all-items", list them explicitly. E.g. load average, every two minutes:
*/2 * * * * ~/aws-mon-linux/aws-mon.sh --load-ave1 --load-ave5 --load-ave15 --from-cron
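
If you would rather not edit the crontab by hand (for example when baking an image for an autoscaling group), the entry can be added from a script. This is a rough sketch, not part of aws-mon-linux: the helper name add_cron_line is made up here, and the install path assumes the repository was cloned into your home directory as in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: add the reporting job to a crontab without opening an editor.
# CRON_LINE assumes the repository lives in ~/aws-mon-linux.
CRON_LINE='*/5 * * * * ~/aws-mon-linux/aws-mon.sh --all-items --disk-path=/ --from-cron'

# add_cron_line (hypothetical helper): takes the current crontab text as $1 and
# prints it with CRON_LINE appended, unless that exact line is already present.
add_cron_line() {
  local current="$1"
  if printf '%s\n' "$current" | grep -qxF -- "$CRON_LINE"; then
    # Already installed: print the crontab unchanged.
    printf '%s\n' "$current"
  else
    if [ -n "$current" ]; then
      printf '%s\n' "$current"
    fi
    printf '%s\n' "$CRON_LINE"
  fi
}

# To apply it to the current user's crontab, uncomment the next line:
# add_cron_line "$(crontab -l 2>/dev/null)" | crontab -
```

Because the helper checks for an exact match first, running it twice does not create a duplicate entry.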

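Cron picks up changes made through crontab -e automatically. If the job does not run, restart the cron service (service crond restart on Amazon Linux and RHEL, service cron restart on Ubuntu). You can now see the custom metrics in the dashboard.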

Creating Autoscaling Policy using Custom Metrics

For the sake of this post, I am going to scale by the 15-minute load average.

When creating the alarm in CloudWatch, type a name and description, set the threshold, and set the condition to "State is ALARM". Under "Send notification", select the topic that should receive the notification. Click "Create Alarm". Now we can assign this alarm to an autoscaling policy.
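
The same alarm can also be created with the AWS CLI. The sketch below is only illustrative and rests on several assumptions: the namespace System/Linux and the metric name LoadAverage15Min are my guess at what the script publishes (confirm both with aws cloudwatch list-metrics), and the instance ID, SNS topic ARN, threshold and alarm name are placeholders. The command is only printed unless you run it with DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: create the high-load alarm from the command line.
# Every value below is a placeholder or an assumption; adjust before use.
NAMESPACE="System/Linux"        # assumed namespace; confirm with list-metrics
METRIC_NAME="LoadAverage15Min"  # assumed metric name; confirm with list-metrics
DIMENSIONS="Name=InstanceId,Value=i-0123456789abcdef0"       # hypothetical; must match the published metric
SNS_TOPIC_ARN="arn:aws:sns:us-east-1:123456789012:my-topic"  # hypothetical topic
THRESHOLD=4                     # example value only

alarm_cmd=(
  aws cloudwatch put-metric-alarm
  --alarm-name "high-load-average-15min"
  --alarm-description "15-minute load average is high"
  --namespace "$NAMESPACE"
  --metric-name "$METRIC_NAME"
  --dimensions "$DIMENSIONS"
  --statistic Average
  --period 300
  --evaluation-periods 2
  --threshold "$THRESHOLD"
  --comparison-operator GreaterThanThreshold
  --alarm-actions "$SNS_TOPIC_ARN"
)

# Print the command by default; run with DRY_RUN=0 to actually create the alarm.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s ' "${alarm_cmd[@]}"
  echo
else
  "${alarm_cmd[@]}"
fi
```

A matching low-load alarm would use a lower threshold with --comparison-operator LessThanThreshold.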

Go to Auto Scaling Groups in the EC2 dashboard. Click on Scaling Policies and then on Add policy.
Set the name and select the alarm for the custom metric that was created in CloudWatch. We created two autoscaling policies: one for high load average and another for low load average.
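
For those who prefer the command line, the two policies could be created roughly like this. It is a sketch under assumptions: the group name my-asg, the policy names, the one-instance adjustments and the 300-second cooldown are all placeholders, and put_policy is a helper defined only for this example. Commands are printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch: create one scale-out and one scale-in policy with simple scaling.
ASG_NAME="my-asg"   # hypothetical Auto Scaling group name

# put_policy (hypothetical helper): $1 = policy name, $2 = change in capacity.
put_policy() {
  local name="$1" adjustment="$2"
  local cmd=(
    aws autoscaling put-scaling-policy
    --auto-scaling-group-name "$ASG_NAME"
    --policy-name "$name"
    --adjustment-type ChangeInCapacity
    "--scaling-adjustment=$adjustment"
    --cooldown 300
  )
  # Print the command by default; run with DRY_RUN=0 to call AWS.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

put_policy "scale-out-on-high-load" 1    # add one instance when load is high
put_policy "scale-in-on-low-load" -1     # remove one instance when load is low
```

When run for real, put-scaling-policy returns a PolicyARN for each policy. Passing that ARN to an alarm's --alarm-actions is the CLI counterpart of selecting the alarm in the console.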

This is just one example and you can play around with the rest of the options. Hope this was helpful. Happy Autoscaling! :)

Suganya, Cloud Support Engineer at Powerupcloud, contributed to this post.
