Automating RDS snapshots with AWS Lambda

One of our customers uses RDS for their daily ETL. Large volumes of data are pulled from on-premise Cassandra clusters and other sources, cleansed and stored in RDS, and then pumped to Google's BigQuery for analysis. Since the ETL database doesn't have to run all the time, we automated deleting it after each run and restoring it from a snapshot the next day to save costs. This post covers two scenarios:

  • Automate terminating and restoring an RDS instance using Lambda functions
  • Automate cross-region RDS snapshot copies using Lambda functions

The second scenario is especially useful for making sure you have snapshots that can be restored if an entire region goes down. That doesn't happen often, but it's still better to assume that a region will go down. In Werner Vogels's words: "Remember, everything fails all the time" :)

Create IAM Role

Let's get started by creating a role that we can use for both scenarios. If you came here looking for Lambda, I assume you already know how to create an IAM role. If not, here is the policy to attach to it:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Action": [
                "rds:AddTagsToResource",
                "rds:CopyDBSnapshot",
                "rds:CopyDBClusterSnapshot",
                "rds:DeleteDBInstance",
                "rds:DeleteDBSnapshot",
                "rds:RestoreDBInstanceFromDBSnapshot",
                "rds:Describe*",
                "rds:ListTagsForResource"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
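The policy above grants permissions, but a role also needs a trust policy saying who may assume it. Below is a minimal sketch of creating the role with boto3. It assumes the role name lambda_RDS_Role used later in this post; the helper create_lambda_role and the inline policy name lambda_RDS_Policy are hypothetical names chosen for illustration.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_lambda_role(iam, permissions_policy, role_name="lambda_RDS_Role",
                       policy_name="lambda_RDS_Policy"):
    """Create the role and attach the permissions policy inline.

    `iam` is a boto3 IAM client, e.g. boto3.client('iam').
    `permissions_policy` is the policy document shown above, as a dict.
    `policy_name` is a hypothetical name chosen for this sketch.
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permissions_policy),
    )
    return role_name
```

Calling create_lambda_role(boto3.client('iam'), policy), with the JSON above loaded into policy, should produce a role that can be selected when creating the functions below.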

Delete RDS with final snapshot - Lambda

Go ahead and create a function to delete the RDS instance. We need to make sure a final snapshot is taken before the instance is deleted so that we can restore it before the next ETL run.

We recently wrote a detailed step-by-step guide on how to create a Lambda function. Check it out here: http://blog.powerupcloud.com/2016/02/15/automate-ebs-snapshots-using-lambda-function/

Below is the function itself:

import boto3
import time

db_instance = 'instance01'
region = 'us-east-1'

def lambda_handler(event, context):
    # Name the final snapshot after the instance and today's date,
    # e.g. instance01-22-03-2016
    date = time.strftime("-%d-%m-%Y")
    snapshot_name = db_instance + date
    source = boto3.client('rds', region_name=region)
    # Delete the instance, taking a final snapshot first so it can be restored later
    source.delete_db_instance(
        DBInstanceIdentifier=db_instance,
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier=snapshot_name)
    print('[main] End')

Make sure the value of Handler is set to lambda_function.lambda_handler, and select lambda_RDS_Role in the Role field. We already have a database instance, instance01, in the us-east-1 region. This Lambda function deletes that instance after creating a final snapshot of it, and the snapshot's name includes the date on which it was created.
After triggering the function, the instance is deleted and the new snapshot appears in the RDS console. Take a look at the date in the snapshot's name.
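The snapshot name is simply the instance identifier followed by the day, month and year. Here is a small sketch of that naming rule, pulled out into a hypothetical helper (final_snapshot_name is not part of the function above) so it can be checked without touching RDS:

```python
import datetime


def final_snapshot_name(db_instance, day=None):
    """Return '<instance>-DD-MM-YYYY', the format used by the delete function."""
    day = day or datetime.date.today()
    return db_instance + day.strftime("-%d-%m-%Y")


print(final_snapshot_name("instance01", datetime.date(2016, 3, 22)))
# instance01-22-03-2016
```

Because the name only carries the date, deleting the same instance twice on one day would ask for a snapshot identifier that already exists, so this scheme fits a once-a-day schedule.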

Restoring Snapshot

To restore the database instances, below is a sample Python script that finds the latest snapshot of each specified instance and restores the instance from it.

import boto3  
import botocore  
import datetime  
import re  
import logging

region='us-east-1'  
db_instance_class='db.t2.micro'  
db_subnet='default'  
instances = ['instance01', 'instance02']

print('Loading function')

def byTimestamp(snap):  
  if 'SnapshotCreateTime' in snap:
    return datetime.datetime.isoformat(snap['SnapshotCreateTime'])
  else:
    return datetime.datetime.isoformat(datetime.datetime.now())

def lambda_handler(event, context):  
    source = boto3.client('rds', region_name=region)
    for instance in instances:
        try:
            source_snaps = source.describe_db_snapshots(DBInstanceIdentifier = instance)['DBSnapshots']
            print('DB_Snapshots: %s' % source_snaps)
            # Newest snapshot first
            source_snap = sorted(source_snaps, key=byTimestamp, reverse=True)[0]['DBSnapshotIdentifier']
            # Strip the -DD-MM-YYYY suffix to recover the original instance name
            snap_id = re.sub(r'-\d\d-\d\d-\d\d\d\d ?', '', source_snap)
            print('Will restore %s to %s' % (source_snap, snap_id))
            response = source.restore_db_instance_from_db_snapshot(
                DBInstanceIdentifier=snap_id, DBSnapshotIdentifier=source_snap,
                DBInstanceClass=db_instance_class, DBSubnetGroupName=db_subnet,
                MultiAZ=False, PubliclyAccessible=True)
            print(response)

        except botocore.exceptions.ClientError as e:
            raise Exception("Could not restore: %s" % e)

Here, we pass db_instance_class and db_subnet to restore_db_instance_from_db_snapshot(), so both are defined as constants. The instances list contains the identifiers of the instances whose snapshots should be restored. The restored instance takes the name of its snapshot without the date suffix (for example, the snapshot instance01-22-03-2016 is restored as instance01).
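To make the renaming step concrete, here is a minimal sketch of how the date suffix can be stripped from a snapshot identifier. The helper instance_name_from_snapshot is a hypothetical name, and unlike the script above, the pattern here is anchored to the end of the string so that only a trailing date is removed:

```python
import re

# Matches a trailing -DD-MM-YYYY, the suffix added when the final snapshot is taken.
DATE_SUFFIX = re.compile(r'-\d\d-\d\d-\d\d\d\d$')


def instance_name_from_snapshot(snapshot_id):
    """Return the snapshot identifier without its trailing date suffix."""
    return DATE_SUFFIX.sub('', snapshot_id)


print(instance_name_from_snapshot('instance01-22-03-2016'))  # instance01
print(instance_name_from_snapshot('instance02'))             # instance02 (no suffix, unchanged)
```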

Copy snapshots of ALL instances to a different region

Here’s a sample Python script that copies the latest snapshot of every database instance in us-east-1 to the us-west-2 region (change the regions to suit your requirements):

import boto3  
import botocore  
import datetime  
import re  
import logging

SOURCE_REGION = 'us-east-1'  
TARGET_REGION = 'us-west-2'  
iam = boto3.client('iam')

print('Loading function')

def byTimestamp(snap):  
  if 'SnapshotCreateTime' in snap:
    return datetime.datetime.isoformat(snap['SnapshotCreateTime'])
  else:
    return datetime.datetime.isoformat(datetime.datetime.now())

def lambda_handler(event, context):  
    # Look up the AWS account ID, needed to build the snapshot ARN
    account = boto3.client('sts').get_caller_identity()['Account']

    source = boto3.client('rds', region_name=SOURCE_REGION)
    source_instances = source.describe_db_instances()['DBInstances']

    for instance in source_instances:
        db_instances = "{0}".format(instance['DBInstanceIdentifier'])
        print('DB_Instance: %s' % db_instances)
        source_snaps = source.describe_db_snapshots(DBInstanceIdentifier=db_instances)['DBSnapshots']
        source_snap = sorted(source_snaps, key=byTimestamp, reverse=True)[0]['DBSnapshotIdentifier']
        source_snap_arn = 'arn:aws:rds:%s:%s:snapshot:%s' % (SOURCE_REGION, account, source_snap)
        target_snap_id = (re.sub('rds:', '', source_snap))
        print('Will Copy %s to %s' % (source_snap_arn, target_snap_id))
        target = boto3.client('rds', region_name=TARGET_REGION)
        try:
            response = target.copy_db_snapshot(
            SourceDBSnapshotIdentifier=source_snap_arn,
            TargetDBSnapshotIdentifier=target_snap_id,
            CopyTags = True)
            print(response)
        except botocore.exceptions.ClientError as e:
            raise Exception("Could not issue copy command: %s" % e)
        copied_snaps = target.describe_db_snapshots(SnapshotType='manual', DBInstanceIdentifier=db_instances)['DBSnapshots']

In this script, copy_db_snapshot() copies the snapshots from SOURCE_REGION to TARGET_REGION, both defined as constants. describe_db_instances() lists all the instances available in SOURCE_REGION. sorted() orders each instance's snapshots by timestamp, newest first, so that only the most recent one is copied. describe_db_snapshots() lists the snapshots of an instance in the source region and, after the copy is issued, the manual snapshots in the target region.
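The selection and naming steps can be tried out without calling AWS. The sketch below uses hypothetical sample data in the shape of describe_db_snapshots()['DBSnapshots'] and a made-up account ID; the helpers latest_snapshot and snapshot_arn are illustrative names, not part of the script above. One difference to note: snapshots that have no creation time yet are ignored here, whereas byTimestamp() above treats them as the newest.

```python
import datetime


def latest_snapshot(snapshots):
    """Pick the newest snapshot, ignoring any that have no creation time yet."""
    finished = [s for s in snapshots if 'SnapshotCreateTime' in s]
    return max(finished, key=lambda s: s['SnapshotCreateTime'])


def snapshot_arn(region, account, snapshot_id):
    """Build the ARN that copy_db_snapshot() needs for a cross-region source."""
    return 'arn:aws:rds:%s:%s:snapshot:%s' % (region, account, snapshot_id)


# Hypothetical data in the shape of describe_db_snapshots()['DBSnapshots'].
snaps = [
    {'DBSnapshotIdentifier': 'rds:instance01-2016-03-21-06-10',
     'SnapshotCreateTime': datetime.datetime(2016, 3, 21, 6, 10)},
    {'DBSnapshotIdentifier': 'rds:instance01-2016-03-22-06-10',
     'SnapshotCreateTime': datetime.datetime(2016, 3, 22, 6, 10)},
    {'DBSnapshotIdentifier': 'instance01-in-progress'},  # still being created
]

newest = latest_snapshot(snaps)['DBSnapshotIdentifier']
print(snapshot_arn('us-east-1', '123456789012', newest))
# arn:aws:rds:us-east-1:123456789012:snapshot:rds:instance01-2016-03-22-06-10
print(newest.replace('rds:', '', 1))
# instance01-2016-03-22-06-10
```

Automated snapshots carry an rds: prefix in their identifier, which is why the script strips it before using the name as the target identifier.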

Copy snapshots of specific instances to a different region

Below is a sample script that copies the latest snapshots of only the two specified instances, rather than of every instance in the source region:

import boto3  
import botocore  
import datetime  
import re

SOURCE_REGION = 'us-east-1'  
TARGET_REGION = 'us-west-2'  
iam = boto3.client('iam')  
instances = ['instance01', 'instance02']

print('Loading function')

def byTimestamp(snap):  
  if 'SnapshotCreateTime' in snap:
    return datetime.datetime.isoformat(snap['SnapshotCreateTime'])
  else:
    return datetime.datetime.isoformat(datetime.datetime.now())

def lambda_handler(event, context):  
    # Look up the AWS account ID, needed to build the snapshot ARN
    account = boto3.client('sts').get_caller_identity()['Account']

    source = boto3.client('rds', region_name=SOURCE_REGION)

    for instance in instances:
        source_instances = source.describe_db_instances(DBInstanceIdentifier= instance)
        source_snaps = source.describe_db_snapshots(DBInstanceIdentifier=instance)['DBSnapshots']
        source_snap = sorted(source_snaps, key=byTimestamp, reverse=True)[0]['DBSnapshotIdentifier']
        source_snap_arn = 'arn:aws:rds:%s:%s:snapshot:%s' % (SOURCE_REGION, account, source_snap)
        target_snap_id = (re.sub('rds:', '', source_snap))
        print('Will Copy %s to %s' % (source_snap_arn, target_snap_id))
        target = boto3.client('rds', region_name=TARGET_REGION)

        try:
            response = target.copy_db_snapshot(
            SourceDBSnapshotIdentifier=source_snap_arn,
            TargetDBSnapshotIdentifier=target_snap_id,
            CopyTags = True)
            print(response)
        except botocore.exceptions.ClientError as e:
            raise Exception("Could not issue copy command: %s" % e)
        copied_snaps = target.describe_db_snapshots(SnapshotType='manual', DBInstanceIdentifier=instance)['DBSnapshots']

Here, we have defined the constants SOURCE_REGION and TARGET_REGION and a list, instances, containing the identifiers of the databases whose snapshots are to be copied to the target region. A for loop walks the instances list and copies the latest snapshot of each instance in turn.
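One case the loop does not cover is an instance that has no snapshot yet: indexing the empty sorted list raises an IndexError and stops the whole run. Below is a hedged sketch of the same loop that skips such instances instead. The function copy_latest_snapshots is a hypothetical refactoring, and source and target are assumed to be boto3 RDS clients for the two regions.

```python
def copy_latest_snapshots(source, target, instances, source_region, account):
    """Copy the newest snapshot of each instance; return the target identifiers.

    `source` and `target` are boto3 RDS clients for the two regions.
    Instances without any finished snapshot are skipped instead of raising.
    """
    copied = []
    for instance in instances:
        snaps = source.describe_db_snapshots(
            DBInstanceIdentifier=instance)['DBSnapshots']
        finished = [s for s in snaps if 'SnapshotCreateTime' in s]
        if not finished:
            print('No snapshot found for %s, skipping' % instance)
            continue
        newest = max(finished, key=lambda s: s['SnapshotCreateTime'])
        source_id = newest['DBSnapshotIdentifier']
        # Automated snapshots are prefixed with "rds:"; drop it for the target name.
        target_id = source_id.replace('rds:', '', 1)
        target.copy_db_snapshot(
            SourceDBSnapshotIdentifier='arn:aws:rds:%s:%s:snapshot:%s'
            % (source_region, account, source_id),
            TargetDBSnapshotIdentifier=target_id,
            CopyTags=True)
        copied.append(target_id)
    return copied
```

Taking the clients as parameters keeps the function easy to exercise with stand-in objects before pointing it at real regions.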

Hope this was useful. Happy snapshotting! :)

Priyanka Sharma

Priyanka is a Senior Cloud and DevOps Engineer. She can churn out CloudFormation templates at a moment's notice and play with Chef/Ansible. Dancing, music, badminton and word games are her hobbies.
