EECS 485 Lab

Lab 9: Scaling Static Pages and Uploads with PaaS

Goals

In this lab, you will offload your project 3 static files to AWS S3 and CloudFront, and store media uploads in S3 instead of on the local file system.

Restarting this Tutorial

  1. To delete the IAM user, go to the IAM console, click on Users -> check the box next to the user -> Delete (and accept every warning)
  2. To delete the IAM group, click on Groups, click on the checkbox next to the group, click on Group Actions, and press Delete
  3. To revert code changes, use git reset
     $ pwd
     /Users/awdeorio/src/eecs485/p3-insta485-clientside/
     $ git checkout aws-uniqname
     M       ...
     Already on 'aws-uniqname'
     Your branch is up to date with 'origin/aws-uniqname'.
     $ git branch # Verify that you are on the aws-uniqname branch
     * aws-uniqname
     main
     $ git reset --hard # Reverts all code changes back to your previous commit
     HEAD is now at ... [previous commit message]
    
  4. To remove all S3 changes, use aws s3 rb
     $ aws s3 ls
     2020-08-24 20:50:46 uniqname.static.insta485.com
     2020-08-24 19:16:53 uniqname.uploads.insta485.com
     2020-08-24 15:03:57 logs.uniqname.com
     $ aws s3 rb s3://uniqname.static.insta485.com --force  
     $ aws s3 rb s3://uniqname.uploads.insta485.com --force  
     $ aws s3 rb s3://logs.uniqname.com --force
     $ aws s3 ls # Verify that those buckets do not exist anymore
    
  5. To disable and delete your CDN distribution, navigate to the CloudFront console, click on your CDN distribution, and click on Disable. Once it is disabled, you will be able to delete it as well.

Setup AWS

Follow the steps in the AWS tutorial to create an AWS Educate account.

Create IAM User

  1. Navigate to the AWS Management Console. Select the “Services” dropdown menu, then “IAM”. AWS Identity and Access Management (IAM) allows you to securely control who is authenticated (signed in) and authorized (has permissions) to use your AWS resources.

  2. Click on Users on the side panel and select Add User.

  3. Make sure you select Programmatic access and AWS Management Console access so that we can access AWS from our local command line. Also select Custom password for the console password, enter a password of your choosing, and select Require password reset. Then, click on Next: Permissions.

  4. Select Add user to group and then Create group.

  5. For Group Name, enter “Administrators” and select the AdministratorAccess policy (gives members of this group full access to AWS resources). Then, click on Create group.

  6. Click on Next: Tags.

  7. Click on Next: Review.

  8. Click on Create user.

  9. NOTE: Before you leave the page after successfully creating the IAM User, make sure to either download the .csv, which will contain the Access Key ID and Secret access key, or store them in a different way. You will not have access to the secret access key again after this page closes.

  10. You should now see your IAM User and Group on the IAM console dashboard.

AWS CLI Setup

Next, we are going to install and set up the AWS Command Line Interface (CLI).

Install AWS CLI

For Linux/WSL:

$ sudo apt-get update
$ sudo apt-get install awscli

For MacOS:

$ brew install awscli

Verify that the installation was successful:

$ aws --version
aws-cli/2.0.38 Python/3.7.4 Darwin/19.6.0 exe/x86_64

Configure AWS CLI

Now we need to connect our AWS CLI with the IAM user we just created so that we have permission to access the appropriate AWS resources. Run aws configure and enter the following at the prompts:

AWS Access Key ID: Enter your AWS Access Key ID from the IAM setup (in new_user_credentials.csv)
AWS Secret Access Key: Enter your AWS Secret Access Key from the IAM setup (in new_user_credentials.csv)
Default region name: Enter us-east-2
Default output format: Enter json

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-2
Default output format [None]: json

This configuration is saved by default to ~/.aws/config, and your credentials are saved by default to ~/.aws/credentials. Verify that your CLI is configured correctly.

$ aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************MPLE shared-credentials-file    
secret_key     ****************EKEY shared-credentials-file    
    region                us-east-2      config-file    ~/.aws/config
$ cat ~/.aws/config
[default]
region = us-east-2
output = json
$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Now that your AWS CLI is set up, you should be all set to work with your AWS resources directly from your command line!
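As an extra sanity check, you can ask AWS who you are authenticated as using the standard aws sts get-caller-identity command. Your output will contain your own account number and IAM user ARN; the values below are placeholders.

$ aws sts get-caller-identity
{
    "UserId": "...",
    "Account": "...",
    "Arn": "arn:aws:iam::...:user/your-iam-user"
}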

Check Account Balance

If you have AWS Credits from the AWS Educate account, click your account username in the top right -> My Account -> Credits to view the remaining credits on your account.

Convert P3 Static File Server to AWS S3 with CDN

In this lab, we will convert our P3 static file server to upload files to AWS S3 and serve them using S3 and CloudFront (the AWS CDN offering). We do this so that the work of serving static files (which are the same for everyone) is offloaded from our own server to AWS, reducing the latency of our application. Essentially, our own server focuses on the Flask computations while AWS deals with the static files.

At a high level, we will: create S3 buckets, configure Flask-S3, upload our static files, test locally, and add a CloudFront CDN.

Create bucket

First, we need to create an AWS S3 bucket to host our website’s static resources.

Before we create the S3 bucket for our static resources, we need to create one to host our server logs. Feel free to name it whatever you want, or use logs.uniqname.com. You can accept all the default settings here.

Follow the Lab 2 Deploying Static Pages tutorial to host your project 3 website. Feel free to name your root bucket whatever you want, or use uniqname.static.insta485.com. Skip the step where you create and upload your own index.html file, and skip all the steps dealing with the subdomain bucket. Be sure to enable static website hosting, public access, and logging for your root bucket (using the log bucket to host the logs), and attach the bucket policy below, substituting your own root bucket name.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::uniqname.static.insta485.com/*"
            ]
        }
    ]
}
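If you prefer the command line, you can attach the same policy with the AWS CLI. This sketch assumes you saved the JSON above to a local file named policy.json and substituted your bucket name:

$ aws s3api put-bucket-policy --bucket uniqname.static.insta485.com --policy file://policy.json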

At this point, you should be able to see your bucket from the CLI:

$ aws s3 ls
...
2020-08-24 22:37:13 uniqname.static.insta485.com
2020-08-24 14:50:55 logs.uniqname.com
...

Flask and S3 Configuration

Now that we have our AWS S3 bucket set up, we need to configure our project 3 Flask application to serve all of its static assets from Amazon S3, without having to modify our templates. To do this, we will use the flask_s3 library.

Add this to the install_requires array in your project 3 setup.py:

'Flask-S3==0.3.3'

Then, in your insta485/__init__.py, import flask_s3 and add the following:

# Set up flask-s3, which serves static files from AWS S3 in production
s3 = flask_s3.FlaskS3(app)

Add the following to your project 3 config.py:

# AWS S3 static files
# https://flask-s3.readthedocs.io/en/latest/
FLASKS3_DEBUG = True # Enables Flask-S3's debug mode
FLASKS3_ACTIVE = True # This setting allows you to toggle whether Flask-S3 is active or not
FLASKS3_BUCKET_NAME = "uniqname.static.insta485.com" # Add your own root bucket name here, replacing uniqname with your uniqname
FLASKS3_REGION = "us-east-2" # Sets up the AWS region to host your static assets in 
FLASKS3_FORCE_MIMETYPE = True # Always set the Content-Type header on the S3 files irrespective of gzipping
FLASKS3_USE_HTTPS = False # We will only be using HTTP for now
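Under the hood, Flask-S3 works by overriding Flask's url_for for static files, which is why no template changes are needed. For example, a template reference like this one (assuming your templates use url_for for static assets):

<script type="text/javascript" src="{{ url_for('static', filename='js/bundle.js') }}"></script>

will render with your S3 bucket's URL when FLASKS3_ACTIVE is set, instead of the local /static/js/bundle.js path.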

Upload files

Next, we need to actually upload our static files to S3 for them to be served.

First, run npx webpack in your p3 directory to generate the bundled JS file. Then, upload the files in insta485/static/ to your S3 bucket using the Python code below. Note that flask_s3 uses boto3 under the hood, which in turn uses the AWS credentials stored by the aws CLI utility.

$ pwd
/Users/awdeorio/src/eecs485/p3-insta485-clientside
$ source env/bin/activate # Activate your python virtual environment
$ npx webpack
$ pip install flask_s3
$ python3 
>>> import flask_s3
>>> import insta485
>>> flask_s3.create_all(insta485.app)

Verify that the files were uploaded. You should also be able to see them in the AWS S3 management console.

$ aws s3 ls uniqname.static.insta485.com
                           PRE static/
$ aws s3 ls uniqname.static.insta485.com/static/
                           PRE css/
                           PRE images/
                           PRE js/
$ aws s3 ls uniqname.static.insta485.com/static/css/
2020-06-23 22:28:32       3611 style.css

Verify that you can download a static file. Don’t forget to change the bucket name in this example.

$ curl -L http://uniqname.static.insta485.com.s3.amazonaws.com/static/js/bundle.js

Verify that the Content-Type header is being set correctly. Check the last set of headers (not the redirect). If you see Content-Type: binary/octet-stream, then make sure you set FLASKS3_FORCE_MIMETYPE = True before uploading files.

$ curl -LI http://uniqname.static.insta485.com.s3.amazonaws.com/static/js/bundle.js | less
HTTP/1.1 307 Temporary Redirect
... ignore this one

HTTP/1.1 200 OK
...
Content-Type: application/javascript
...

Test

Now that we have uploaded the static files, they should be sourced from AWS S3 because of our Flask-S3 configuration.

$ ./bin/insta485run 

Browse to the index page http://localhost:8000. View source and look for static files, e.g., bundle.js. It should be sourced from S3, e.g.:

<script type="text/javascript" src="http://uniqname.static.insta485.com.s3.amazonaws.com/static/js/bundle.js"></script>

Add a CDN

Amazon CloudFront is a content delivery network (CDN) offered by AWS. CDNs provide a network of globally distributed cached content (generally static content) that is geographically closer to users, which reduces the latency for the download of the content.

  1. Navigate to the AWS CloudFront console https://console.aws.amazon.com/cloudfront/ and choose Create Distribution.

  2. Since we want to speed up distribution of static and dynamic content and distribute media files using HTTP, we want to use CloudFront to create a web distribution. Click on Get Started under the web section.

  3. We want to create our distribution so that it caches the static assets from the static file server that we created earlier with S3. First, in the Origin Settings section, for Origin Domain Name, enter your root domain S3 website endpoint, which generally follows the format domainName.s3-website.us-east-2.amazonaws.com (it should be suggested via the dropdown menu). CloudFront will fill in the Origin ID and other Origin Settings for you.

  4. Do not change any of the Default Cache Behavior Settings, as they are already optimized for our usage.

  5. When a user requests your root website endpoint, we want CloudFront to return our index.html page. In the Distribution Settings section, enter index.html for the Default Root Object. Also enable logging, enter the logging S3 bucket that you created earlier (it should be suggested via the dropdown menu), and enter logs/ for Log Prefix so that the logs are more organized. Then, click Create Distribution.

  6. You should now see your CloudFront distribution in the console. Wait for the status to change to Deployed and then get your CDN domain name. It may take up to 15 minutes or so for the CDN to be deployed. Verify that it has been deployed successfully by browsing to your CDN domain. You should see “This XML file does not appear to have any style information associated with it. The document tree is shown below.”. If you see “This site can’t be reached”, then something went wrong.

See the different IPs when accessing the static file server directly vs. through the CDN.

$ curl -vI http://uniqname.static.insta485.com.s3-website.us-east-2.amazonaws.com/static/js/bundle.js
*   Trying 52.219.84.28...
* TCP_NODELAY set
* Connected to uniqname.static.insta485.com.s3-website.us-east-2.amazonaws.com (52.219.84.28) port 80 (#0)
...
$ curl -vI http://d1xmjw8wl9d2hx.cloudfront.net
*   Trying 99.84.248.133...
* TCP_NODELAY set
* Connected to d1xmjw8wl9d2hx.cloudfront.net (99.84.248.133) port 80 (#0)
...
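You can also confirm that CloudFront is actually caching responses by checking the X-Cache header that CloudFront adds. The domain below is the example distribution from above; use your own. The first request is typically a miss, and a repeated request a hit:

$ curl -sI http://d1xmjw8wl9d2hx.cloudfront.net/static/css/style.css | grep -i x-cache
X-Cache: Miss from cloudfront
$ curl -sI http://d1xmjw8wl9d2hx.cloudfront.net/static/css/style.css | grep -i x-cache
X-Cache: Hit from cloudfront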

Configure web server settings to use the CDN

Similar to before, we can use Flask-S3 to work with the CDN we just deployed.

Add FLASKS3_CDN_DOMAIN to the settings in config.py. All of the Flask-S3 settings should look like this now:

FLASKS3_DEBUG = True # Enables Flask-S3's debug mode
FLASKS3_ACTIVE = True # This setting allows you to toggle whether Flask-S3 is active or not
FLASKS3_BUCKET_NAME = "uniqname.static.insta485.com" # Add your own root bucket name here
FLASKS3_REGION = "us-east-2" # Sets up the AWS region to host your static assets in 
FLASKS3_FORCE_MIMETYPE = True # Always set the Content-Type header on the S3 files irrespective of gzipping
FLASKS3_USE_HTTPS = False # We will only be using HTTP for now
FLASKS3_CDN_DOMAIN = "d1xmjw8wl9d2hx.cloudfront.net" # Add your own CDN Domain Name here

Restart your dev server, then browse to the index page http://localhost:8000. View source and look for static files, e.g., bundle.js. It should be sourced from CloudFront, e.g.:

<script type="text/javascript" src="http://d1xmjw8wl9d2hx.cloudfront.net/static/js/bundle.js"></script>

Then, disable your CDN using the management console. This will take a few minutes.

Convert P3 Uploads to AWS S3

In addition to our static assets, we want the images that are posted to Insta485 to be available to users globally. Locally, we are storing the image files in the var/uploads folder; however, this is not a very scalable solution. For example, if we want to create multiple front-end servers for Insta485 to distribute the computation, each server cannot have its own copy of uploads. Instead, we can once again leverage an AWS S3 bucket to store all of the uploads and our front-end web servers can read and write to that S3 bucket.

Add S3 Support

  1. First, we need to create an S3 bucket for our uploads. Create an S3 bucket in us-east-2 (Ohio) named uniqname.uploads.insta485.com (using your uniqname) and click “Create” (accept all the default settings). Inside that bucket, create a folder named uploads/ - you can do this by clicking the Create folder button inside the S3 bucket on the console. You do not need to make it a public bucket or attach any permissions. (A CLI alternative is sketched after this list.)

    You should now be able to see your buckets.

     $ aws s3 ls
     ...
     2020-08-24 14:58:25 uniqname.static.insta485.com
     2020-08-24 15:10:48 uniqname.uploads.insta485.com
     2020-08-24 15:03:57 logs.uniqname.com
     ...
     $ aws s3 ls uniqname.uploads.insta485.com
                             PRE uploads/
    
  2. Now we need to copy all of your local media uploads (in var/uploads/) to your S3 bucket. Make sure to use your own uniqname.
     $ aws s3 cp var/uploads/ s3://uniqname.uploads.insta485.com/uploads --recursive
     upload: var/uploads/e1a7c5c32973862ee15173b0259e3efdb6a391af.jpg to s3://uniqname.uploads.insta485.com/uploads/e1a7c5c32973862ee15173b0259e3efdb6a391af.jpg
     ...
    
  3. Add the following to the insta485/config.py file.

     # AWS S3 file upload
     AWS_S3_UPLOAD_BUCKET = "uniqname.uploads.insta485.com" # or your bucket name
     AWS_S3_UPLOAD_REGION = "us-east-2"
     AWS_S3_UPLOAD_FOLDER = "uploads"
    
  4. Add the following imports to your insta485/model.py file before the insta485 import:

     from pathlib import Path
     import uuid
     import tempfile
     import botocore
     import boto3
    
  5. Add this to the install_requires array in setup.py:

     'boto3==1.14.9',
    
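As a CLI alternative to step 1 above, you can create the uploads bucket and its uploads/ placeholder folder from the command line instead of the console. This is a sketch using the standard aws s3 mb and aws s3api put-object commands; substitute your own uniqname:

$ aws s3 mb s3://uniqname.uploads.insta485.com --region us-east-2
$ aws s3api put-object --bucket uniqname.uploads.insta485.com --key uploads/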

Functions for API Routes

The following functions will be written in the insta485/model.py file. They log via a module-level LOGGER; if your model.py does not already define one, you can create it near the top with LOGGER = flask.logging.create_logger(insta485.app). Let's first write the API route for getting uploads from S3.

@insta485.app.route("/uploads/<filename>")
def get_upload(filename):
    """Serve one file from the uploads directory."""
    # In production, download the image from S3 to a temp file and serve it
    if "AWS_S3_UPLOAD_BUCKET" in insta485.app.config:
        s3_client = boto3.client("s3")
        bucket = insta485.app.config["AWS_S3_UPLOAD_BUCKET"]
        key = "{folder}/{filename}".format(
            folder=insta485.app.config.get("AWS_S3_UPLOAD_FOLDER"),
            filename=filename,
        )

        # Download the image to a temporary in-memory file
        # https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile
        # https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.download_fileobj
        tmpfileobj = tempfile.SpooledTemporaryFile()
        try:
            s3_client.download_fileobj(bucket, key, tmpfileobj)
        except botocore.exceptions.ClientError as error:
            LOGGER.error(error)
            flask.abort(400)

        # Serve the file to the user
        # https://flask.palletsprojects.com/en/1.1.x/api/#flask.send_file
        tmpfileobj.seek(0)
        return flask.send_file(tmpfileobj, attachment_filename=filename)

    # In development, send the file directly from the file system
    return flask.send_from_directory(
        insta485.app.config['UPLOAD_FOLDER'],
        filename
    )
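Once your dev server is running with this configuration, you can spot-check the route with curl. The filename below is the example upload from earlier; use one that actually exists in your bucket or var/uploads/:

$ curl -I http://localhost:8000/uploads/e1a7c5c32973862ee15173b0259e3efdb6a391af.jpg
HTTP/1.0 200 OK
...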

Next, let’s write a helper function that checks whether a filename has an allowed extension.

def allowed_file(filename):
    """Return true if filename has allowed extension."""
    extension = Path(filename).suffix
    extension = extension.replace(".", "").lower()
    return extension in insta485.app.config["ALLOWED_EXTENSIONS"]
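For example, assuming project 3's usual ALLOWED_EXTENSIONS configuration of lowercase image extensions (e.g., jpg, png, gif), the check is case-insensitive because of the .lower() call:

>>> allowed_file("photo.JPG")  # suffix ".JPG" normalizes to "jpg"
True
>>> allowed_file("malware.exe")
False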

Now, let’s write the function to save an upload to S3.

def save_upload_to_s3(fileobj, filename):
    """Upload file object to S3.
    This function is used in production for media uploads.
    Docs
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
    """
    s3_client = boto3.client("s3")
    bucket = insta485.app.config["AWS_S3_UPLOAD_BUCKET"]
    key = "{folder}/{filename}".format(
        folder=insta485.app.config.get("AWS_S3_UPLOAD_FOLDER"),
        filename=filename,
    )
    try:
        s3_client.upload_fileobj(fileobj, bucket, key)
    except botocore.exceptions.ClientError as error:
        LOGGER.error(error)
        flask.abort(400)
    LOGGER.info("Saved upload to S3 %s/%s", bucket, key)

And we need to write a function to delete a file from S3.

def delete_upload_from_s3(filename):
    """Delete file object from S3.
    This function is used in production for media uploads.
    Docs
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
    """
    s3_client = boto3.client("s3")
    bucket = insta485.app.config["AWS_S3_UPLOAD_BUCKET"]
    key = "{folder}/{filename}".format(
        folder=insta485.app.config.get("AWS_S3_UPLOAD_FOLDER"),
        filename=filename,
    )
    try:
        s3_client.delete_object(Bucket=bucket, Key=key)
    except botocore.exceptions.ClientError as error:
        LOGGER.error(error)
        flask.abort(400)
    LOGGER.info("Deleted upload from S3 %s/%s", bucket, key)

In local development, we don’t want to interact with S3, so let’s write two functions that save and delete uploads using local storage.

def save_upload_to_disk(fileobj, filename):
    """Save file object to on-disk uploads folder.
    This function is used in development for media uploads.
    """
    path = insta485.app.config["UPLOAD_FOLDER"]/filename
    fileobj.save(path)
    LOGGER.info("Saved upload to disk %s", path)


def delete_upload_from_disk(filename):
    """Delete file from on-disk uploads folder.
    This function is used in development for media uploads.
    """
    path = insta485.app.config["UPLOAD_FOLDER"]/filename
    path.unlink()
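Note that the / operator in these two functions assumes UPLOAD_FOLDER is a pathlib.Path rather than a plain string. If your config stores it as a string, a minimal sketch of the fix (the path shown is the project's usual var/uploads location) is:

# insta485/config.py
import pathlib
UPLOAD_FOLDER = pathlib.Path("var/uploads")  # a Path, so UPLOAD_FOLDER/filename works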

Finally, let’s use the functions we wrote to create and remove uploads.

def create_upload():
    """Handle one upload POST request.  Return filename of saved file."""
    # User is not logged in
    if "logname" not in flask.session:
        flask.abort(403)

    # Post request has no file part
    if "file" not in flask.request.files:
        flask.abort(400)
    file = flask.request.files["file"]

    # User did not select file
    if file.filename == "":
        flask.abort(400)

    # Disallowed file extension
    if not allowed_file(file.filename):
        flask.abort(400)

    # New filename is a unique ID
    uuid_filename = "{stem}{suffix}".format(
        stem=uuid.uuid4().hex,
        suffix=Path(file.filename).suffix
    )

    # Upload to S3 if the configuration provides AWS_S3_UPLOAD_BUCKET.
    # Typically, this would be set in production.
    if "AWS_S3_UPLOAD_BUCKET" in insta485.app.config:
        save_upload_to_s3(file, uuid_filename)
    else:
        save_upload_to_disk(file, uuid_filename)

    # Return the new filename for adding to the database
    return uuid_filename


def remove_upload(filename):
    """Handle one request to delete a media upload."""
    # Delete from S3 if the configuration provides AWS_S3_UPLOAD_BUCKET.
    # Typically, this would be set in production.
    if "AWS_S3_UPLOAD_BUCKET" in insta485.app.config:
        delete_upload_from_s3(filename)
    else:
        delete_upload_from_disk(filename)

Refactoring views to use new API routes

Now that we’ve written new create_upload() and remove_upload() functions in the model module of insta485, we’ll need to refactor their call sites in views/accounts.py and views/user.py.

Everyone will have slightly different code for this. Here’s what you’ll need to do: wherever a view saves an uploaded file itself, replace that code with a call to insta485.model.create_upload() and record the returned filename in the database; wherever a view deletes an upload file, call insta485.model.remove_upload(filename) instead. A sketch is shown below.
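As a sketch only (your route names, form fields, and database helpers will differ), a view that handles a new post might look like this after refactoring, where the old save-to-disk code is replaced by a single call into the model:

import flask
import insta485


@insta485.app.route("/u/<username>/", methods=["POST"])
def handle_user_page(username):
    """Handle a new post upload on the user page."""
    # create_upload() validates the request and saves the file to S3 or
    # local disk depending on the configuration, returning the filename
    filename = insta485.model.create_upload()

    # Record the new post in the database (get_db() is the usual
    # project 3 helper; adjust to your own schema and helpers)
    connection = insta485.model.get_db()
    connection.execute(
        "INSERT INTO posts(filename, owner) VALUES (?, ?)",
        (filename, flask.session["logname"]),
    )
    return flask.redirect(flask.request.path)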

Run and validate

View bucket contents with AWS CLI.

List buckets and bucket contents

$ aws s3 ls
...
2020-06-22 11:27:33 uniqname.uploads.insta485.com
$ aws s3 ls uniqname.uploads.insta485.com
                           PRE uploads/

Make sure you have the AWS Python module installed.

$ pip install boto3

Run a dev server.

$ ./bin/insta485run

Log in to the web interface and navigate to a user page, e.g., http://localhost:8000/u/awdeorio/. Upload an image, creating a new post. You should see the image appear. Check the server logs; you should see:

[2020-06-23 21:52:25,411] INFO in model: Saved upload to S3 uniqname.uploads.insta485.com/uploads/aaaf3b2c2597410ba72fa4a337bc9185.jpg
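You can also confirm from the CLI that the object landed in your bucket; the new file should appear in the uploads/ listing (output abbreviated, your timestamp and size will differ):

$ aws s3 ls uniqname.uploads.insta485.com/uploads/
...
2020-06-23 21:52:25      81042 aaaf3b2c2597410ba72fa4a337bc9185.jpg
...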

Cleanup

Note: We are NOT deleting any of the AWS resources that we provisioned.

  1. Disable your CDN by navigating to the CloudFront console, clicking on the checkbox for your distribution, and then clicking on disable.

  2. Block public access to your static S3 file server by navigating to the S3 console, clicking into your static files S3 bucket (i.e., uniqname.static.insta485.com), then clicking on Properties -> Static Website Hosting, and then clicking on Disable website hosting and saving.

  3. Comment out the line AWS_S3_UPLOAD_BUCKET = "uniqname.uploads.insta485.com" in your insta485/config.py to use local image uploads instead of S3.

  4. Set FLASKS3_ACTIVE to False in your insta485/config.py to serve static files locally instead of from your CDN and S3.

You should now be able to run ./bin/insta485run and everything should still work. Additionally, verify that the script source on your index page is /static/js/bundle.js.

Completion Criteria

Lab Quiz

Complete the lab quiz by the due date.