Friday, July 31, 2015

Deploying a website to Amazon S3

You've generated a great website that you now want to host on Amazon S3. I'll assume you've already created an S3 bucket and configured it for static website hosting. Now you need to transfer the files that make up your site to S3 and keep them up to date. You could do this manually with a GUI tool like CrossFTP or S3 Browser, but there are a few tweaks worth making before you deploy. Most importantly, you'll want clients to be able to take advantage of gzip compression when the site is served, since this can greatly speed up page loads and also reduce your costs. You can also optimize/minify your images for the web. Finally, you'll want to apply all the custom S3 settings you need (Cache-Control headers, reduced redundancy storage, etc.) in one step, reducing requests (and thus costs) and saving lots of time and hassle. And ideally, if you make changes to an existing site, you should only need to upload the files that have changed, again saving time and money.
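If your bucket isn't configured for website hosting yet, s3cmd (installed in the next section) can do that too. Here's a minimal sketch; the bucket name mys3bucket is just a placeholder, and I'm assuming the usual index.html/error.html page names:

s3cmd mb s3://mys3bucket
s3cmd ws-create --ws-index=index.html --ws-error=error.html s3://mys3bucket
s3cmd ws-info s3://mys3bucket   # shows the bucket's website endpoint URL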

Here, I present a script that does all of this, using the convenient example of a generated Gallerific Web Album. Using the script requires considerably more technical skill than generating an album, since you'll need a command line in the form of a bash shell (which comes with macOS and Linux, and which you can get on Windows via Cygwin, as I do), but it's worth it!

To use it you'll need several tools:
  • s3cmd, which is used for copying to and otherwise interacting with your S3 bucket. To install:
    • Linux: normally you can get the package from your distribution's package manager.
    • Cygwin: download the zip from GitHub and run python setup.py install (this requires python-setuptools, and you may need to run easy_install dateutil first).
    • Mac via homebrew: brew update && brew install s3cmd
    • In all cases you must then run s3cmd --configure to provide your S3 access keys.
  • mozjpeg v3.0+, which is now the best tool for compressing JPEGs for use on the web (available in binary form here). Be sure that the jpegtran/djpeg/cjpeg on your PATH are the mozjpeg versions, not the default libjpeg implementation (a quick check is sketched after this list). Lossless compression rates of 60%+ are achieved on thumbnails, and 8-10% on larger images; by my testing it is now as good as or better than the common online tools, and better than the likes of jpegrescan and adept.sh.
  • gzip, or optionally zopfli, for compressing text files just as a traditional web server would (zopfli produces files 4-8% smaller than gzip -9). Note that although 7zip also produces smaller files than gzip, it will not work with this script because its embedded timestamp cannot be excluded, so the MD5 signatures of unchanged files still change on every run (which defeats s3cmd sync's change detection).
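Before the first run, it's worth a quick sanity check that everything is on your PATH and that the JPEG tools really are the mozjpeg build. This check is just my suggestion, not part of the script (drop gzip or zopfli from the list if you only use one of them):

for tool in s3cmd gzip zopfli jpegtran djpeg cjpeg; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
command -v jpegtran   # should point into your mozjpeg install, not the stock libjpeg location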

If you're curious, here is the script (deploy_to_s3):

#!/bin/bash
##################################################
# Deploys a gallerific gallery to the web server
##################################################
# Usage: deploy_to_s3 [directorytodeploy]
##############CONFIG##############################
S3_BUCKET=s3://mys3bucket
REDUCED_REDUNDANCY=--rr
ADD_HOME_LINK=1 # set 'pathtohome/index.html' as applicable
#GZIP_CMD='gzip -n -9'
GZIP_CMD=zopfli
JPEG_QUALITY=80  # lossless or number (recommend 65-85)
#S3CMD_DEBUG="-v --dry-run" # can use --dry-run on s3cmd to see the effects but not execute, also can use -v for verbose
##################################################

GALLERY_DIR=$1
GALLERY_DIR=${GALLERY_DIR%/} # remove trailing slash
if [ -z "$GALLERY_DIR" ] || [ ! -d "$GALLERY_DIR" ] ; then 
  echo "Invalid parameter. USAGE: deploy_to_s3 [directory to deploy]"
  exit 1
fi
DEPLOY_DIR=$(mktemp -d)
if [ "$JPEG_QUALITY" == "lossless" ] ; then 
  JPEG_CMD='jpegtran -outfile {}.tmp {}'
  JPEG_CMD2='mv {}.tmp {}'
else
  JPEG_CMD='djpeg -outfile {}.pnm {}' 
  JPEG_CMD2="cjpeg -quality $JPEG_QUALITY -outfile {} {}.pnm"
fi

cp -Rf $GALLERY_DIR $DEPLOY_DIR

if [ $ADD_HOME_LINK -eq 1 ]; then
  echo ADDING CUSTOM HOME LINK TO INDEX.HTML ...
  sed -i '/<div class="header"[^<]*>/ a <p><a style="color: #548BBF !important;" href="pathtohome/index.html">Home</a></p>' \
      $DEPLOY_DIR/$GALLERY_DIR/index.html 
fi

# compress text files and mark them for direct download by browsers using gzip encoding 
echo COMPRESSING TEXT FILES ...
find $DEPLOY_DIR -regex '.*\.\(html\|js\|css\|xml\|svg\|txt\)' -exec $GZIP_CMD {} \; -exec mv {}.gz {} \;
echo UPLOADING COMPRESSED TEXT FILES ...
s3cmd sync $S3CMD_DEBUG --acl-public $REDUCED_REDUNDANCY --add-header="Content-Encoding":"gzip" \
              --guess-mime-type --exclude='*' --rinclude='.*\.(html|js|css|xml|svg|txt)' --signature-v2 $DEPLOY_DIR/$GALLERY_DIR $S3_BUCKET/
echo
echo

# re-compress jpegs using mozjpeg encoder (https://github.com/mozilla/mozjpeg, http://mozjpeg.codelove.de/binaries.html)
echo IMPROVING COMPRESSION ON JPGS ...
# Note some EXIF/comments may be lost, particularly using lossy compression
#pushd/popd necessary because mozjpeg for windows doesn't handle roots (eg /tmp)
pushd $DEPLOY_DIR/$GALLERY_DIR
find im -regex '.*\.jpg' -exec $JPEG_CMD \; -exec $JPEG_CMD2 \;
find im -regex '.*\.pnm' -exec rm -f {} \; # clean up after lossy compression
popd
echo UPLOADING IMAGES AND REMAINING FILES ...
s3cmd sync $S3CMD_DEBUG --acl-public $REDUCED_REDUNDANCY --add-header="Cache-Control":"public,max-age=86400,no-transform" --guess-mime-type \
                --signature-v2 $DEPLOY_DIR/$GALLERY_DIR $S3_BUCKET/
echo
echo

echo REMOVING DELETED FILES ON S3 SIDE AND CLEANING UP TEMP FILES ...
s3cmd sync $S3CMD_DEBUG --delete-removed --acl-public $DEPLOY_DIR/$GALLERY_DIR/* $S3_BUCKET/$GALLERY_DIR/
rm -rf $DEPLOY_DIR
echo
echo
echo DONE!


To use it as is, save the script as deploy_to_s3 and make it executable (chmod +x deploy_to_s3). Next you need to configure it:
  1. Set S3_BUCKET to point to your bucket
  2. If you wish to use gzip, uncomment the first GZIP_CMD line and comment the other.
  3. Before you run for real, I'd recommend uncommenting the S3CMD_DEBUG option to see what the effects would be but not execute them.
  4. I like to add a home link at the top of my pages. To do this, set ADD_HOME_LINK=1 and change the path to your homepage in the HTML snippet (<p>...</p>) inside the sed command.
  5. Choose the JPEG compression quality: either "lossless" or an integer, typically 65-85.
  6. If you don't want to use reduced redundancy storage, comment out the REDUCED_REDUNDANCY line.
  7. Change which directories get image compression (currently im/ and its subdirectories); an example of this edit follows the list.
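For step 7, suppose (hypothetically) your photos lived under photos/ rather than im/; you would simply change the two find lines in the script accordingly:

find photos -regex '.*\.jpg' -exec $JPEG_CMD \; -exec $JPEG_CMD2 \;
find photos -regex '.*\.pnm' -exec rm -f {} \; # clean up after lossy compression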
To run, go to the parent of your Gallerific web album's folder. Then simply execute:
deploy_to_s3 [mygallerificwebalbum]
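For example, assuming (hypothetically) that the script and an album folder called mygallerificwebalbum both sit under ~/websites, a first run would look like:

cd ~/websites
chmod +x deploy_to_s3
./deploy_to_s3 mygallerificwebalbum

(With the S3CMD_DEBUG line uncommented, this prints what s3cmd would transfer without actually uploading anything.)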

A few notes/features:
  • If you re-generate the album or make tweaks, but only some files have changed, you can re-run deploy_to_s3 and only the files that have changed will be transferred (saving time and money again).
  • Only JPEGs in the im directory will be compressed. The other images in Gallerific web albums have already been minified.
    • For use with Gallerific, I recommend using lossy compression with this script, combined with a very high quality setting when saving JPEGs in Lightroom. This yields much smaller JPEGs than a lower quality setting in Lightroom combined with lossless compression here. For example, a full-size image that I find acceptable for the web at 65% quality in Lightroom was 827 kB; losslessly compressing it only gets it to 769 kB. But if the same image is saved at 95% quality in Lightroom (effectively lossless) and compressed at quality 75 with mozjpeg, it occupies just 580 kB with approximately equivalent image quality. The only downsides are potentially lost EXIF/comment data with lossy compression and a rather large local copy (if that's a concern, a more reasonable compromise is LR quality 92 with mozjpeg quality 80). [Note that I've learned that LR's JPEG quality setting for the Web is not the same as the one used for normal LR export; it appears to be generally lower.]
  • HTTP headers are added so that browsers recognize the gzip-compressed text files, fetch them as-is over the network, and decompress them transparently; you and your users save significant time and bandwidth, and pages load faster.
  • The Cache-Control HTTP header is set so that browsers (and CDNs/proxies) may cache all non-text files (i.e., everything except html/css/js/xml) for 24 hours, dramatically reducing HTTP requests and speeding up subsequent views. Be aware, though, that if you change these cached files, users who already have them cached may not see the changes unless you rename the files. A quick way to verify the headers is sketched below.
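If you want to confirm the headers took effect after a deploy, you can inspect them with curl against your bucket's website endpoint (the URL and file names below are made-up placeholders; substitute your own bucket, region, and paths):

curl -sI http://mys3bucket.s3-website-us-east-1.amazonaws.com/mygallerificwebalbum/index.html | grep -i content-encoding
curl -sI http://mys3bucket.s3-website-us-east-1.amazonaws.com/mygallerificwebalbum/im/someimage.jpg | grep -i cache-control

The first should report gzip; the second should show the max-age value set in the script.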
Although it does take a bit of work to set up, this tool should save you and your users enough time to enjoy a choice beverage, which you now deserve. So do!
