Hosting a Website on S3

I recently took part in a discussion about static site generators like Middleman, Jekyll and Octopress. I mentioned that I was hosting this site on S3, and also doing more advanced stuff, like setting up redirects.

Here’s how I do it. It’s pretty simple.

Requirements

Installing the requirements, git and s3cmd, on my platform (OS X with Homebrew) is a breeze:

brew install git s3cmd # same with apt-get & yum

A Ruby developer will undoubtedly use another method to install the language. For anyone who doesn’t mind the packaged version:

brew install ruby # same with apt-get, maybe not with yum*.

* At least, yum on CentOS 6.4 and older installs Ruby 1.8.7 (as of summer 2013), a version that is no longer supported. YMMV if your Ruby is that old.

Deploying to S3 with s3cmd

Setting up the bucket

Create your bucket with the AWS console. s3cmd can do it too, but use the AWS console for hosted sites, because there are other settings you can only change in the console anyway.

  1. Name your bucket the same as your website’s hostname. E.g.: www.programblings.com
  2. Optional: If you set up logging to another bucket, use a target prefix to keep the logs from your different static sites in different directories. E.g.: Target Prefix: www.programblings.com.
  3. Still from the AWS console, open your bucket’s settings and find the “Static Website Hosting” section.
  4. Enable it. Set Index Document: index.html and Error Document: not-found/index.html (or your preferred error page).
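
If you’d rather script the bucket creation and hosting setup anyway, recent versions of s3cmd expose a ws-create command for this. Here’s a minimal sketch, assuming your installed s3cmd supports the --ws-index and --ws-error options (check s3cmd --help if unsure):

s3cmd mb s3://www.programblings.com
s3cmd ws-create --ws-index=index.html --ws-error=not-found/index.html s3://www.programblings.com

As noted above, though, the console remains the simpler path for the remaining settings.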

Uploading content

The website generator I use outputs to public. To deploy this subdirectory to my bucket:

s3cmd sync --acl-public public/ s3://www.programblings.com

If you’ve already updated your DNS to point to your bucket, navigate there with your browser. If you don’t want to mess with DNS just yet, go straight to [yoursite].s3-website-[yourregion].amazonaws.com. In my case: www.programblings.com.s3-website-us-east-1.amazonaws.com (check it out, it works).

You’re online already.
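
When you do get to the DNS step, the usual approach for a www-style hostname is a CNAME record pointing at the bucket’s website endpoint. A rough sketch (the record below is illustrative; the exact syntax depends on your DNS provider):

# Illustrative zone entry; adapt to your DNS provider’s interface:
#   www.programblings.com.  CNAME  www.programblings.com.s3-website-us-east-1.amazonaws.com.
# Once it has propagated, verify it resolves:
dig +short www.programblings.com CNAME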

True sync

s3cmd’s syncing doesn’t delete old files by default, though. It only adds. So let’s make sure we delete old stuff by adding --delete to our command:

s3cmd sync --delete --acl-public public/ s3://www.programblings.com

This sync command ensures that obsolete files are purged.
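
Since --delete is destructive, it’s worth previewing what a sync would remove before letting it loose. s3cmd has a --dry-run flag for exactly that:

s3cmd sync --dry-run --delete --acl-public public/ s3://www.programblings.com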

HTTP Caching (or other headers)

Now let’s say that I want to set some caching headers to let my Free CDN (and much more) serve my site faster, all around the world.

I can add --add-header=Cache-Control:public,max-age=300 (5 minutes) to my s3cmd deploy command, which results in the following mouthful:

s3cmd sync --delete --acl-public public/ s3://www.programblings.com \
  --add-header=Cache-Control:public,max-age=300
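
To confirm the header actually made it onto the uploaded objects, a quick HEAD request against the live site does the trick (a sketch; substitute any page you’ve deployed):

curl -sI http://www.programblings.com/ | grep -i cache-control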

My generator’s scripts are provided via rake. So here’s how I set up my deploy task:

task :deploy do
  system  "s3cmd sync --delete --acl-public public/ s3://www.programblings.com" +
          " --add-header=Cache-Control:public,max-age=300"
end

Depending on the tool you use, you may have another task at hand that forces a full rebuild of your whole site. In my case, it was another rake task named generate, so my deploy task actually started with:

task :deploy => :generate do

Making it safer

Let’s fast forward a few blog posts. You’ve deployed stuff you shouldn’t have a few times already.

A quick way to establish a bit of trust in what you’re deploying is to simply ensure that everything is already committed to your git repo.

In other words, git status --porcelain should return an empty string.
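
You can try that guard straight from the shell before wiring it into rake; a rough one-liner (illustrative only):

test -z "$(git status --porcelain)" || echo "Uncommitted changes; commit or stash first."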

My rake task now looks like this:

task :deploy => :generate do
  unless '' == (status = `git status --porcelain`)
    abort "You have uncommitted changes. Make sure to commit or stash first.\n\n#{status}"
  end

  system  "s3cmd sync --delete --acl-public public/ s3://www.programblings.com" +
          " --add-header=Cache-Control:public,max-age=300"
end

Redirection

If you’ve migrated your site away from another platform, you may have some old URLs that aren’t valid on the new platform anymore, but that you still want to redirect to a working URL.

The AWS documentation being what it is, let me just give a few concrete examples of what you can put in your S3 website’s “Redirection Rules” (found in your bucket’s website hosting section, in the AWS console).

The following example redirects /feed to my current RSS service’s feed, feeds.feedburner.com/Programblings.

<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>feed</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <HostName>feeds.feedburner.com</HostName>
      <ReplaceKeyWith>Programblings</ReplaceKeyWith>
      <HttpRedirectCode>302</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
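
To check that a rule like this behaves, request the old URL and look at the redirect the bucket answers with. A quick sketch (assuming the rule is saved and the site is reachable):

# Expect a 302 with a Location header pointing at http://feeds.feedburner.com/Programblings
curl -sI http://www.programblings.com/feed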

I have another static site that’s so simple that I haven’t even bothered creating an error page yet. I’ve only overridden the 403 errors; 403 is S3’s default answer for files that are either missing or not public.

<RoutingRules>
  <RoutingRule>
    <Condition>
      <HttpErrorCodeReturnedEquals>403</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyWith>?not-found</ReplaceKeyWith>
      <HttpRedirectCode>302</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>

This rule redirects all errors to the main index page, but will also let me dig into my analytics tool’s page views on ?not-found later on, if I’m ever curious.

In your “Redirection Rules”, keep in mind that the root block (RoutingRules) can contain multiple RoutingRule blocks.

Closing notes

Hosting a few static websites on S3 is so cheap it will probably fit in your free-tier usage of the service. A no-brainer, if you ask me.

If you found my setup interesting, here’s another take worth a look: a personal deployment pipeline. It lets its author deploy to a pre-prod site, and then promote the pre-prod site’s content to production, but it doesn’t let him push straight to production. The article is Octopress Deployment Pipeline.

Oh, one last thing while we’re at it: I’ve never actually gone back to my logs stored on S3 ;-)
