Serving a static website from bucket storage

This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.

As mentioned in a previous blog entry, this site is deployed to an S3 website bucket whenever the Git master branch receives a push. This post explains how we created and configured that website bucket, and walks through the Varnish configuration in front of it.

The S3 storage we use is Ceph with an S3-compatible Ceph Object Gateway (radosgw) interface, but the process should work for any S3-compatible storage with website-bucket functionality.

In this post the “variables” $s3host and $s3web are used. These refer to the host name of the S3 API host_base (e.g. “data.example.com”) and the website endpoint domain name (e.g. “data-website.example.com”) respectively.

Creating the website bucket

Creating a website bucket is quite easy using the s3cmd tool. First make the regular bucket, and configure it as a website bucket:

s3cmd mb s3://rl_web.sysadvent.prod
s3cmd ws-create --ws-error=/sysadvent/404.html  s3://rl_web.sysadvent.prod

We didn’t need to specify the index file, since the default (index.html) gives us what we need.
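
If you would rather state the index document explicitly, or want to confirm what the bucket ended up with, something like the following sketch may help. It assumes s3cmd is already configured for your endpoint. The helper name create_website_bucket is made up for illustration; --ws-index, --ws-error and ws-info are regular s3cmd options and commands.

```shell
# Hypothetical helper: make a bucket and enable website mode on it with an
# explicit index document. Bucket and error-page names follow the post.
create_website_bucket() {
    bucket="$1"                    # e.g. s3://rl_web.sysadvent.prod
    error_doc="$2"                 # e.g. /sysadvent/404.html
    index_doc="${3:-index.html}"   # falls back to the usual default

    s3cmd mb "$bucket" &&
    s3cmd ws-create --ws-index="$index_doc" --ws-error="$error_doc" "$bucket" &&
    s3cmd ws-info "$bucket"        # print the resulting website configuration
}
```

Calling create_website_bucket s3://rl_web.sysadvent.prod /sysadvent/404.html would then run the two commands above with the index file spelled out, followed by ws-info.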

The result is a functioning website at https://rl_web.sysadvent.prod.$s3web (replace $s3web with the bucket-website frontend), serving whatever content is in the bucket.
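
To make the naming scheme concrete, here is a small sketch of how the website URL is put together. The function name website_url is hypothetical, and data-website.example.com is just the placeholder endpoint used earlier in this post.

```shell
# Compose the website URL: the bucket name becomes a subdomain of the
# website endpoint. Both arguments are placeholders to substitute.
website_url() {
    printf 'https://%s.%s/\n' "$1" "$2"
}

website_url "rl_web.sysadvent.prod" "data-website.example.com"
# prints https://rl_web.sysadvent.prod.data-website.example.com/
```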

Uploading to the bucket

The previous GitLab CI post listed the command for uploading content to the bucket (replace $s3host with the S3 API host name):

s3cmd --no-mime-magic --access_key=$ACCESS_KEY --secret_key=$SECRET_KEY \
  --acl-public --delete-removed --delete-after --no-ssl --host=$s3host    \
  --host-bucket="%(bucket)s.$s3host"                                      \
sync sysadvent s3://rl_web.sysadvent.prod

The key variables hold the credentials used to access the bucket. We found we had to add --no-mime-magic to turn off magic-based MIME type detection (which inspects file contents, as opposed to going by the filename suffix), because it caused the CSS files to be served as text/plain instead of text/css. The --delete-* options ensure that we don’t end up with cruft in the website bucket when we delete files in the repositories.
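
The difference between the two detection styles is easy to see in a toy example. Judged by its contents alone, a stylesheet is just text, whereas a suffix-based lookup only looks at the file name. The sketch below is a rough illustration of the suffix approach and not s3cmd's actual code; the function name mime_by_suffix and its short table are invented for this example.

```shell
# Toy suffix-based lookup: the content type depends only on the file name.
# Not s3cmd's real table; the handful of entries here is illustrative.
mime_by_suffix() {
    case "$1" in
        *.css)        echo "text/css" ;;
        *.html|*.htm) echo "text/html" ;;
        *.js)         echo "application/javascript" ;;
        *.png)        echo "image/png" ;;
        *)            echo "application/octet-stream" ;;
    esac
}

mime_by_suffix "sysadvent/css/main.css"   # prints text/css
```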

Presentation

At this point we have a fully functional website in a bucket, but we want the site to be a sub-directory of our main site. The main site already has a Varnish frontend for caching purposes, so we want to use the same frontend for the sysadvent site. This means defining the bucket-website as a new backend in our Varnish configuration.

The vcl_recv is pretty self-explanatory if you’re familiar with Varnish:

sub vcl_recv {
    if (req.url ~ "^/sysadvent($|/)") {
        # bucket-website requires trailing / to get index.html
        if (req.url == "/sysadvent") {
            set req.url = "/sysadvent/";
        }

        # required to get to the correct bucket-website
        # (replace $s3web with your bucket-website frontend)
        set req.http.host = "rl_web.sysadvent.prod.$s3web";

        # sysadvent is a normal hash director with all the
        # rl_web.sysadvent.prod.$s3web frontends as backends
        set req.backend_hint = sysadvent.backend(req.http.X-Forwarded-For);
    }
}
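
If the regular expression in the first condition is unfamiliar, the following shell sketch mirrors the same decision outside of Varnish: which request URLs are picked up, and what they are rewritten to. It only imitates the URL handling above; the function name route_sysadvent is made up for illustration.

```shell
# Mirror of the vcl_recv URL handling, as a sketch: print the URL that
# would be sent to the bucket-website, or fail if the request is not ours.
route_sysadvent() {
    case "$1" in
        /sysadvent)    echo "/sysadvent/" ;;   # add the trailing slash
        /sysadvent/*)  echo "$1" ;;            # pass through unchanged
        *)             return 1 ;;             # e.g. /sysadventure
    esac
}

route_sysadvent "/sysadvent"             # prints /sysadvent/
route_sysadvent "/sysadvent/index.html"  # prints /sysadvent/index.html
```

A URL such as /sysadventure is left alone, which is what the ($|/) part of the pattern is for.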

We had a problem with requests for non-existent URLs returning 403. This is probably an issue in the software we use (Ceph/RadosGW) or in our configuration of it, but in any case it is easy to work around with Varnish:

sub vcl_backend_response {
    if (bereq.url ~ "^/sysadvent/") {

        # Backend returns 403 on missing file. Point it to the custom 404 page
        # instead.
        if(beresp.status == 403) {
            set bereq.url = "/sysadvent/404.html";
            return(retry);
        }

        # 404 is returned as a 200, because of the above config. Fix it.
        if(bereq.url == "/sysadvent/404.html") {
            set beresp.status = 404;
        }

        # Remove S3 headers
        unset beresp.http.X-Amz-Meta-S3cmd-Attrs;
        unset beresp.http.X-Amz-Request-Id;
    }

    # We set a lot of TTLs on a global scope (not just for /sysadvent), e.g.
    if (!(beresp.http.Cache-Control) || beresp.http.Cache-Control !~ "max-age") {
        set beresp.ttl = 15m;
    }
}

The bucket-website does not return any Cache-Control header, so we set our own TTL in Varnish. It does, however, add an ETag, which is quite nice.
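
A quick way to check that the 403 workaround above behaves as intended is to ask for a page that exists and one that does not, and compare the status codes. The sketch below assumes curl is installed; the function names http_status and expect_status are hypothetical, and the block only defines them without making any requests.

```shell
# Print only the HTTP status code for a URL.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed if the URL answers with the expected status, else report it.
expect_status() {
    got="$(http_status "$1")"
    if [ "$got" = "$2" ]; then
        echo "ok   $2 $1"
    else
        echo "FAIL $1: expected $2, got $got"
        return 1
    fi
}
```

With your own site name in place of the placeholder www.example.com, expect_status https://www.example.com/sysadvent/ 200 and expect_status https://www.example.com/sysadvent/no-such-page 404 should both report ok once the VCL is loaded.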

Result

You’re looking at the result of the above process – this blog.

December 12, 2016

Jimmy Olsen
