Smarter Website Images

We developed an in-house service called Delivr to improve our websites, ensuring their images are delivered quickly and highly optimised for users and SEO. This article, for anyone who wants to nerd-out on the details, takes a deep dive into exactly what Delivr is and how it works.

The Need for Better Image Management

As the web and our devices have evolved, website images have got larger, and improved CMS editing tools have led to the expectation that it’s best to upload high-resolution images and then crop and edit them online. Compounding this is the fact that, as responsive images have become a standard feature of website development, each image uploaded to a website can actually result in many derivative copies being produced in the background for use at various screen resolutions and pixel densities.

Managing many websites on multiple CMSs means we see some inconsistency in how responsive images are handled. Some platforms eagerly generate variations of each image at different sizes as soon as it’s uploaded, while others wait until a specific variation is first requested before generating it. Depending on the platform’s behaviour (and how image-heavy the website is) this can be a large resource drain on the web server.

There are practical issues when managing lots of websites too, because each platform needs special server configuration to optimise compression settings and add support for the preferable WebP file format. This is something that needs to be replicated for each website, and if we find future areas for optimisation it’s impractical to update every old website, so usually only new ones can benefit.

We could tick along doing things the old way, and most website owners would never know the downside, but we are always looking for win-win opportunities that give clients a better experience and streamline our own internal processes.

Our Solution

It was always critical for us to hit a few key requirements:

  • Cross platform
    a single solution that would work across WordPress, Drupal and Laravel websites.
  • Cost effective
    something that offers a cost benefit across the board, not an expensive cloud service with a cost that we have to pass on.
  • Secure
    ensure only our own websites can make use of the solution so others cannot take advantage of it.
  • Reversible
    when a website leaves us they should be able to easily revert back to their CMS’s default image behaviour.

We quickly came to the conclusion that third-party services weren’t appropriate. Their pricing models are usually tied to the number of images processed, so it would be impossible for us to predict costs, and those costs would only escalate as we increase usage.

We already manage a lot of servers so running a service ourselves doesn’t faze us, and in fact as system engineers we appreciate the insight of seeing what’s happening inside a service like this so we can adapt and optimise it to our preferences. We already had some experience with a piece of software called Thumbor for on-demand image cropping & resizing and we knew how performant it is, so there was little hesitation in picking it up for this project.

This is where we came to the idea of Delivr: an image delivery web app that we can maintain and host ourselves.

Developing Delivr

The Role of Thumbor

Thumbor describes itself as a “smart imaging service that enables on-demand crop, resizing and flipping of images”. It’s functionality that we didn’t need to reinvent because Thumbor is tried and tested by some huge websites, and has already been used for some projects here.

Essentially, rather than a website resizing images for itself, it can provide web browsers with a Thumbor URL that specifies the original image location and how to manipulate the image (e.g. resizing it). Thumbor then serves the resulting image directly to the website visitor.
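As a rough illustration of what such a URL looks like, here is a minimal Python sketch of Thumbor-style URL building: the path describes the transformation and the source image, and a signature derived from a secret key is placed in front of it. The secret, the example image location and the function names are placeholders for illustration, not anything taken from Delivr itself.

```python
import base64
import hashlib
import hmac


def sign_thumbor_path(security_key: str, path: str) -> str:
    """Return '/<signature>/<path>' using an HMAC-SHA1 signature of the path."""
    digest = hmac.new(security_key.encode(), path.encode(), hashlib.sha1).digest()
    # URL-safe Base64 keeps the signature free of '/' and '+' characters.
    signature = base64.urlsafe_b64encode(digest).decode()
    return f"/{signature}/{path}"


def resize_url(security_key: str, image_url: str, width: int, height: int) -> str:
    """Build a signed path asking for image_url resized to width x height.

    A height (or width) of 0 asks for that side to be scaled proportionally.
    """
    return sign_thumbor_path(security_key, f"{width}x{height}/{image_url}")


if __name__ == "__main__":
    # Placeholder key and image location, for illustration only.
    print(resize_url("not-a-real-key", "example.com/uploads/hero.jpg", 300, 0))
```

The website only ever outputs the finished URL; the secret key stays in its server-side configuration.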

Thumbor compressing images

Thumbor can do a lot more than simple resizing and cropping though. It has many different filters that can be applied, and even an impressive subject detection that automatically crops to the important areas of a photo (as opposed to the centre). Through some additional tools it even compresses videos and converts very inefficient GIF animations to animated WebP images.

Crucially too, it takes on the task of optimising the compression of images and even automatically serves up a WebP image to browsers that support the format. All this behaviour is configurable by us so we can tune it to our preferred balance of quality vs. file size, and if that changes over time all websites using it will benefit immediately.

Security Protections

It’s worth noting that this image service is only intended for images that are already publicly available on websites. In fact images must be publicly available for Thumbor to read them, so there is no need for it to offer any access restrictions to the images that it serves.

There is one very important security matter though; unauthorised parties should not be able to use Delivr because this is a benefit for our clients, not a free service for all. Similarly users/bots must not be able to request arbitrary image conversions from websites because an image could be endlessly re-requested with varying transformation settings to overburden the servers with too much work.

Thumbor does have an HMAC (Hash-based Message Authentication Code) method to prevent URL tampering but it only supports a single security key, which we’d need to include in every website’s settings. It would be impossible to keep a single shared key from leaking and in the case of abuse there would be no recourse because the key could not be rotated on so many websites.

This is where our own magic is added; we run an NGINX proxy (actually OpenResty for its better Lua support) in front of Thumbor which uses a custom Lua script to validate the request’s HMAC using a website-specific secret key as opposed to Thumbor’s one. Only if that HMAC validation passes will it forward the request to Thumbor itself. Our Lua algorithm mirrors Thumbor’s so any CMS functionality built for Thumbor standards will work for our implementation without any changes.

We did come across an NGINX issue whereby it double-encoded special characters in URLs, which is acknowledged in their issue tracker but looks like it won’t be changed. As a workaround we have programmed our Lua script to Base64 encode the source image URL (to remove special characters) and then we wrote a custom Thumbor image loader in Python that knows how to get images from the Base64 encoded URL rather than a regular plain text one.
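To make the workaround concrete, here is a small Python sketch of the encode/decode round trip. It assumes the URL-safe Base64 alphabet with padding, which is our guess for illustration rather than a documented detail, and both function names are hypothetical.

```python
import base64


def encode_source_url(url: str) -> str:
    """Proxy side: turn the source URL into a token with no special characters."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def decode_source_url(token: str) -> str:
    """Loader side: recover the original URL before fetching the image."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    original = "https://example.com/uploads/caf%C3%A9 photo (1).jpg?v=2"
    token = encode_source_url(original)
    print(token)
    print(decode_source_url(token) == original)
```

Because the token only contains letters, digits, `-`, `_` and `=`, there is nothing left in it for a proxy to re-encode.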

The Lua script accesses a Redis server that contains a map of all trusted hostnames and their associated HMAC security key(s), meaning a security key will not function if used by an unexpected source. If a key is ever leaked and abused on a particular website we can rotate it and only that one website will need a config update to get the new key. Expired keys are given a 24 hour grace period before they’re made inactive so there’s a window of time to get new keys rolled out.
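The lookup and grace period described above could be sketched along these lines. This is Python rather than Lua, and a plain dictionary stands in for Redis. The data layout, the names and the assumption that keys are looked up by the source image’s hostname are all illustrative, not the real implementation.

```python
import base64
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

GRACE_SECONDS = 24 * 60 * 60  # retired keys keep working for 24 hours


@dataclass
class SiteKey:
    secret: str
    retired_at: Optional[float] = None  # None means the key is current

    def is_active(self, now: float) -> bool:
        return self.retired_at is None or now < self.retired_at + GRACE_SECONDS


def expected_signature(secret: str, path: str) -> str:
    """HMAC-SHA1 of the path, URL-safe Base64 encoded."""
    digest = hmac.new(secret.encode(), path.encode(), hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).decode()


def is_authorised(
    keys_by_host: Dict[str, List[SiteKey]],
    source_host: str,
    signature: str,
    path: str,
    now: Optional[float] = None,
) -> bool:
    """Accept the request only if an active key for this host signed the path."""
    now = time.time() if now is None else now
    for key in keys_by_host.get(source_host, []):
        if not key.is_active(now):
            continue
        expected = expected_signature(key.secret, path)
        # Constant-time comparison avoids leaking how much of the signature matched.
        if hmac.compare_digest(expected.encode(), signature.encode()):
            return True
    return False
```

Keeping a list of keys per hostname is what allows an old and a new key to overlap during the rollout window, and an unknown hostname simply has no keys to match against.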

Caching

Once an image has been requested, we cache it to speed up delivery of subsequent requests.

There are two levels of caching and cached storage is shared by all Thumbor processes in the network so each load balanced node isn’t duplicating cached content for itself. The cloud provider we use doesn’t support mounting block storage onto multiple VPSs so instead we mount it on the proxy server and share it to all other nodes using NFS mounts.

The two levels of caching are:

  1. Source images
    The original images that were fetched from the website for processing are stored for a short while so subsequent requests for different variations already have a copy of the source image to hand, improving response time and saving network traffic.
  2. Converted images
    Every converted image is stored for some time so future requests do not need to reprocess it. These cached images are handled instantly by the NGINX proxy itself so they do not need to be handed off to Thumbor or even have their HMAC checked, making cached URLs extremely efficient. If the storage allocated to caching ever fills, the oldest images are automatically removed so the storage cannot run out of space.
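The “oldest images are removed first” behaviour can be illustrated with a small in-memory Python sketch. The real cache lives on shared disk storage behind the proxy, so this class is purely a model of the eviction rule. It also assumes “oldest” means oldest by the time the image was stored, which the description above doesn’t pin down.

```python
from collections import OrderedDict
from typing import Optional


class BoundedImageCache:
    """Size-limited cache that evicts the oldest stored entries first."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        """Return the cached image for this URL, or None on a cache miss."""
        return self._items.get(url)

    def put(self, url: str, body: bytes) -> None:
        """Store an image, then evict the oldest entries until within budget."""
        if url in self._items:
            self.used_bytes -= len(self._items.pop(url))
        self._items[url] = body
        self.used_bytes += len(body)
        while self.used_bytes > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)  # oldest entry first
            self.used_bytes -= len(evicted)
```

A cache hit never touches the image processor at all, which is the property that makes the converted-image level so cheap to serve.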

Of course, all image responses include optimised HTTP headers telling browsers to also keep a local cached copy of the response so there’s no need for subsequent page loads to request the same URL again.

Our Management Tools

Hidden from view is a management tool for our team to add and remove websites that use this service, as well as maintain their HMAC secret keys. When such details change it automatically updates the domain/key map in the Redis database that the proxy refers to for authenticating URLs.

The management tool is also responsible for maintaining the Let’s Encrypt SSL certificates that the proxy uses and it communicates with our cloud provider’s APIs so we can deploy new Thumbor servers on demand if needed. The setup actually makes it more efficient (and safer) to spin up fresh cloud servers and then delete old ones rather than handling major updates on running servers.

This aspect of Delivr is a Laravel app using the Filament framework. Team access is configured through a Google Workspace SAML integration so we can guarantee that only appropriate company members can do management tasks. It runs on its own lightweight VPS and the wider system is designed to continue running even if this is down for maintenance. Technically it’s only needed when adding/removing HMAC secret keys or renewing SSL certificates; a design feature to ensure it didn’t become a single point of failure for delivering images.

Final Architecture

Delivr comprises an NGINX proxy server (fronted by a floating IP), the admin Laravel app, and load balanced Thumbor servers. All traffic enters the proxy and is distributed as appropriate. There is also block storage that is shared between all servers.

The individual servers can be rebuilt rapidly using our company’s Ansible provisioning service, or restored from daily backups. In actual fact a deliberate aspect of this setup is that nothing apart from the small database of websites’ HMAC security keys is critical and it’s actually even faster to spin up new servers and reapply just that backed up database than to restore and then verify full backups.

The Result

Website Integrations

The beauty of this system is that, as far as websites are concerned, they could be working with a standard Thumbor server, so where that functionality is already available there’s no extra work to do. In our agency we’re really focused on interoperability with the following systems though:

WordPress – We have our own plugin that disables WordPress’ default image resizing and rewrites images to use a Thumbor server instead. This plugin is open source on our GitHub.

Drupal – We use the Thumbor Effects contrib module on an extremely image-heavy website. The module works great and offloading images from the web server has made a massive difference to website performance.

Laravel – Unlike our other two platforms this one is a special case because each website and web app is unique. Typically we create a Blade component that’s then used to render all images; it will convert a given image and transform settings to a Thumbor URL before outputting some predetermined HTML markup.
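The idea behind such a component can be sketched in a few lines. The example below is Python rather than Blade/PHP, and the image host, secret key, widths and markup are all made-up placeholders. It turns one source image into a responsive `<img>` tag whose candidates are signed Thumbor-style URLs.

```python
import base64
import hashlib
import hmac
from html import escape
from typing import Iterable


def thumbor_url(base: str, key: str, image: str, width: int) -> str:
    """Signed URL for `image` resized to `width` (height scaled proportionally)."""
    path = f"{width}x0/{image}"
    digest = hmac.new(key.encode(), path.encode(), hashlib.sha1).digest()
    signature = base64.urlsafe_b64encode(digest).decode()
    return f"{base}/{signature}/{path}"


def responsive_img(
    base: str,
    key: str,
    image: str,
    widths: Iterable[int],
    alt: str,
    sizes: str = "100vw",
) -> str:
    """Render an <img> tag with one srcset candidate per requested width."""
    ordered = sorted(set(widths))
    if not ordered:
        raise ValueError("at least one width is required")
    candidates = [(w, thumbor_url(base, key, image, w)) for w in ordered]
    srcset = ", ".join(f"{url} {w}w" for w, url in candidates)
    fallback = candidates[-1][1]  # largest size for browsers ignoring srcset
    return (
        f'<img src="{escape(fallback)}" srcset="{escape(srcset)}" '
        f'sizes="{escape(sizes)}" alt="{escape(alt)}">'
    )


if __name__ == "__main__":
    print(
        responsive_img(
            "https://images.example.com",  # hypothetical image host
            "not-a-real-key",
            "example.com/uploads/hero.jpg",
            [400, 800, 1200],
            alt="Hero image",
        )
    )
```

Centralising this in one component means every template gets the same signed URLs and markup without repeating the logic.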

Performance Comparisons

We are seeing images that are consistently and significantly smaller in size at equal or higher visual quality. The difference is even greater where PNGs have been used inappropriately or animated GIFs are involved, because WebP proves to be such a better image format in those cases. Here are some screenshots showing the effect on this very website with the Thumbor plugin enabled & disabled:

The savings on web server resources are harder to quantify but, anecdotally, we are seeing fewer CPU & memory spikes on servers when content teams are working on new content. True, those images are still being processed by a Thumbor server, but that tends to do the dedicated image tasks so much faster and more efficiently than a PHP process handling the same operations. And at times when a website is especially busy, the web server only needs to handle requests for the actual page and JS/CSS assets because the many image requests have been sent somewhere else entirely.

The Future

This is a service that requires absolute reliability over constant tinkering, so it’s something we tend to leave to run peacefully for as long as it’s dealing with all that we throw at it. Of course, as component software elements receive their own updates we evaluate their new benefits and apply updates as we see fit.

We’ve deliberately not used the term CDN above because at the moment Delivr is not on a geographically distributed network. The existing work that’s been done on distributing the resource load does lend itself to that end goal though, so it may come in the future if we see a need for it.

Lastly, we are aiming to open source the work we’ve done on this system so hopefully that will follow before long. While it’s a pretty niche use case, and too technical for anyone to simply run it out of the box, we’d like to think others would find some value in it and help to make it even better.