Mess up & learn: service worker caching gotchas

Dries Engineer

12 Apr 2018  ·  5 min read


Service workers are a great, multifunctional tool for enhancing your web app’s user experience. Among many other things, they cache static assets, giving your application offline-first capabilities and faster load times on slower networks. But there are a few gotchas – one of which nearly messed up one of our production applications...

Last year, one of the front-end team’s main goals was to gather more in-depth knowledge of progressive web apps. As a proof of concept, we turned one of our production applications into a progressive web app on a separate branch. We were all very excited when we saw the numbers in the Lighthouse audit: almost everything was at 99%. The application’s load times were lightning fast, and mobile users were even prompted to install the app to their home screen! To top it off, we’d also implemented an offline mode: instead of an ugly “You’re offline…” screen, we could show users the login screen with a notification telling them they’re offline.

Everything seemed to work beautifully, and after a few sanity checks, we pushed the changes to production.

All was well, at least for a few weeks… until we discovered we’d made a grave mistake.

So what went wrong?

Caching files locally is one of the many great things service workers can do. We wanted to make full use of this, so we chose to cache as many files as possible. As it turned out, that was a few too many: we had also accidentally cached index.html, the file we use to cache-bust our JS bundles. On top of that, we found out we had been serving the service worker file itself with a cache lifetime of one year.
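One way to guard against this kind of lock-out is to make the caching policy explicit per file. The snippet below is only a sketch of that idea, not the setup we actually had: the file names `index.html` and `service-worker.js`, the hash pattern, the lifetimes and the function names `cacheControlFor` and `precacheList` are all assumptions for illustration.

```javascript
// Hypothetical names: adjust these to your own build output.
const NEVER_CACHE = ['index.html', 'service-worker.js'];

// Matches fingerprinted bundles such as "main.3f2a9c1b.js" (assumed naming scheme).
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css)$/;

// Pick a Cache-Control value for a file that is about to be deployed.
function cacheControlFor(fileName) {
  if (NEVER_CACHE.includes(fileName)) {
    // Entry points must always be revalidated, or new releases never arrive.
    return 'no-cache';
  }
  if (HASHED_ASSET.test(fileName)) {
    // The hash changes whenever the content changes, so a long lifetime is safe.
    return 'public, max-age=31536000, immutable';
  }
  // Everything else gets a short lifetime as a conservative default.
  return 'public, max-age=300';
}

// Keep the entry points out of the list of files the service worker precaches.
function precacheList(files) {
  return files.filter(file => !NEVER_CACHE.includes(file));
}
```

With a policy like this, only files whose names change with their content are cached for a long time, so a new deploy can always reach the user.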

In short: we had managed to lock ourselves out.

Since restarting the service worker seemed to be the only way for a cached file to be re-fetched, we started looking for any way to kill the damned thing… At first, things looked very dire: we had no way to remotely disable the service worker. We tried pushing a new version of our application to S3, but that didn’t make a difference – our service worker would always load the index.html it had in its caches. And since we’d given it a cache header of 1 year, there was just no way for us to kill it.

Troubleshooting: worst, bad, bingo!

One of the first solutions we came up with was to send out emails to our users, containing a link to an HTML page that could unregister the service worker. But obviously, this was the last thing we wanted to do: bothering users because we made a mistake didn’t seem like the best idea.

A second solution was to wait until the release of version 3 of the app, which was planned in a few weeks. We could then send our users an activation email linking to a page that had code to unregister the service worker. This seemed better, but still not optimal.
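Both of these options rely on a page that unregisters the service worker. A minimal sketch of what the script on such a page could look like is shown here; the function name `unregisterAllServiceWorkers` is made up for this example, and the container is passed in as a parameter so the logic can be tried outside a browser.

```javascript
// Unregister every service worker registered for the current origin.
// `container` is expected to be `navigator.serviceWorker` in a browser.
async function unregisterAllServiceWorkers(container) {
  const registrations = await container.getRegistrations();
  const results = await Promise.all(
    registrations.map(registration => registration.unregister()),
  );
  // unregister() resolves to true when the registration was found and removed.
  return results.filter(Boolean).length;
}

// On the rescue page itself (browser only):
// if ('serviceWorker' in navigator) {
//   unregisterAllServiceWorkers(navigator.serviceWorker);
// }
```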

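Looking for a better solution, we went back to the service worker spec. After thorough inspection, we discovered the following behaviour: when the browser checks for an updated service worker script and the previous check was more than 86400 seconds (= 1 day) ago, it sets the request’s cache mode to no-cache and bypasses the HTTP cache. In other words, our one-year cache header could hold back an update for at most a day.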
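To make the rule concrete, here is a simplified model of it in code. This is our own illustration rather than anything a browser exposes: the function name `bypassesHttpCache` and its millisecond timestamp parameters are assumptions.

```javascript
const ONE_DAY_IN_SECONDS = 86400;

// Simplified model of the rule: returns true when the browser would skip the
// HTTP cache for the service worker script during an update check.
function bypassesHttpCache(lastUpdateCheckMs, nowMs) {
  if (lastUpdateCheckMs === null) {
    // No previous check recorded, so the rule does not apply.
    return false;
  }
  const elapsedSeconds = (nowMs - lastUpdateCheckMs) / 1000;
  // The threshold is strict: exactly one day is not yet enough.
  return elapsedSeconds > ONE_DAY_IN_SECONDS;
}
```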

This behaviour allowed us to simply update the service worker so it would self-destruct, without notifying the user. So we wrote a service worker that would destroy the caches and unregister itself as soon as it was activated. We pushed the changes, et voilà – issue solved!

If that sounds a bit too simplistic, it’s because it was. We discovered we had an ‘issue’ with our S3 bucket upload script – an open source project you can find here. Files with the same name weren’t being replaced, at least not in the case of the service worker. We later realised this had always been the case; we’d simply never noticed, because our files always get a hash at the end, like “[name].[hash].js”. This way, the file names are always different and the files always get uploaded. The simplest solution was to add a whitelist for service workers, so they now always get updated on S3.
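The whitelist idea can be sketched as follows. This is not the code of the upload script itself; the names `ALWAYS_OVERWRITE` and `shouldUpload`, the file name `service-worker.js` and the set of existing bucket keys are all hypothetical.

```javascript
// Files that must be re-uploaded on every deploy, even if the key already exists.
// The file name is an assumption; use whatever your service worker is called.
const ALWAYS_OVERWRITE = ['service-worker.js'];

// Decide whether a local file should be uploaded, given the keys already in the bucket.
function shouldUpload(fileName, existingKeys) {
  if (ALWAYS_OVERWRITE.includes(fileName)) {
    return true;
  }
  // Hashed files get a new name when their content changes, so an existing key
  // means that exact content is already in the bucket.
  return !existingKeys.has(fileName);
}
```

For example, with `new Set(['service-worker.js', 'main.abc123.js'])` as the existing keys, the service worker is uploaded again while the hashed bundle is skipped.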

While we were still struggling with our rogue service worker, however, we just manually updated the contents of the service worker file on the S3 bucket. When this was done, we crossed our fingers and waited 1 day.

“The service worker killed itself!”

A few of our non-dev floormates may have frowned at our excitement over that sentence, but we were all more than relieved when we saw the service worker had successfully destroyed the caches and disabled itself.

The service worker code that fixed it:

// fix broken service worker in v2.0.9
// Name of the cache created by the broken release; matching caches get deleted.
const DIRTY_CACHE = 'sw-precache-v3-pre-cache-v2.0.9';

self.addEventListener('install', () => {
  // Skip over the "waiting" lifecycle state, to ensure that our
  // new service worker is activated immediately, even if there's
  // another tab open controlled by our older service worker code.
  self.skipWaiting();
});

self.addEventListener('activate', event => {
  // Remove this registration so no service worker controls future page loads.
  self.registration.unregister();
  // Keep the worker alive until the stale caches have been deleted.
  event.waitUntil(
    caches
      .keys()
      .then(cacheNames =>
        cacheNames.filter(cacheName => cacheName.indexOf(DIRTY_CACHE) !== -1),
      )
      .then(cachesToDelete =>
        Promise.all(
          cachesToDelete.map(cacheToDelete => caches.delete(cacheToDelete)),
        ),
      )
      // Take control of any open tabs right away.
      .then(() => self.clients.claim()),
  );
});

Bonus: a kill switch using headers

When researching kill switches for service workers, we also discovered this handy header you can add to responses to clear out the browser cache and storage.

Clear-Site-Data: "cache", "storage"

This also makes it possible to remove a faulty service worker, as long as you remember to remove the bad service worker code. You can find more info here.
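As a rough sketch, this is how a Node.js server could send that header. The function name `sendKillSwitch`, the response body and the port are assumptions; the directives are written as quoted, comma-separated strings, which is the syntax the header expects.

```javascript
// Attach the kill-switch header to a response.
// `res` is a Node.js http.ServerResponse (or anything with the same methods).
function sendKillSwitch(res) {
  // Directives are quoted strings separated by commas.
  res.setHeader('Clear-Site-Data', '"cache", "storage"');
  // Make sure this response itself is never cached.
  res.setHeader('Cache-Control', 'no-store');
  res.statusCode = 200;
  res.end('Site data cleared.');
}

// Usage with the built-in http module:
// const http = require('http');
// http.createServer((req, res) => sendKillSwitch(res)).listen(8080);
```

Keep in mind that the browser only sees the header if the request actually reaches the network, so a service worker that answers everything from its cache can still stand in the way.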

So what did we learn from this mistake?

Firstly, that years of experience won’t protect you from messing up every once in a while. Luckily, even when you think you have completely locked yourself out, there is always a way to get back in! Stay calm and explore multiple possible solutions; don’t rush into a poor solution before you’re sure it’s your only option.

The biggest lesson, of course, is more of a reminder.

Never push important changes to production without going through a few days of testing first.

So yes, when shipping a service worker, you should always test it thoroughly. Test and confirm which resources are cached by the browser, and which by the service worker – and make sure those lists don’t contain anything you don’t want.
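One way to check what the service worker has stored is to list the contents of Cache Storage. The helper below is a small sketch for that; the name `listCachedUrls` is our own, and the cache storage object is passed in as a parameter so the function can also be exercised with a stand-in.

```javascript
// Collect every URL held in Cache Storage, grouped by cache name.
// `cacheStorage` is expected to be the global `caches` object in a browser.
async function listCachedUrls(cacheStorage) {
  const report = {};
  for (const name of await cacheStorage.keys()) {
    const cache = await cacheStorage.open(name);
    // cache.keys() resolves to the Request objects stored in this cache.
    const requests = await cache.keys();
    report[name] = requests.map(request => request.url);
  }
  return report;
}

// In the browser console:
// listCachedUrls(caches).then(report => console.log(report));
```

If index.html or the service worker file shows up in that report, it is worth finding out why before shipping.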

And that’s it! Service workers can be an incredibly useful tool to have in your arsenal – as long as you pay attention… Hopefully, we can prevent someone else from making the same mistake ;)