How publishing a blog post turned into a 2 hour debugging sesssion

10 min readPublished November 27, 2019Updated May 02, 2022

I recently wrote a post about how to be a faster debugger. I’m confident in my debugging skills and can move through my debugging steps quickly. However, this past weekend I tried to publish a blog post and embarked on a 2 and a half hour journey troubleshooting my failing CI build and digging through multiple node packages’ documentation.

What should have been a single button click ate my whole morning.

My goal with this post is to show that it doesn’t matter how long you’ve been coding, small things can still end up taking forever to fix. Along the way, I hope to share some lessons learned as well as some guideposts to look for during your own debugging sessions.

Context

Before I explain what was happening and the steps I took to try to fix it, I need to provide a little bit of context about the tech stack and tools I’m using on this project.

My personal website is a learning playground for me. If you want a personal website that is quick and easy to make changes in, I do not recommend this setup 🙂

My website is built in Vue (Vue 2 at the moment but Vue 3 should be released soon) leveraging Vuex for my state store. I’m using a headless CMS, Contentful, to store my content and then fetch the content using Contentful's Javascript API. I’m using Travis for my continuous integration and Heroku to host the Vue app.

Single-page applications (SPAs) notoriously present search engine optimization (SEO) challenges. This is due in part to the fact that the meta content in the HTML is not updated for each route and load times can affect whether or not a search crawler can successfully crawl the site.

To combat this issue and make sure my blog posts rank well in Google, I’ve customized my deployment process to prerender most of the routes in my site to static HTML which gets served initially when search crawlers or users land on my site. The Vue app then mounts over the top of that and takes over.

Deployment process

When I publish a blog post in my CMS, I use a webhook to trigger a build in Travis. After CI passes, I have a build step to prerender the top level routes in my site as well as all of the pages under /blog and /speaking. I use the pre-render-spa-plugin package to do this. Pre-render-spa-plugin is built on top of the prerenderer and uses Puppeteer to render the website content in a headless browser.

The problem

Over the weekend, I published a blog post and noticed that the build triggered by the CMS webhook failed. My builds had been failing since November 18 but I hadn’t made any changes to the application code since November 3 (which did deploy successfully).

When I looked at the build logs, the error I kept seeing was

error
[prerender-spa-plugin] Unable to prerender all routes!

That’s not super helpful. I wanted to make sure that all of my recent blog posts were SEO friendly so I started troubleshooting.

Troubleshooting process

1. Verify the build locally

My first step to identify the cause of the issue was to try to run the build locally. Normally, I only lint and unit test my code before I push it up but I ran a quick production build to see if I could reproduce the build failure. Everything ran locally, the site prerendered, and the build was successful.

2. Add console.logs

Because I didn’t have very useful error messages to give me any clues, I added some console.log statements to the prerender build step to see if I could isolate where in the code the error was occurring. This wouldn’t give me any additional information until I ran a build on Heroku but gathering more information is always important.

3. Upgrade packages

The additional console.logs wouldn’t change any of the code’s behavior so I decided to run a quick yarn upgrade to see if that helped anything during the next build. Yarn upgrade updates all the node packages it can find within the constraints of the semantic version rules you’ve included in your package.json.

If your semantic version rules in your package.json are conservative, upgrading packages might be a relatively safe option to try. I ran a quick smoketest locally to see if the upgrade broke any functionality on my site and all looked good.

As a side note, I do not recommend jumping right to updating all of your node packages for production applications. Because this is my personal website, it’s not a big deal if I break something while upgrading. I can roll it back easily or it can just be broken for a little bit.

When the next build ran, all of the node packages had been updated and the prerender build step had some additional logging. The build failed again with the same unhelpful error. And all of my console.logs were printed so I didn’t have any more information about where in the build step the error occurred.

4. Upgrade Node

While I was upgrading packages, I thought I might as well upgrade Node to see if that changes anything. Again, I don’t recommend jumping to this in production applications but my personal website is a learning playground for me so what the heck ¯_(ツ)_/¯

Upgrading Node didn’t change anything. When the next build ran, I saw the same error.

5. Check versions

Next, I checked the versions of all of the base infrastructure in different environments. Without any useful error messages to give me clues, I was trying a trial and error approach at this point. I wanted to check if there was a version mismatch somewhere between my local environment and the build environment that could be the culprit.

I did find some inconsistencies between what version of Node I was running locally and what version was being used in Travis and Heroku. I also noticed an inconsistency in which package manager was being used where.

I was using yarn locally but I had accidentally configured Travis to use npm. Since the build wasn’t failing in CI but while building on Heroku, this probably wasn’t the cause of the issue, but I like consistency so I reconfigured Travis to use yarn.

As expected, this didn’t fix anything.

6. Check the diff of build logs for clues

I was determined to try to find a clue for what had changed between my last successful build and all of my current failing builds. I copied and pasted the log from the successful build and the log from the next failed build and put them into a free online diff checker. I used this one, but there are a bunch out there.

There was nothing obvious that changed between these two builds. The only difference was in the timestamps in the logs. So I still didn’t have any clues.

7. Remove and re-add Heroku build packs

As I was combing through my Heroku setup to try to come up with an idea, I remembered that I have 2 custom buildpacks installed for prerendering. One for Puppeteer and one for the headless browser it uses. Since my build step uses Puppeteer with the pre-render-spa-plugin, this seemed like another avenue to try.

Thinking that maybe there were updated versions of the buildpacks or somehow something got out of whack, I removed and re-added both buildpacks. Then I manually started another build.

Unfortunately, nothing changed and the same unhelpful error was output.

8. Investigating Github issues

At this point, I was frustrated by the unhelpful error message. Without any information, I was stumbling around in the dark. I started to look through the open issues on the pre-render-spa-plugin Github repo. There are multiple open issues with similar complaints about failing pretenders but without any information about what is causing the failure.

I even stumbled upon this open pull request (PR) that someone opened to print more helpful error messages.

In that PR, there was a comment with a code snippet (🙌) about how one person made a custom prebuild step to workaround the unhelpful error message. In the prebuild step, they monkey-patched the pre-render-spa-plugin node plugin with a string replace on the console.error call, forcing the error to be output to the local build console.

This proved to be the tipping point in my troubleshooting journey! First, I learned that you can have pre- and post- scripts for all of your common scripts in package.json. I also did some Googling and learned how to invoke some custom Node code in a package.json script. I had always used CLI scripts, never my own, so this is pretty cool!

If you’ve also never done this before, this blog post was helpful for me, particularly when I saw node ./scripts/env-check.js in their build script in the package.json.

9. Customizing pre-render-spa-plugin error message

I added the error message monkey-patch code to my prebuild script and ran another build. I received a helpful error message!! 😭

The error returned was

[PRERENDER ERROR] TimeoutError: Navigation Timeout Exceeded: 30000ms exceeded

I had more information for me to use in my troubleshooting!

10. Decreasing time after render

Now that I knew that the max default timeout was 30 seconds, I thought I could find a way to speed up the rendering of all of my routes. Sure enough, I had my 'render after' time for each route set to 5 seconds. I did this to give the Vue app enough time for the code to run in the headless browser session and receive a response from the CMS on each route.

Even with concurrent rendering, I can see how easily that could exceed 30 seconds. I decreased the 'render after' time to 1 second.

When I pushed that up and ran the next build, the rendering still exceeded 30 seconds and the build failed.

11. More Github issue digging

I jumped back into Github to search for information about this new error message and to see if there was a way to increase the allowed timeout. I landed in this issue in the pre-render-spa-plugin repo.

In this issue, multiple people are talking about the timeout errors that they are seeing and trying to work around them. When using this plugin, all the routes are rendered concurrently. In a CI environment with a slow network connection, spinning up all of those headless browser sessions can be slow which leads to the timeout. This helped explain why my build runs locally but is failing on Heroku.

Capping the max number of concurrent routes to render probably wouldn’t work for me as I had about 70 to render and even if I only rendered 10 at a time, I’d probably still max out the 30 seconds.

In that issue, there is a comment from someone who noticed that increasing the timeout in the plugin configuration did not resolve this timeout error. After some investigation, they discovered that there were actually 2 timeouts at play. The one in the configuration controlled the timeout for how long it takes to launch Puppeteer (which starts and connects to the headless browser) and a different one for how long to wait for each page load. Prerenderer (the package that pre-render-spa-plugin uses) didn’t have a configuration option for this second timeout value. The person who commented opened this PR to add that configuration option.

That PR pointed me to an old version of the Puppeteer README which mentioned the page.goto() method. Looking for that method in the current Puppeteer documentation led me to the API docs. Sure enough, Puppeteer has options for navigation timeout.

With the PR from above, the PuppeteerRenderer from the prerenderer package can now configure navigationOptions which it passes to the page.goto method for Puppeteer.

12. Final build step updates

Now that I learned how to increase the timeout for each page load, I updated the configuration for the PuppeteerRenderer in my prerender build step.

When the next build ran, I saw nothing but green! 😍 2.5 hours after publishing my blog post and I had a passing build again!

Lessons learned

If I could do this all over again, I would’ve skipped all of my trial and error troubleshooting and jumped right to step 8 to find ways to gather more information. Debugging and troubleshooting without the ability to make informed decisions wastes a lot of time. I wasted a lot of time upgrading and poking around without information.

Gather as much information as possible

I feel like I relearn this lesson all the time, but look at Github issues! Other users of the package you’re working with often have tips and workarounds for the same things that you’re getting stuck on. Even if someone hasn’t fixed your exact problem, the conversations in issues often spark new ideas for things to try.

Github issues have valuable info when you’re stuck

I also learned about pre- and post- steps in your build scripts in package.json. I didn’t know that there was more to the script lifecycle than lint, test, build, and start. This added level of flexibility is awesome!

Pre- and post- steps exist and are super flexible

Having never written custom code to be executed in a build script, I didn’t know how to invoke it from package.json. This also adds a ton of flexibility to how I build and deploy my code! The more you know!

node ./path.to/custom-script.js will invoke a custom script in a package.json script

Final thoughts

Even though I was frustrated by a failing build, out of seemingly nowhere, this was a fun thought exercise when I reflect on it. No matter how experienced I get writing code, little things like timeouts can grind your workflow to a screeching halt and prove very challenging to resolve. I appreciate that problems like this still humble me when they happen, and I learn a ton from them.

On the bright side, I’m becoming a prolific enough blogger that I may need to consider a new architecture for my blog posts and build process 😬🙃

Find related posts:Developer workflow