16th January 2018
We take a closer look at what crawl budget is, how it can affect your website, and what you can do to ensure you’re giving your content the best chance of ranking by making best use of your available crawl budget.
The below insights and fixes are perhaps not for beginners. But for larger sites that have nailed the basics, fixing crawl budget can reap large rewards.
What is crawl budget?
Google (or any search engine, for that matter) does not have a strict definition of crawl budget, but in 2017 Google did provide some clarification on its webmaster blog. It’s recommended reading but, put in layman’s terms, crawl budget is a combination of your crawl rate limit (designed to stop Google crawling so fast that it overloads your server) and crawl demand (how much Google wants to crawl your site, based on how popular your pages are and how stale they are).
In effect then, crawl budget can be described as:
The number of URLs Googlebot can and wants to crawl
So now we’ve established what crawl budget is, let’s get down to the fun stuff and see how we can find issues with crawl budget and what we can do to fix them!
Problems caused by insufficient crawl budget
Everyone talks about content being king, but for many large sites a large volume of content can be the start of problems: important pages miss out on rankings, and the visibility of the whole site comes down with them.
An insufficient crawl budget can mean that important areas of your site aren’t crawled, or that it takes a long time for pages and changes to be crawled. As a result, Google never sees your site’s full potential or understands the importance of key pages.
If you have 500,000 pages on your site and Google only crawls 5,000 pages a day (including re-crawls of pages like the homepage), it would take at least 100 days for it to reach every page, and in practice longer for some.
However, the good news is that there are steps you can take to quickly establish if you’re being affected by crawl budget issues, and a few different ways to improve your budget efficiency.
How to diagnose crawl budget issues
You might already know how many pages your site has, but if you don’t, your best bet will be to look within the XML sitemap and take note of the number of URLs.
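If you would rather not count by hand, the sitemap check can be sketched in a few lines of Python. This is a minimal sketch that assumes a single, standard sitemaps.org file rather than a sitemap index; the `count_sitemap_urls` helper and the two-URL sample are hypothetical and only there for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <url> entries in a single (non-index) XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    # Only direct <url> children of <urlset> are counted.
    return len(root.findall(f"{SITEMAP_NS}url"))


# Hypothetical sitemap used purely for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>"""

print(count_sitemap_urls(SAMPLE_SITEMAP))
```

Bear in mind that a sitemap only lists the URLs you have chosen to include, so the real number of crawlable URLs may well be higher.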
Once you have an idea of the number of URLs your site has, you can use Search Console to establish crawl budget issues with the following steps:
- Head to Search Console and navigate to Crawl > Crawl stats using the left hand navigation
- Divide the number of URLs you know your site has, by the number under ‘Average crawled per day’
The resulting number tells you how many times more pages you have than Google crawls per day. For example, if you have 10,000 pages and your average is 500, you’ll end up with 20, meaning you have 20 times as many pages as are crawled each day, and that is a problem!
Anything below 3-5 and you’re probably OK and should focus your efforts elsewhere to improve your site’s performance. If it’s above that, read on.
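The calculation above is simple enough to do on a calculator, but if you are checking several sites it can be sketched as a small Python helper. The function names are hypothetical, and the threshold of 5 is an assumption taken from the upper end of the 3-5 rule of thumb above.

```python
def crawl_ratio(total_urls: int, avg_crawled_per_day: float) -> float:
    """Return how many times more URLs the site has than are crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("avg_crawled_per_day must be greater than zero")
    return total_urls / avg_crawled_per_day


def needs_attention(ratio: float, threshold: float = 5.0) -> bool:
    """Flag ratios above the rule-of-thumb threshold (assumed to be 5 here)."""
    return ratio > threshold


# The example from the text: 10,000 pages, 500 crawled per day.
ratio = crawl_ratio(10_000, 500)
print(ratio, needs_attention(ratio))
```

The same ratio is also a rough lower bound on how many days a full crawl of the site would take.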
How to optimise your crawl budget
If you’ve performed the above check and ended up with a double-digit number, it’s likely that Google isn’t seeing all your content, or is taking a long time to see changes and optimisations. The following actions should help improve your crawl budget:
Reduce errors – Every crawlable URL should be an indexable page, so the only response code a crawlable URL should return is HTTP 200. You can use server logs to see how often Googlebot requests pages that return other response codes, and fix those with the most hits first. You can also crawl your own site with a tool like DeepCrawl (paid) or Screaming Frog (free for crawls of up to 500 URLs).
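As a rough illustration of the server log approach, the sketch below counts Googlebot requests that did not return a 200. It assumes your logs are in the common Apache/Nginx "combined" format and simply looks for "Googlebot" in the line; user agents can be spoofed, so treat the output as indicative only. The helper name and the sample log lines are made up for the example.

```python
import re
from collections import Counter

# Matches the request and status fields of a combined-format log line,
# e.g. "GET /old-page HTTP/1.1" 404 512
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def googlebot_non_200_hits(log_lines):
    """Return [((path, status), hits), ...] for Googlebot requests, most hits first."""
    hits = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue  # not (claiming to be) Googlebot
        match = REQUEST_RE.search(line)
        if match and match.group("status") != "200":
            hits[(match.group("path"), match.group("status"))] += 1
    return hits.most_common()


GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    f'66.249.66.1 - - [16/Jan/2018:10:00:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" {GOOGLEBOT_UA}',
    f'66.249.66.1 - - [16/Jan/2018:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 512 "-" {GOOGLEBOT_UA}',
    f'66.249.66.1 - - [16/Jan/2018:10:00:09 +0000] "GET / HTTP/1.1" 200 2048 "-" {GOOGLEBOT_UA}',
    '203.0.113.7 - - [16/Jan/2018:10:00:12 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    f'66.249.66.1 - - [16/Jan/2018:10:00:15 +0000] "GET /moved HTTP/1.1" 301 0 "-" {GOOGLEBOT_UA}',
]

for (path, status), count in googlebot_non_200_hits(SAMPLE_LOG):
    print(path, status, count)
```

Sorting by hit count puts the URLs that waste the most crawl activity at the top, which is where fixes are likely to pay off first.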
Eliminate internal links that result in redirects – Ideally, your website should have no internal links that lead to a 301 or 302. If you present Googlebot with a URL that results in a 301 that then leads to a valid 200 URL, Googlebot has to access two URLs instead of just one. This is an inefficient use of crawl budget. Reducing redirects on the site allows Googlebot to spend more time on valid, indexable URLs. Note: redirect chains should be strictly avoided, as every extra hop wastes another request.
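To show what fixing redirecting links looks like in practice, here is a minimal sketch that works out where each internal link finally ends up. It assumes you already have a map of redirects (for example from a crawler export); the function names, the map, and the URLs are all hypothetical.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow `url` through a {source: target} redirect map and return every URL visited."""
    chain = [url]
    while chain[-1] in redirects:
        next_url = redirects[chain[-1]]
        if next_url in chain or len(chain) > max_hops:
            raise ValueError(f"redirect loop or overly long chain starting at {url}")
        chain.append(next_url)
    return chain


def links_to_update(internal_links, redirects):
    """Map each internal link that redirects to the final URL it should point at."""
    fixes = {}
    for link in internal_links:
        chain = resolve_redirects(link, redirects)
        if len(chain) > 1:  # more than one URL visited means at least one redirect
            fixes[link] = chain[-1]
    return fixes


# Hypothetical data: /old redirects to /interim, which redirects to /new (a chain).
REDIRECTS = {"/old": "/interim", "/interim": "/new"}
INTERNAL_LINKS = ["/old", "/interim", "/new"]

print(links_to_update(INTERNAL_LINKS, REDIRECTS))
```

Pointing every internal link straight at the final URL means Googlebot makes one request per page instead of two or three.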
Prevent crawlers accessing certain parts of your site or certain URLs – If you have a large site, it’s possible that there are whole areas that crawlers don’t need to access. An example of this could be a booking page on a website that has lots of potential URL parameters – or perhaps an entire web app that exists in a directory. After you’ve decided which areas of your site need to be crawled, and which don’t, you can take one of the following actions:
- Add specific URLs or directories to your robots.txt, telling bots not to crawl these URLs
- Add rel="nofollow" attributes to links that you don’t want crawlers to follow
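Before publishing robots.txt changes, it is worth checking that the rules block what you expect and nothing more. The sketch below uses Python’s standard `urllib.robotparser` module to test a draft file; the `/booking/` and `/app/` directories, the example URLs, and the helper name are assumptions for illustration. Note that this parser only does simple prefix matching, so it will not reflect wildcard rules the way Google interprets them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft robots.txt blocking a booking area and a web app directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /booking/
Disallow: /app/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A parameterised booking URL should be blocked; a normal content page should not.
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://www.example.com/booking/?date=2018-01-16"))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://www.example.com/blog/crawl-budget"))
```

Running a list of your most important URLs through a check like this is a cheap way to make sure a new Disallow rule has not blocked something you want ranked.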
Avoid pages with low quality or duplicate content – If search engine crawlers find pages that are duplicated or low quality (i.e. the content is either poor or very short), then the search engine might decide not to index those pages. Either remove these pages or prevent bots from accessing them, so that search engine crawlers do not waste time on pages that are not important to users.
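A first pass at finding these pages can be automated. The sketch below assumes you have already extracted the main text of each page (for example from a crawl export) and flags exact duplicates and very short pages. The 50-word cut-off is an arbitrary assumption rather than a figure from any search engine, and the helper name and sample pages are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates_and_thin_pages(pages, min_words=50):
    """Return (groups of URLs with identical text, URLs with fewer than `min_words` words)."""
    urls_by_hash = defaultdict(list)
    thin = []
    for url, text in pages.items():
        # Normalise case and whitespace so trivial differences don't hide duplicates.
        words = text.lower().split()
        if len(words) < min_words:
            thin.append(url)
        digest = hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()
        urls_by_hash[digest].append(url)
    duplicates = [urls for urls in urls_by_hash.values() if len(urls) > 1]
    return duplicates, thin


# Hypothetical page text keyed by URL.
PAGES = {
    "/widget": "Red widget  in stock",
    "/widget?ref=nav": "red widget in stock",
    "/guide": "word " * 60,
}

duplicates, thin = find_duplicates_and_thin_pages(PAGES)
print(duplicates)
print(thin)
```

This only catches pages whose text is identical after normalisation; near-duplicates and genuinely poor content still need a human review.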
Crawl budget issues will only affect a handful of sites, and if your site has fewer than a few thousand URLs it’s unlikely to be a problem. If the above post rings true with you, then get in touch and talk to us about how we can help improve your crawl budget and SEO in 2018.