What is the crawl budget?

A term that is mentioned a lot in the SEO community today is crawl budget, literally the "budget" of crawling that Google allocates to a site. Although the term may sound new, it is actually a very old concept in the SEO field.

Those who work with large-scale projects, such as big e-commerce sites and content portals, and SEO specialists in general understand crawl budget as the time Google spends reading the pages of your website on a given day.

It is the time the Google crawler takes to read the pages of a website. How much time the crawler spends on your website depends on several factors, such as website authority, the percentage of duplicate content, page errors, and many more.

However, Google's official webmaster blog states that not everyone needs to be concerned about crawl budget. If you have a website with only a few dozen pages, there is no need to worry about crawling, as Google will handle it without a hitch.

But if you have an online store or any other web project with a few thousand pages, you will have to pay close attention to your website's crawl budget and optimize it.

Crawl budget and web positioning

Google affirms that crawl budget does not directly influence rankings, but it can nevertheless have a negative effect on, and sometimes constrain, other of the more than 200 factors used to rank pages in the search engine.

So why would we want Google to crawl the pages of our website more often? Several SEO experts maintain that a healthy crawl budget improves the overall positioning of a website's pages in the rankings and thus increases organic traffic.

Basically, Google has a limited amount of time to spend on your site: it has to decide how much time to devote to each of the sites around the world, and to do so it calculates how many simultaneous connections it can open to read the pages of your website.

The quality of the website

Google spends time connecting to the website, reading its pages and then stopping. It repeats this throughout the day, but always within a limited fraction of time. That fraction of time is usually proportional to the authority of your website, the number of new pages, and how relevant Google considers it to be.

This is determined by the quality of your content and the links that point to the site: if many quality links point to it, Google may regard the site as higher quality and spend more time on it, provided there is a large enough volume of pages.

In general, the crawl budget doesn't change much between a 10-, 50- or 100-page site, so with only a few pages there is little difference. But for large sites, if Google has only a second to go through your site and you tell it what to read, that is very useful for the crawler, allowing it to complete its crawling task more quickly.

Establish which pages are important

First, you have to map out a more organized information architecture for the site, establishing which pages are unnecessary and keeping certain pages from being crawled by reviewing the robots.txt file.

Google should not spend time in the website's internal search section, or in sections with filter navigation, for example in an online store where you can choose the shoe size, the apartment size or the shirt color. These filters are what people normally call "faceted navigation" or "navigation filters".

Some webmasters block these filters and internal searches in the robots.txt file so that Google does not spend time reading those pages: they exist for the user who is looking for that experience, and their content is already available on other internal pages of the site.
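
As an illustration only, assuming a store whose internal search lives under /search/ and whose filters are passed as URL parameters named size and color (all of these paths and parameter names are hypothetical), such a robots.txt block might look like this:

User-agent: *
Disallow: /search/
Disallow: /*?size=
Disallow: /*?color=

Each rule should be double-checked so that it only blocks the filter and search pages and never the category or product pages you do want crawled.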

We recommend reading: Errors to avoid when creating a website

Along the same lines, by establishing which pages of your site are important, you save Google time on pages with duplicate or low-value content, such as faceted navigation, the privacy policy page and the terms and conditions, which you don't need to have crawled. These pages simply remain available to the users who want to see them.

Crawl time should not be wasted on these low-value pages: you don't want to rank for them and they make little difference to your results, but they have to exist because some users will want to consult that information anyway.

How the crawl budget works internally

In general, crawl budget is driven by architecture: you define the links to the pages Google will be able to read and prioritize them by their level of importance.

After all, the links coming out of these pages are the ones most likely to be prioritized by Google. It is therefore worth thinking carefully about internal linking and the way your site is structured.

Crawl budget is the time Google spends reading and understanding the information on the website, evaluating elements such as the organization of the architecture and what is blocked in robots.txt. Using the nofollow attribute on a link tells Google not to follow that link. For example, if one link to a page carries a nofollow attribute but another internal link to the same page does not, Google will take the second path to reach the page, spending less crawl time overall.
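
As a minimal sketch with hypothetical URLs, the first link below is marked nofollow so the crawler is told not to follow it, while the second is a normal internal link to a page you do want crawled:

<a href="/shoes?size=42" rel="nofollow">Size 42</a>
<a href="/shoes/">All shoes</a>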

Benefits of an optimized site

There are things that will help you get more pages read on a daily basis, which can be useful for any website. For example, if your server is faster, Google will be able to request more pages in the same amount of time.

If your pages are compressed, Google can fit more pages into those requests. And if your code is clean and lean, Google receives a lighter page at the end of the day, with fewer wasted bytes. In other words, website optimization, site speed and server speed greatly influence crawl budget.
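
As a sketch, assuming an Apache server with mod_deflate enabled (on nginx the equivalent is the gzip directive), text responses can be compressed with a few lines in the server configuration or .htaccess file:

<IfModule mod_deflate.c>
  AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>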

How to calculate the crawl budget of your site

The number of times the Google search engine spider crawls your website in a given period of time is what we call "crawl budget". So if Googlebot visits your site 32 times a day, we can say that Google's crawl budget for the site is approximately 960 visits per month (32 visits × 30 days).

You can use tools like Google Search Console and Bing Webmaster Tools to estimate the approximate crawl budget of your website. Just log in and head to Crawl > Crawl Stats to see the average number of pages crawled per day.

Crawl budget and SEO: are they the same?

Yes and no. While both types of optimization aim to make your pages more visible and improve their position in the SERPs, SEO places a greater emphasis on the user experience, while spider optimization is entirely about attracting bots.

Search engine optimization (SEO) focuses more on optimizing for user queries. Googlebot optimization, in contrast, focuses on how the Google crawler accesses your site.

How to optimize the crawl budget

There are several ways to optimize the crawl budget of a website, depending on the web project, the number of pages and other factors. Here are some points to consider:

Make sure your pages can be crawled

Your page is crawlable if search engine spiders can find and follow links within your website, so you will have to configure the .htaccess and robots.txt files so that they do not block critical pages on your site. You may also want to provide text versions of pages that rely heavily on rich media files, such as Flash and Silverlight.

Of course, the reverse applies if you want to prevent a page from appearing in search results. However, setting a "disallow" rule in the robots.txt file is not enough if you want to prevent a page from being indexed. According to Google, the "disallow" rule does not guarantee that a page will not appear in the results.

If external information (for example, inbound links) continues to drive traffic to the page you have disallowed, Google may decide that the page is still relevant. In this case, you must manually block the indexing of the page using the noindex meta tag or the HTTP X-Robots-Tag header.

- Noindex meta tag: put this meta tag in the <head> section of your page to prevent most web crawlers from indexing it:

noindex "/>

- X-Robots-Tag: place the following in the HTTP response header to instruct crawlers not to index a page:

X-Robots-Tag: noindex

Please note that if you use the noindex meta tag or the X-Robots-Tag, you should not disallow the page in robots.txt. The page must be crawled before the tag is seen and obeyed.
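
As an example of how that header can be sent, assuming an Apache server with mod_headers enabled and using PDF files purely as an illustration, a rule like the following in .htaccess adds the noindex header to every matching file:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>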

Cautious use of rich media files

There was a time when Googlebot couldn't crawl content built with JavaScript, Flash and other rich media formats. Those times are long gone (although Googlebot still has issues with Silverlight and some other files).

However, even if Google can read most rich media files, other search engines may not be able to, which means you should use these files judiciously, and you probably want to avoid them entirely on the pages you want to rank.

Avoid redirect chains

Every redirected URL wastes a bit of your crawl budget. When your website has long redirect chains, i.e. a large number of 301 and 302 redirects in a row, spiders like Googlebot may give up before reaching the destination page, meaning that page will not be indexed. The best practice is to have as few redirects as possible on the website, and never more than two in a row.
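
As a sketch, assuming an Apache server and hypothetical paths, the first block below creates a chain of hops, while the second points every old URL straight at the final destination:

# Chain to avoid: /first-version -> /second-version -> /current-page
Redirect 301 /first-version /second-version
Redirect 301 /second-version /current-page

# Better: each old URL redirects directly to the destination
Redirect 301 /first-version /current-page
Redirect 301 /second-version /current-page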

Fix broken links

When John Mueller was asked whether broken links affect rankings, he replied that they are more of a user experience issue than a ranking factor.

This is one of the fundamental differences between SEO and Googlebot optimization, because it would mean that broken links do not play a substantial role in rankings, even though they greatly impede Googlebot's ability to index and rank a website.

With that said, you should follow Mueller's advice, bearing in mind that Google's algorithm has improved substantially over the years and anything that affects the user experience is likely to affect the SERPs.

Manage parameters in dynamic URLs

Spiders treat dynamic URLs that lead to the same page as separate pages, which means you may be unnecessarily wasting your crawl budget. You can manage URL parameters by accessing Search Console and going to Crawl > URL Parameters. From there, you can inform Googlebot if your CMS adds parameters to your URLs that do not change the content of a page.
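
As an illustration with hypothetical URLs, the three addresses below can all serve the same product listing; a rel="canonical" tag in the page's <head> is a common complementary way to point search engines at the preferred version:

https://www.example.com/shoes
https://www.example.com/shoes?sort=price
https://www.example.com/shoes?sessionid=12345

<link rel="canonical" href="https://www.example.com/shoes" />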

Clean the sitemap

XML sitemaps help both visitors and spider robots by making content better organized and easier to find. Try to keep the sitemap up to date and purge it of any clutter that could harm the usability of your site, including 4xx pages, unnecessary redirects, non-canonical pages and blocked pages.

The easiest way to clean the sitemap is to use a tool like Website Auditor. Its XML sitemap generator can create a clean sitemap that excludes all pages blocked from indexing. In addition, the "Site Audit" option lets you locate and repair 4xx errors, 301 and 302 redirects, and non-canonical pages.
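
For reference, a clean sitemap simply lists the canonical, indexable URLs of the site; a minimal example with hypothetical URLs and dates looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-07-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/shoes/</loc>
    <lastmod>2024-06-15</lastmod>
  </url>
</urlset>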

Make use of feeds

Feeds such as RSS and Atom (both XML formats) allow content to be delivered to followers even when they are not browsing the site. They let users subscribe to their favorite sites and receive regular updates every time new content is published.

Besides the fact that RSS feeds have long been a good way to increase readership and engagement, they are also among the resources Googlebot visits most often. When your website receives an update (for example, new products, blog posts or page changes), submit it to Google's FeedBurner to make sure it's indexed correctly.
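
As a minimal sketch with hypothetical URLs and titles, an RSS 2.0 feed announcing a new post looks like this:

<rss version="2.0">
  <channel>
    <title>Example Store Blog</title>
    <link>https://www.example.com/blog/</link>
    <description>New products and updates</description>
    <item>
      <title>New summer sneaker collection</title>
      <link>https://www.example.com/blog/summer-sneakers/</link>
    </item>
  </channel>
</rss>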

Create external links

Link building remains a hot topic, and there's no sign that it's going to go away anytime soon.

Cultivating relationships online, discovering new communities, building brand value: these small wins should already be part of your link planning process. While some elements of link building now feel stuck in the 1990s, the human need to connect with others will never change.

Currently, we already have evidence that external links are closely correlated with the number of spider visits your website receives.

Maintain the integrity of internal linking

Although creating internal links doesn't play a substantial role in crawling speed, that doesn't mean it can be completely ignored. A well-maintained site structure makes your content easily discoverable by search robots without wasting your crawl budget.

A well-organized internal link structure can also improve the user experience, especially if users can reach any area of your website in three clicks. Making everything more easily accessible means that visitors will stay longer, which can improve your position in the SERPs.

What conclusion do we draw?

Again, reinforcing what has already been said above, crawl budget matters for large websites with hundreds or thousands of web pages; otherwise it is not worth worrying about, since Google will crawl your website smoothly.

We must not make it harder for Google to crawl the pages of our site. Many websites are full of errors, or even have barriers in their robots.txt and sitemap.xml files that prevent Google from accessing the content. If we want to improve our position in the Google rankings, we have to open up and simplify the pages of the website so that Google can quickly access, index and rank them. It's that simple.

By now, you've probably noticed a trend in this article: best practices for crawlability also tend to improve searchability. So if you're wondering whether crawl budget optimization is important for your website, the answer is yes.

Simply put, if you make it easier for Google to discover and index your website, you'll enjoy more crawling, which means faster updates when you post new content. You'll also improve the overall user experience, boosting visibility and, ultimately, your ranking in the SERPs.

These are just some of the points, among many others, to improve and optimize the crawl budget of a website.
