For larger websites, auditing the XML Sitemap can oftentimes uncover lost traffic opportunities and provide the insight needed to present a more accurate and trustworthy XML Sitemap to search engines. Taking the time to identify and fix the issues within an XML Sitemap can mean a difference of thousands of visits to your website, and potentially a notable revenue impact. It’s an easy exercise for any web-savvy person, and I’m going to show you how to conduct an XML sitemap audit in a manner that is repeatable on almost any website.
The Importance of Accurate XML Sitemaps
The importance of an accurate XML sitemap has been explicitly stated by Duane Forrester (former head of Bing Webmaster Tools) in this “Whiteboard Friday” video with Rand Fishkin (founder of Moz). Here’s what Duane had to say:
…if you start showing me 301s in here, rel=canonicals, 404 errors, all of that, I’m going to start distrusting your sitemap and I’m just not going to bother with it anymore. If the way that you’re communicating to me that you have new content is to submit the sitemap through the functionality in Webmaster tools, instantly you’re submitting me something that I’ve learned not to trust because its cleanliness is in question. It’s very important that people take that seriously. It’s not a fire and forget.
It’s unclear whether Bing still holds the same policy today, or whether Google has a similar one, but even with that uncertainty, auditing your sitemap provides benefits beyond ensuring accuracy. An XML Sitemap audit can oftentimes help you identify situations such as the following, which could be holding your website back from its maximum traffic and revenue potential:
- 3XX Redirected URLs (301, 302, etc.)
- 4XX Client Errors (404, 410, etc.)
- 5XX Server Errors (500, 502, etc.)
- URLs with a “noindex,follow” or “noindex,nofollow” meta robots tag
- Non-self-canonicalized URLs (URLs whose canonical tag points to a different URL)
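If you ever need to spot-check a handful of URLs for these issues outside of a full crawl, a minimal Python sketch along these lines can do it. The URLs, and the naive regex parsing of the meta robots and canonical tags, are assumptions for illustration; a crawler such as Screaming Frog handles all of this for you:

```python
import re
import requests

# Hypothetical URLs to spot-check for the issue types listed above.
URLS_TO_CHECK = [
    "https://www.example.com/",
    "https://www.example.com/old-page",
]

for url in URLS_TO_CHECK:
    # Don't follow redirects, so 3XX responses are reported as such.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    issues = []
    if 300 <= resp.status_code < 400:
        issues.append(f"{resp.status_code} redirect -> {resp.headers.get('Location')}")
    elif 400 <= resp.status_code < 500:
        issues.append(f"{resp.status_code} client error")
    elif resp.status_code >= 500:
        issues.append(f"{resp.status_code} server error")
    else:
        # Naive regex checks; assumes the attributes appear in this order.
        robots = re.search(
            r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)',
            resp.text, re.I)
        if robots and "noindex" in robots.group(1).lower():
            issues.append(f"meta robots: {robots.group(1)}")
        canonical = re.search(
            r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)',
            resp.text, re.I)
        if canonical and canonical.group(1).rstrip("/") != url.rstrip("/"):
            issues.append(f"canonicalized to: {canonical.group(1)}")
    print(url, issues or ["OK"])
```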
Conducting an XML Sitemap Audit
I’m going to offer a step-by-step process to help you audit an XML Sitemap, whether it’s your site, a company website, or a client’s website. You will need the following tools to conduct the audit:
- Screaming Frog (or similar crawl spider software)
- Google Sheets or Microsoft Excel
- Google Analytics
Step 1: Crawl Your XML Sitemap with Screaming Frog
Previously, you needed to export your XML file and convert it to a CSV or .txt file with a tool such as the Luxon Software XML to CSV converter in order to copy the list of URLs to crawl with Screaming Frog. While this wasn’t a lengthy process, it did pose challenges for sites with very large XML sitemaps, and it added an extra step to the process.
That’s no longer necessary, however, as Screaming Frog can crawl these URLs on its own, according to their article:
Did you know that you didn’t need to convert your XML sitemap into a list of URLs for us to crawl it? You can simply save the XML sitemap and upload it in list mode and we will crawl the XML format natively.
First, go to your XML sitemap URL and save the page as an XML file to your computer. Then, simply switch the “Mode” in Screaming Frog’s top nav menu to “List” mode, click the “Upload list” button atop Screaming Frog, and then select the “From a File…” option. Finally, change the “File Format” to “XML File,” and then select your XML file.
Proceed to crawl the URLs in the XML sitemap with Screaming Frog. If you have thousands of URLs, it will likely take some time, so go get a cup of coffee or work on another project while Screaming Frog crawls in the background.
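As an optional sanity check, you can also extract the URL list from the sitemap yourself, for example to verify that the number of URLs Screaming Frog crawled matches what the sitemap contains. Here’s a minimal Python sketch; the sitemap URL is a placeholder assumption:

```python
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
# Standard sitemap protocol namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
# Note: if this is a sitemap index file, the <loc> entries are child
# sitemaps rather than pages, and each child needs to be fetched in turn.
urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]
print(f"{len(urls)} URLs in sitemap")
```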
Step 2: Export the Crawl Data
The next step is to export your crawl data. You specifically want to export the Internal (All) data. This report gives you much more data than you need; here are the columns that you will want to retain:
- Address (the URL)
- Status Code
- Meta Robots tag
- Canonical URL
- Redirect URL
I prefer to export the crawl data as a CSV file and then import to Google Sheets. However, for large XML Sitemaps with upwards of tens of thousands of URLs, Google Sheets can suffer from slow performance. In this case, Excel would be the preferred spreadsheet tool.
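If you’d rather script this trim-and-prepare step than do it in a spreadsheet, a pandas sketch like the following works. The filename and column names are assumptions, since Screaming Frog’s column labels vary by version, so adjust them to match your own Internal (All) export:

```python
import pandas as pd

# Assumed filename for the Internal (All) export; if your version of
# Screaming Frog adds a title row above the headers, pass skiprows=1.
df = pd.read_csv("internal_all.csv")

# Keep only the columns needed for the audit (names may differ by version).
keep = ["Address", "Status Code", "Meta Robots 1",
        "Canonical Link Element 1", "Redirect URL"]
df = df[keep]

# Pre-fill the "Issue" column described in the next step.
def label_issue(row):
    code = int(row["Status Code"])
    if 300 <= code < 400:
        return f"{code} redirect"
    if code >= 400:
        return f"{code} error"
    if "noindex" in str(row["Meta Robots 1"]).lower():
        return "noindex"
    canonical = str(row["Canonical Link Element 1"])
    if canonical not in ("", "nan") and canonical != row["Address"]:
        return "non-self-canonical"
    return ""

df["Issue"] = df.apply(label_issue, axis=1)
df.to_csv("sitemap_audit.csv", index=False)
```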
Step 3: Import the Crawl Data to Google Sheets & Prepare the Data
Import the CSV file into Google Sheets and delete all columns that you don’t need. Typically, you’ll only need the columns for the five data types listed above. Here’s what a simple example looks like:
Next, you’ll want to prepare the data for analysis. You can add an “Issue” column to make note of what needs to be fixed. Here is what that would look like using the same data as shown above.
Step 4: Export Google Analytics Data & Import into Worksheet
Next, we want to see what the traffic metrics have been for each URL (in the XML Sitemap) over a period of time. As long as the data stays manageable, I like to analyze a year’s worth of data.
Note: It’s possible to pull data from Google Analytics as you crawl the URLs with Screaming Frog; however, it can slow down the crawl by quite a bit. Sometimes, it can be faster to simply export data from Google Analytics as a CSV file, import the data into a new tab of your spreadsheet, and use VLOOKUP formulas to sync the traffic data to each URL (by row) in your main worksheet, as in the example below.
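For example, assuming the GA export lives in a tab named “GA Data” with the page URL in column A and Sessions in column B (a hypothetical layout), a formula like this in your main worksheet pulls in Sessions, returning 0 for URLs with no GA rows:

=IFERROR(VLOOKUP($A2,'GA Data'!$A:$B,2,FALSE),0)

One gotcha: Google Analytics typically exports page paths (e.g., /page/) while Screaming Frog records full URLs, so you may need to strip the domain from one side before the lookup will match.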
Here’s what the Google Analytics traffic data can look like when pulled into the main worksheet with the other data that we’ve collected from our Screaming Frog crawl. In this example, I am reviewing Google / organic “Sessions” and “Revenue” for the past twelve months. The data has been re-sorted by “Sessions” in order to list the URLs with the most traffic (over the twelve-month period) atop the worksheet.
By now, you should have plenty of data to make educated decisions about how to fix your sitemap errors. Look for opportunities such as the following:
- Restore pages that should be returning a 200 status code, but instead are producing 3XX, 4XX, or 5XX responses
- Fix improperly canonicalized URLs to improve indexation
- Fix inaccurate meta robots tags to improve indexation
- Remove URLs from the sitemap to improve accuracy
- …etc.
Bonus Step: Tally Up the Traffic Metrics from URLs with Issues
It can be helpful to see the traffic metrics of all XML Sitemap issues at a single glance, whether you’re reporting to a client, upper management, or simply attempting to fix your own website issues. To do this, create a new tab in your spreadsheet with one row per issue type and columns for Issue, Number of URLs, Total Organic Traffic Loss, and Total Organic Revenue Loss, as such:
Here are the formulas for the various columns in the example shown above. Since I’m using Google Sheets, my formulas may differ slightly from their Excel equivalents (though COUNTIF and SUMIF work the same way in both). Also note that my XML Sitemap crawl data is in an “All Sitemap URLs” worksheet, where column B holds the “Issue” label, column G holds Sessions, and column H holds Revenue; cell A2 of the summary tab holds the issue type being tallied.
Number of URLs
=COUNTIF('All Sitemap URLs'!$B:$B,$A2)
Total Organic Traffic Loss
=SUMIF('All Sitemap URLs'!$B:$B,$A2,'All Sitemap URLs'!G:G)
Total Organic Revenue Loss
=SUMIF('All Sitemap URLs'!$B:$B,$A2,'All Sitemap URLs'!H:H)
Taking Action on Your Data
While it’s great to now know what errors exist in your XML Sitemap, nothing will change (with traffic and revenue) without taking action. If your company has a development team, they are likely to be bogged down with plenty of company-wide issues to fix. This is where the tallied issues report is very helpful. It allows you to quantify the importance (or lack of importance) of fixing the issues in the XML sitemap.
While SEOs strive for perfection, it’s important to understand that prioritization of development efforts (based on impact to the overall site) is going to dictate how fast the issues get fixed. The more you can prove the impact to traffic, and ultimately revenue, the more you can make your case for getting the XML Sitemap issues fixed.
Good luck and please post any questions in the comments section. I’m happy to answer them!