Why Do I Need an XML Sitemap?

You might be asking yourself, why do I need a sitemap? Well, a sitemap is like a table of contents for your website. It provides search engines a road map of your web pages and tells them how your site content is organized. You can also include valuable metadata about the pages listed in the sitemap, such as when each page was last modified. A sitemap is also a way to alert search engines to new or changed content quickly.
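For reference, here is what a minimal sitemap entry looks like under the sitemaps.org protocol (the URL and date below are placeholders, and the <changefreq> and <priority> tags are optional hints rather than directives):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> block per page; <loc> is required, the rest is optional metadata -->
  <url>
    <loc>http://www.example.com/products/example-product</loc>
    <lastmod>2016-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>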
With that out of the way, let’s get started.

The Purpose of a Nested XML Sitemap

A nested XML sitemap lets you quickly identify pages within an XML sitemap that are not currently being indexed, and break up large XML sitemaps into their respective groups.
Oftentimes when working with e-commerce sites, products get added and dropped frequently. Landing pages get added, and …hopefully, blogs get written. It can be difficult to look into webmaster tools and see you have 2,400 URLs submitted but only 1,200 indexed. It becomes mind-numbing combing through them to determine what is and is not indexed. This does no one any good.
A single giant sitemap provides no indication as to why a page might not be indexed. Have you used too much of your crawl budget? Did your developer leave meta noindex tags on the site? With a nested XML sitemap we can get a much more granular view into the indexation health of our site.

Preface to Creating Your Nested XML Sitemap

This guide assumes you already have Screaming Frog downloaded and that you have a basic understanding of the tool. Be sure to download the newest version if you haven’t already. The tools you will need are:

Regex tester
Screaming Frog
Webmaster tools
Notepad++

Phase 1

First, power up Screaming Frog and pop your URL into the search bar. If it is a new site, I like to crawl it without any excludes or includes present.
Once you have done your initial crawl, look for common groups of folders and categories you can segment.
As you are crawling, make note of any odd parameters coming through. You can open up Notepad++ and jot them down so you can exclude those URLs moving forward. If you work on an e-commerce site, you want to identify product-level URLs and their URL taxonomy. Once you begin the crawl, you should be able to identify categories fairly quickly.

The end goal here is to identify the categories and segments you want to group into dedicated sitemaps moving forward.

[Screenshot: Screaming Frog crawl]

As you can see from the crawl above, once I have done my initial crawl it becomes fairly easy to identify common groups. I can explore these further and see if segmenting them is justified.

On this particular example of Colfax Furniture’s site, I identified both category and product-level folders I want to segment. I will ultimately create sitemaps out of them. The folder structure looks like this:

/products/
Product-level URLs
/product-type/
Category-level URLs

I have also identified odd parameters, 301s, and other anomalies I want to exclude from my nested sitemap.

Phase 2

Next, once you have identified the folders you want to segment, it is time to add the proper excludes and includes to get a more optimal crawl for your XML sitemap. You can use a free regex tester to debug your regex for Screaming Frog. Screaming Frog also has some good documentation on excludes and includes.

[Screenshot: regex tester]

As an example, I am using the regex tester to exclude all URLs that fall under /folder/. This tool allows me to quickly test a variety of combinations without starting and stopping crawls in Screaming Frog to see if my regex is working. You can access your current excludes and includes in Screaming Frog by clicking ‘Configuration’. Under the drop-down menu you will see ‘Exclude’ and ‘Include’.

Crash Course in Screaming Frog Regex Syntax   

In case you are not familiar with Screaming Frog, you can exclude and include certain parameters and folders using regex.
Let’s say you want to exclude a certain folder from being crawled. The exclude would look like:

http://www.example.com/folder.*

This will exclude all URLs that are within that particular folder. It comes in handy when you want to segment certain groups of URLs.
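To make that concrete, here is how that exclude would treat a few made-up URLs on the same example domain:

http://www.example.com/folder/red-sofa (excluded)
http://www.example.com/folder/page/2/ (excluded)
http://www.example.com/about (still crawled)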
Let’s say you found some parameters when crawling your site that you don’t want to include, and they look like:

http://www.example.com/folder/?=example
http://www.example.com/folder/?=example-4

A quick solution is to use regex to exclude all URLs containing a common string, like this:

.*\?=example.*

This will exclude all URLs that contain that particular character string. Note the backslash: the question mark has a special meaning in regex, so it needs to be escaped to match a literal ‘?’. This comes in handy when you are trying to avoid including paginated URLs and other odd parameters your CMS may generate. Again, if you are unsure, play around with the regex tester a bit.
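Tested against the parameter URLs from above, that pattern catches both while leaving the clean folder URL alone:

http://www.example.com/folder/?=example (excluded)
http://www.example.com/folder/?=example-4 (excluded)
http://www.example.com/folder/ (still crawled)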

This is just a crash course in basic syntax to get you started. For a more complete overview of Screaming Frog, visit Seer Interactive’s guide to Screaming Frog.

Continuing on with our example, I have identified the folders I want to segment and build a dedicated sitemap for. Here is the sample include regex for each section:

http://www.colfaxfurniture.com/product-type.*
http://www.colfaxfurniture.com/products.*

I can also circle back and exclude each of these sections to build a sitemap with the remaining URL’s as well.

You also want to exclude images, JavaScript, CSS, etc. while building your nested sitemap. You can always come back and build a dedicated image sitemap later. To do this, click ‘Configuration’ and then ‘Spider’. Once there, you can uncheck the options to include images, CSS, JavaScript, etc. within your crawl.

[Screenshot: image excludes in the spider configuration]
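If you prefer handling this with excludes instead of the spider configuration, a sketch like the following should also work: one pattern per line in the exclude window, with the extensions adjusted to whatever your site actually serves.

.*\.jpg$
.*\.png$
.*\.gif$
.*\.css$
.*\.js$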

Phase 3 

Once you have accounted for all parameters and identified your folders, it’s time to run and save your clean crawl. You can either save the crawl in Screaming Frog or go ahead and export it as an XML sitemap.

[Screenshot: sitemap file]

Now, once you have a clean crawl of the first set of URLs you want to lump together in one XML sitemap, you can save it. For product-level sitemaps I tend to use the name “p-sitemap”, and for category-level URLs “c-sitemap”, respectively. You will later reference these in your master XML sitemap.

Next, repeat that process for the remainder of the categories you have identified for your nested XML sitemap.  

As a side note: you want to make sure you are not polluting your XML sitemap with junk URLs. If you are unsure, brush up on what to include and avoid in your XML sitemap.

Once you have all your separate XML sitemaps crawled and generated, it’s time to piece them together. You can do this by using Notepad++ to create a sitemap index file that references your other sitemaps. Just follow the basic framework here:

[Screenshot: XML sitemap index format]
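For reference, a minimal sitemap index following the sitemaps.org protocol looks something like this (the domain and dates are placeholders; the file names follow the “p-sitemap”/“c-sitemap” naming from above):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each <sitemap> entry references one of the child sitemaps you exported -->
  <sitemap>
    <loc>http://www.example.com/p-sitemap.xml</loc>
    <lastmod>2016-05-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/c-sitemap.xml</loc>
    <lastmod>2016-05-01</lastmod>
  </sitemap>
</sitemapindex>

Upload the child sitemaps and the index to your site, then submit the index in webmaster tools. Indexation counts will then be reported per child sitemap, which is where the troubleshooting benefit comes from.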

The end result is essentially one XML sitemap referencing multiple sitemaps within it; hence the term ‘nested XML sitemap’.

Conclusion

Not only are nested sitemaps pretty neat, they also help you cut down on the time spent crawling and generating massive sitemaps every month. They allow for greater granularity when troubleshooting indexation problems, and by nesting your sitemap you can quickly pinpoint trouble spots and make adjustments accordingly.

If you are having trouble creating your own nested XML sitemap, shoot me an email. Let’s troubleshoot the issue!

 
