There is nothing more frustrating than working hard on a website only to find out that Google isn’t playing nice with it. Running into indexation issues after you have put in all the work to optimize and build the perfect website is disheartening. So, we are going to explore why Google hates your website.

Set Up Search Console

Before we begin, I am assuming you have set up Google Search Console so you can submit your XML sitemap and check your indexation status. If you haven’t done this, check out the basics of what to include in an XML sitemap and set up Search Console.
Assuming you have Search Console set up, navigate to ‘Google Index’ -> ‘Index Status’. This will give you an overview of the pages Google has discovered so far.
[Image: Index Status report in Search Console]
However, if you notice that Search Console says you have 100 URLs submitted and only 1 indexed, fear not!
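As a quick sanity check outside of Search Console, you can also run a site: query in Google to see roughly which of your pages it has indexed (substitute your own domain):

    site:www.example.com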

Next, check your XML sitemap and make sure it is in the correct format. Most sitemaps are located at http://www.example.com/sitemap.xml. It should look similar to this:
[Image: example XML sitemap]
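In case the image doesn’t load, here is a minimal sitemap following the sitemaps.org protocol (the example.com URLs and dates are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- example.com and the dates are placeholders; list your real URLs here -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2017-01-15</lastmod>
      </url>
      <url>
        <loc>http://www.example.com/about/</loc>
        <lastmod>2017-01-10</lastmod>
      </url>
    </urlset>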
If yours looks nothing like this, you may just have a formatting issue and can adjust it accordingly. However, if your sitemap is in the correct format, then we must dig deeper.

Meta No Index

The first step to figuring out why Google hasn’t indexed your website is to make sure you have not accidentally left a meta noindex tag on your site. The easiest way to check this is with Screaming Frog: simply crawl your website and check the ‘Directives’ section in the side panel. You will be able to tell pretty quickly whether or not you have left the meta tag on.
The other option, if you do not have a crawler at your disposal, is to view the source code of a webpage. Granted, this is a manual process: navigate to your homepage, right click, and press ‘View page source’. Then search for this line of code: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">. If you don’t see it, then you are in the clear (sort of). You can cross-reference the pages that have not been indexed by Google and check each one using this method.
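If you would rather script this check across many pages, here is a minimal sketch in Python (the URLs are placeholders, and it assumes the third-party requests library is installed):

    import re
    import requests

    # Replace these placeholders with the pages you want to verify.
    urls = [
        "http://www.example.com/",
        "http://www.example.com/about/",
    ]

    for url in urls:
        html = requests.get(url, timeout=10).text
        # Look at every meta tag and flag any robots directive containing "noindex".
        flagged = [
            tag for tag in re.findall(r"<meta[^>]*>", html, re.IGNORECASE)
            if "robots" in tag.lower() and "noindex" in tag.lower()
        ]
        print(f"{url}: {'NOINDEX found' if flagged else 'clear'}")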

Robots.txt

The next step is to determine whether you have blocked Google from crawling your website in your robots.txt file. It is fairly common to block all major search engines while a website is in development. To see if you have anything blocked, simply pull up your current robots.txt configuration by visiting http://www.example.com/robots.txt. Here is what you want to avoid:
[Image: robots.txt blocking all crawlers]
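In case the image doesn’t load, the configuration to avoid is a blanket disallow rule, which blocks every crawler from your entire site:

    # Blocks all crawlers from everything -- avoid this on a live site
    User-agent: *
    Disallow: /

By contrast, an empty Disallow line permits crawling of the whole site:

    # Allows all crawlers to access everything
    User-agent: *
    Disallow: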

If everything looks good, we can move on to the next step; if not, you can learn more about how a robots.txt file functions and update it accordingly.

Are your pages correctly linked?

Next, this might go without saying, but if your pages aren’t all linked together, then search engines will have a difficult time finding them. Generally, when a web crawler or search engine hits a page, it crawls all the links on that particular page and indexes them accordingly. Think of a spider crawling through its web. Therefore, if you have created a bunch of pages that are not linked to from anywhere, or “orphaned” pages, then Google will have no way to discover them. Fortunately, there are a few solutions:

Include orphaned pages in the main nav
Include orphaned pages in a sub nav
Include orphaned pages in an HTML or XML sitemap
Link to the pages internally

The solutions above will help search engines discover your content once they visit your website and begin crawling and indexing.
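As a quick illustration, a simple navigation block that links to an otherwise orphaned page might look like the following sketch (the /orphaned-page/ URL is a placeholder):

    <!-- /orphaned-page/ is a placeholder for the page Google can't find -->
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/blog/">Blog</a></li>
        <li><a href="/orphaned-page/">Orphaned Page</a></li>
      </ul>
    </nav>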
Once you have confirmed that your sitemap is in the correct format, no meta noindex tags are present, and you aren’t blocking search engines, it is time to resubmit your website. First, go to your Search Console dashboard, click ‘Crawl’, and select ‘Fetch as Google’ from the drop-down. This lets Google fetch the current page, shows you the page as Google sees it, and gives you a breakdown of any issues. Enter a specific page in question, or leave the field blank to fetch and render the home page.

[Image: Fetch and Render results in Search Console]
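If you want a rough approximation of this from the command line, you can request a page with Googlebot’s user-agent string. This won’t replicate Google’s rendering, but it can reveal whether your server treats Googlebot differently (the example.com URL is a placeholder):

    # -A sets the user-agent; swap in your own page URL
    curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.example.com/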


If everything checks out, click ‘resubmit to index’ and select ‘crawl this URL and its direct links’. This notifies Google’s web crawler to revisit your website, crawl all of its associated links, and submit them to its index. When building a website, always remember to check for noindex tags, review your robots.txt file, and correctly format your XML sitemap.
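One last tip before you go: Google also accepts a simple sitemap ping. Requesting the URL below (with your own sitemap location substituted in) asks Google to re-crawl the sitemap without opening Search Console:

    http://www.google.com/ping?sitemap=http://www.example.com/sitemap.xml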

Until next time.


Comments

  1. Very useful, thank you.

    One thing I would add is that Soft 404s often show up on ecommerce categories with no products. This could be a good way to identify empty pages on a large site, as well as giving an indication of whether your site is close to the dreaded Panda.

    I found the point you made about old sitemaps interesting. When doing a site migration I tend to just replace the sitemap URL. Do you suggest that I give the sitemap a different URL and let the old one 404 after the redirects have been crawled?

    1. I appreciate the feedback. Yeah, I find Soft 404s pretty frustrating personally. When I do a site migration I tend to keep the original URL for the XML sitemap and just make sure all my appropriate one-to-one redirects are in place. I think as long as you don’t include 301 redirects in the original XML sitemap you should be fine.

  2. Sometimes, I receive a 404 error for a page that never existed, like index.asp.

    Then oftentimes, the 404 I get is from pages like index.html and some old pages.

    But most of the errors are from pages that never existed.

    Have you experienced this?
