Google Webmaster Tools (GWT) is an excellent way of gathering Google-specific data about your site’s performance and finding potential obstacles to Google viewing the site correctly. Some reports contain data that is not particularly actionable, while others reveal errors that you can repair immediately for nearly instant gain. The following sections discuss GWT at a high level, followed by a report-by-report synopsis of the issues GWT covers and what, if anything, you should be looking for.
Key Data Points
Google Webmaster Central: This is an extremely important resource for all webmasters. It is essentially a control panel for websites in Google’s index. At the time of writing, this interface offered tools for moving domains, setting locality preference, checking common SEO problems, analyzing link profiles, and exporting important SEO data. I highly recommend that every SEO sign up for this tool and verify their clients’ websites. In doing so, they will likely gain new insight into their websites and how Google interprets them.
Common Questions Webmaster Tools Can Answer
These tools can help you answer the following questions.
How Does Google See My Site?
This is a big question, and the individual reports that follow in this chapter all combine to give a pretty comprehensive look at how Google interprets the content on your site.
How Do I Ask Google for Reinclusion If I Have Been Penalized?
Google considers its index and services private and reserves the right to exclude anyone for any reason. When Google search quality representatives find a website they believe is violating the Google Webmaster Guidelines (www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769), they can devalue the applicable links, manually penalize the website’s ability to rank, or remove the website from the Google index altogether. Common examples of penalty-inducing actions are buying and selling links, cloaking (showing search engines one piece of content and showing normal visitors different content), keyword stuffing, and manipulative redirects. Luckily, if you have a client who feels their site is being penalized unfairly, you can ask Google for reinclusion.
NOTE To request reinclusion into Google’s index, go to www.google.com/webmasters/tools/reconsideration. Be
sure that whatever got you penalized in the first place is fixed, and be ready to explain how it happened. This mea culpa form might be uncomfortable to answer comprehensively, but it’s often the only way to get a banned site back into Google’s index.
Google allegedly manually reads every reinclusion request it receives.
The time from submission to action (assuming Google actually decides to act) is close to three months. Before submitting the reinclusion request, check the site in question for any sign of breaking the Google Webmaster Guidelines. Once you are sure your client’s site is search engine friendly and abides by the guidelines, you can submit a reinclusion request through the Webmaster Central dashboard. After doing so, knock on wood, throw salt over your shoulder, and do a rain dance for seven days for luck. You are going to need all of it that you can get.
Following is a deeper dive into Google Webmaster Tools reports. The section titles in the book correspond to the specific section names in the left navigation of Webmaster Tools. While some of the reports are fairly
binary in their explanation of your site (an XML sitemap is either valid or invalid, for example), much of the data is technical and won’t be easily labeled “good” or “bad” or offer specific recommendations. Instead, in most cases, Google simply shows you the data, and it’s up to you to interpret and act on it. The purpose of the following sections is to help you decide which reports are critical to watch and the thresholds at which you should take action on improving various aspects of your site.
The GWT Dashboard gives a very quick snapshot of your site’s performance, showing top-level data for the following issues:
Top search queries
Crawl error types and their counts
Links to your site
Each of these report snapshots has a link to its full report counterpart, which is discussed in the following sections.
The dashboard is best for quickly spotting anomalies, such as spiking search queries or crawl errors, and for noticing signs that your XML sitemaps are invalid.
Messages
This page lists messages from Google to you, specifically about your site. Messages include notification of new verified owners of the site, changes to Sitelinks, and important alerts if your site is harboring malware or potentially running afoul of Google’s quality guidelines. Check this area at least once per week for each of your clients’ sites.
The Message center in the left navigation of GWT shows messages for a specific domain. If you have several sites verified in GWT, it’s better to view all messages at the main hub of GWT, www.google.com/webmasters/tools/home. This page aggregates all messages about all domains in your GWT portfolio.
The Site Configuration reports show how Google interacts with your site. The following sections discuss specific signals that your site and Google send to each other to optimize the user experience.
Sitemaps
The Sitemaps report shows each sitemap that you (or anyone else who is verified for the site) have submitted, along with the most recent fetch date, status (either valid or invalid), and type (general, mobile, and so on). It also shows the number of URLs indexed, contrasted with the number of URLs in your XML file. This is a helpful way to see what percentage of your sitemap’s URLs is actually making it into Google’s index. Finally, you can submit sitemaps from this report, too, provided the actual files are on your domain’s server. (In other words, you cannot upload
sitemap files from your computer.)
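For reference, a minimal valid sitemap file, per the sitemaps.org protocol, looks like the following sketch; the URL and dates are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <!-- lastmod and changefreq are optional hints -->
    <lastmod>2010-06-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```

Only the loc element is required for each URL; lastmod and changefreq are optional hints that crawlers may or may not honor.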
Crawler Access
This multi-featured report enables you to test and diagnose all sorts of robots-related crawling rules for your site:
Test your robots.txt file: This pane shows the HTTP status code of your current robots.txt file and the last time it was downloaded. The “Parse results” section, if it appears, goes through your file line by line and identifies any sitemap locations you’ve declared in your file, explains any disallow or user agent lines you’ve added, and
identifies any coding errors. To test hypothetical changes to your robots.txt file, change the content of the “Text of
http://www.example.com/robots.txt” field, add specific URLs to the URLs field, select a specific user agent, and click the Test button. The result will tell you whether your hypothetical changes will disallow the sample URLs you entered.
The changes are hypothetical because this report does not literally change your robots.txt file. It’s simply a testing sandbox, and you must manually edit and re-upload your robots.txt file for actual changes to take effect.
There are a few things to watch out for while testing your robots.txt file. First, when you declare a sitemap location, a result of “Valid Sitemap reference detected” means only that the location of the file is valid, not necessarily the file
itself. In other words, the URL that you gave as your sitemap location does exist. To know whether the XML is valid, you need to check the Sitemaps report. If your robots.txt file was encoded as UTF-16, your “Parse results” section might show you a question mark as the first character in your file. This is a byte-order mark (BOM), and it usually renders the robots.txt file’s first line incomprehensible for Google. Resave as UTF-8, re-upload, and you should be fine.
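If you’d rather test rules offline, the same kind of check the GWT tester performs can be sketched with Python’s standard-library robots.txt parser. The file contents and URLs below are illustrative placeholders, not anything pulled from your own report:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block every crawler from /private/
# and declare a sitemap location.
robots_txt = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Mirrors the tester's question: may this user agent fetch this URL?
print(parser.can_fetch("Googlebot", "http://www.example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))         # True
```

This checks only the disallow logic; as noted above, validating the sitemap file itself is a separate step.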
Generate a custom robots.txt file: This pane will help you write a custom robots.txt file based on the actions, specific robots, and specific directories and/or files you want to control access to. Keep in mind that for any non-Google crawlers, you need to come prepared with the robot’s user agent name. When you’re done feeding it the rules, Google will create and let you download a file tailored to your needs, which you’ll then need to upload to your site’s root directory.
Remove URLs
If you have certain URLs that appear in SERPs and you need them out of the index sooner than a 404 will accomplish it, use this tool. However, before it will work, you must first show Google (through a robots.txt file, meta robots tag, or 404 header code) that the content should not be indexed. Remember that this tool simply
removes URLs from Google’s index; it does not remove them from your server.
Sitelinks
This section shows the Sitelinks that Google has bestowed on your site for the home page, and possibly other interior pages too. From this report, you can block individual Sitelinks so that they no longer appear on SERPs.
Blocking a specific Sitelink does not remove the specific URL from the Google index; it ensures only that for queries in which Sitelinks appear, that specific link will not appear on the SERP.
Weigh your options carefully while deciding whether to block Sitelinks. Remember that while you can block any Sitelinks you want, you cannot tell Google what link you would like Google to show in its place, and you can’t even ensure that it will show anything at all. For example, if you block two of eight Sitelinks, Google may replace one or
both Sitelinks with different Sitelinks, or it may simply show the remaining six.
Change of Address
The Change of Address tool is a supplement used when you’re moving your site to a new or different domain. It does not take the place of old-fashioned 301 redirects from your old site to your new one, but it represents an additional signal for Google to help process the migration and is supposed to make the SERP transition faster for new URLs.
To use this tool, you must have both old and new domains verified through GWT and set up the 301s ahead of time. After those items are complete, you can use this tool to select the new domain that you’re moving to.
Currently, the Change of Address tool works only for root-level domains. In other words, your old and new sites must each be either the “www” version or a bare domain with no subdomain at all for this tool to work.
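To make the 301 piece concrete, here is one common way to redirect an entire old domain to its replacement. This is an Apache mod_rewrite sketch for the old site’s .htaccess file; the domain names are placeholders, and other web servers have their own equivalents:

```apache
# Permanently (301) redirect every URL on the old domain
# to the same path on the new domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?oldsite\.com$ [NC]
RewriteRule ^(.*)$ http://www.newsite.com/$1 [R=301,L]
```

Redirecting path-for-path, rather than sending everything to the new home page, preserves the most link equity during the migration.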
The Settings area has two sections:
This tab lets you send Google three important signals about your content:
Geographic target: Use this setting to have URLs from your site appear in search results for only one country. The default is unchecked, which means your site can, in theory, appear in results for any country.
Preferred domain: Use this section to tell Google whether you prefer the “www” or non-“www” version of your URLs to appear in SERPs. This setting is overridden by more overt actions, such as 301 redirects that you perform yourself.
Crawl rate: This setting tells Google that you prefer its robot to crawl your site at a rate faster or slower than it currently does. Use it with caution, as an amped-up Googlebot can take down a server if it hits the server at full speed.
Parameter handling: This tab lets you pick dynamic URL parameters from your site and tell Google to ignore those parameters when it crawls. For example, if your site has the URLs /authors.php and /authors.php?sortby=lastname, you could tell Google to ignore the sortby parameter, which would help canonicalize any URLs containing that parameter back to /authors.php. Google also lists dynamic URL parameters that it suspects might be meaningless enough to create duplicate content on your site, but honestly, Google doesn’t bat too well on this, often suggesting important parameters that create unique content.
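The canonicalization this setting requests can be illustrated with a short Python sketch; the function name and sample URLs are mine, not part of GWT:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonicalize(url, ignored_params):
    """Return url with the named query parameters stripped out."""
    parts = urlsplit(url)
    # Keep only query parameters that are NOT on the ignore list.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignored_params]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# The sortby parameter collapses back to the base URL.
print(canonicalize("http://www.example.com/authors.php?sortby=lastname", {"sortby"}))
# -> http://www.example.com/authors.php
```

Meaningful parameters (a page number, for instance) are left alone, which is exactly the distinction Google’s automatic suggestions often get wrong.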
Your Site on the Web
The “Your Site on the Web” reports show Google’s interpretation of how users and other sites relate and interact with your site, including such metrics as query and linkage data. Following are the reports and how to get the most from them.
Top Search Queries
This report shows the queries for which your site appears in search results, along with estimates of how often searchers click your listing (CTR, or clickthrough rate). It also shows your page’s “average position,” which is the average rank your site holds for a given query. While the impression and click numbers appear to be rounded estimates, they provide good insight into areas in which a little extra focus can provide additional traffic.
For example, you might have a very low clickthrough rate for a high-demand query for which your page’s average position is 8. That makes sense, because anything ranked eighth will naturally receive a low percentage of clicks from a query. But with a little work, you can improve that page’s ranking for the query and capture more of the clicks. This report helps you identify the high-return keywords, the queries where a little effort will pay off most.
Links to Your Site
This report provides valuable intelligence about the external sites that link to you most frequently, as well as the pages on your site that receive the most links. As with a lot of Google linkage data, you never quite know whether the report shows everything the engine knows about, so you may not get the precision that you’ll see from tools like Open Site Explorer.
Keywords
This report is conceptually very simple. It’s the list of words, organized by frequency, that Google finds when crawling your site. Your top words should, therefore, be the terms your site is focused on, whether category names or specific products or services, along with some brand-focused phrases.
Click a term and you’ll see the list of pages that use that term most frequently, which is helpful for noticing pages that might refer to a specific term too often, or not often enough.
Internal Links
This report lists your site’s URLs in order of the number of internal links (that is, links from your own domain) pointing to them. This data helps you ensure that your critical content is linked to more often than your non-critical content, which in turn helps you manage the flow of PageRank and authority through the site.
In addition, this tool is a helpful way to spot duplicate content, because if you have two duped URLs (differentiated only by capitalization style, for instance), they will likely both show up on the report and help lead you to the page that is linking to the incorrect version.
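A small, hypothetical helper makes the capitalization check concrete; the URLs below are placeholders:

```python
from collections import defaultdict

def case_duplicates(urls):
    """Group URLs by lowercased form; return groups with >1 spelling."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

urls = [
    "http://www.example.com/About.html",
    "http://www.example.com/about.html",
    "http://www.example.com/contact.html",
]
# Flags the two capitalization variants of the About page.
print(case_duplicates(urls))
```

Once a duplicate pair surfaces, the Internal Links report tells you which pages link to the incorrect variant so you can fix those links.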
Subscriber Stats
There’s not a lot you can do with this information. It lists the number of Google users who have subscribed to feeds on your site using a Google-based RSS reader, such as Google Reader or iGoogle. If you run a feed on FeedBurner, your subscribers won’t be reflected here, because this tool reflects only subscribers to feeds on your own domain, not feeds hosted on a third-party domain.