Common Questions Viewing Source Can Answer

Now that you know some of the key data points to look for when you are viewing source, it’s time to consider some of the questions viewing source can help you to answer.

Is This Page Not Getting Indexed Due to On-Page Errors?

This is a fairly common situation. The key things to look at when trying to diagnose this are:

  • Erroneous use of meta robots
  • Use of Flash
  • Use of frames
  • robots.txt (not on the same page)

As HTML matures, the on-page giveaways of Flash (sounds and
animation) will become less obvious. The surest way to see what is going
on is to view source. To answer this question, look for an <embed> or
<object> tag with an attribute that points to either adobe.com or
macromedia.com. If you find one, that Flash-based piece of content is not
being parsed by the search engines as easily as it would be if it were
written in HTML.
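
The checks above can be sketched in a few lines of Python. This is only an illustration, assuming the page source is already saved in a string; the names IndexingBlockerScanner and scan_for_indexing_blockers are made up for this example, and it does not look at robots.txt, which lives in a separate file.

```python
from html.parser import HTMLParser


class IndexingBlockerScanner(HTMLParser):
    """Collects on-page signals that commonly keep a page out of the index."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (for example <object data>), so normalize.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            tokens = {t.strip() for t in attrs.get("content", "").lower().split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if tokens & {"noindex", "none"}:
                self.findings.append("meta robots blocks indexing")
        elif tag in ("embed", "object"):
            # Flash embeds usually reference adobe.com or macromedia.com
            # somewhere in their attributes.
            joined = " ".join(attrs.values()).lower()
            if "adobe.com" in joined or "macromedia.com" in joined:
                self.findings.append(f"Flash content in <{tag}>")


def scan_for_indexing_blockers(html_source):
    """Return a list of human-readable findings for one page's source."""
    scanner = IndexingBlockerScanner()
    scanner.feed(html_source)
    return scanner.findings
```

An empty list means only that none of these particular giveaways were found, not that the page is guaranteed to be indexable.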

Is That Piece of Content in a Frame?

Frames are very important to avoid and easy to diagnose. Simply search the
source code of the page for the <frameset>, <frame>, or <iframe> tags. Frames
can be useful for some situations (as in Gmail and checkout processes),
but they are almost never a good implementation for pages that depend on
search engine–referred traffic.
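
As a rough sketch of that search, the following Python snippet lists every frame-related tag in a page's source along with its src attribute. The names FrameFinder and find_frames are hypothetical, and the snippet assumes the source is already available as a string.

```python
from html.parser import HTMLParser

FRAME_TAGS = {"frameset", "frame", "iframe"}


class FrameFinder(HTMLParser):
    """Records frame-related tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.frames = []  # (tag, src) pairs; src is None when absent

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <IFRAME> is matched too.
        if tag in FRAME_TAGS:
            self.frames.append((tag, dict(attrs).get("src")))


def find_frames(html_source):
    """Return (tag, src) pairs for every frame-related tag in the source."""
    finder = FrameFinder()
    finder.feed(html_source)
    return finder.frames
```

If the content you care about only shows up at one of the src URLs, it lives in a frame rather than on the page itself.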

Are These Navigational Links Passing Juice?

As discussed in Chapter 2, site architecture starts with the homepage. I
find myself viewing source a lot to see how global navigation is
implemented. From an SEO perspective, the best implementation of
navigation uses HTML lists and Cascading Style Sheets (CSS). When
done well, this looks like the following:

<ul>
<li id="example-1"><a href="http://www.example.com/"
title="Example 1">Example 1</a></li>
<li id="example-2"><a href="http://www.example.com/example-2.html"
title="Example 2">Example 2</a></li>
<li id="example-3"><a href="http://www.example.com/example-3.html"
title="Example 3">Example 3</a></li>
<li id="example-4"><a href="http://www.example.com/example-4.html"
title="Example 4">Example 4</a></li>
<li id="example-5"><a href="http://www.example.com/example-5.html"
title="Example 5">Example 5</a></li>
<li id="example-6"><a href="http://www.example.com/example-6.html"
title="Example 6">Example 6</a></li>
</ul>

If the navigation takes this form and the meta robots tag is set up to
pass juice, then global navigation does pass juice. Notice that this code
uses normal HTML-based links that are easy to parse. If these were
obfuscated (that is, more complicated than necessary) with JavaScript or
nofollows, the given links would not pass juice.
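
A quick way to apply this check to a whole navigation block is sketched below in Python. It is only an approximation: the names LinkAuditor and audit_links are invented for this example, the two problem labels are my own wording, and it does not inspect the page-level meta robots tag.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Flags anchors that are nofollowed or lack a plain, crawlable href."""

    def __init__(self):
        super().__init__()
        self.links = []  # (href, problem) pairs; problem is None when the link looks fine

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = {name: (value or "") for name, value in attrs}
        href = attrs.get("href", "").strip()
        rel_tokens = attrs.get("rel", "").lower().split()
        if "nofollow" in rel_tokens:
            problem = "nofollow"
        elif not href or href == "#" or href.lower().startswith("javascript:"):
            problem = "no crawlable href"
        else:
            problem = None
        self.links.append((href, problem))


def audit_links(html_source):
    """Return (href, problem) for every <a> tag in the source."""
    auditor = LinkAuditor()
    auditor.feed(html_source)
    return auditor.links
```

Run against the list markup shown earlier, every link should come back with a problem of None.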

Useful Search Engine Queries

The search engines have been gracious enough to give us special search
commands for understanding their vast amount of data. The commands
that I find the most useful are:

cache:
site:
inurl:
intitle:
+
-
|

These commands, when used in combination, are powerful and
sometimes prove essential for diagnosing SEO problems. To the search
engineers who created these commands, I send my sincerest gratitude.
You make my job much easier. (It should be noted that I have intentionally
left out the search engine commands that I don’t use. Because of this
decision, I recommend that you don’t treat this as a comprehensive list. It
details only the commands that I find essential for SEOs.)

These commands are useful for filtering search results to show only
pages that contain certain attributes. This means if you find a webpage
that has an issue like a misspelling in a title tag, you can use the search
engines to find all of the occurrences of this on your website and use this
information to fix the problem.

Another example of this is checking the effectiveness of keyword
targeting. It is a common SEO problem to have multiple pages targeting
the same keyword. This is a problem because all of these pages must then
compete with each other for rankings. The best practice is to combine
them so that one more powerful page competes for rankings instead.
Figure 3-3 shows how these pages can be found by limiting a search to
only those pages on Google.com with the phrase lol in the title tag of
the document.

Figure 3-3: Image of combined search engine command query in Google
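
The kind of combined query described here can be assembled mechanically. The Python sketch below is illustrative only; operator and combined_query are hypothetical helper names, and wrapping multi-word phrases in quotes is my assumption about how you would want them searched.

```python
def operator(name, value):
    """Format one search command, quoting values that contain spaces."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


def combined_query(site, title_phrase):
    """Limit a search to one site and to pages with the phrase in the title tag."""
    return f"{operator('site', site)} {operator('intitle', title_phrase)}"


print(combined_query("google.com", "lol"))  # site:google.com intitle:lol
```

Every page the resulting query returns is a page on that site competing for the same phrase in its title tag.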

I use these search engine queries when I want to:

  • Search for duplicate content
  • Get a general idea of how well indexed a website is

Key Data Points Search Engine Commands Can Generate

You can query a search engine in the following ways to yield some useful
data points:

Normal search: What is the best way to see how the search
engines will act? Run a normal search. According to my web history I
search using Google about 17 times a day. This does not include the
internal searches I do on Google properties like Gmail and YouTube
or the searches I do on my phone. I have found that the best way to
better understand Google is to continually and constantly use it. After
all, our goal as SEOs is to improve our clients’ rankings. What better
way to do this than studying search results every day?

Quotes: As I am sure you are aware, putting search queries in
quotes limits results to exact matches. This is extremely helpful when
you want to see if a random page is in the Google index. Simply find
a random sentence in the content, wrap it in quotes, and search for it.
If it is long enough, odds are it has only been written once on the
Internet and should return only one result. If it doesn’t appear, the
page isn’t indexed. If it appears more than once, your client has
duplicate content issues.
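
Here is a small Python sketch of that routine, under a few assumptions of my own: it picks the longest sentence rather than a random one (longer sentences are more likely to be unique), it treats eight words as "long enough," and the function names exact_match_query and interpret_result_count are made up for this example. You still run the search yourself and read off the number of results.

```python
import re


def exact_match_query(page_text, min_words=8):
    """Pick the longest sentence from the page copy and wrap it in quotes."""
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    if not candidates:
        return None  # nothing long enough to be a reliable fingerprint
    longest = max(candidates, key=lambda s: len(s.split()))
    # Drop trailing punctuation and stray quotes so the query stays well formed.
    longest = longest.rstrip(".!?").replace('"', "")
    return f'"{longest}"'


def interpret_result_count(result_count):
    """Translate the number of exact-match results into a diagnosis."""
    if result_count == 0:
        return "not indexed"
    if result_count == 1:
        return "indexed"
    return "possible duplicate content"
```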

cache: The cache is a copy of the file Googlebot downloads when it
visits a website. As an SEO, this information is extremely important
because it shows you exactly what Google sees. This is especially
useful for determining crawl rate and diagnosing potential geolocation
issues.

When viewing the cached version of a website, try clicking the link
labeled “Text-only version.” This shows a much better representation of
what Google sees. I can’t count how many hidden links I have found by
using this trick.

One of my favorite examples of the importance of cache use was when
my former colleagues at SEOmoz were working with the restaurant review
website yelp.com. Yelp was implementing a complicated system of
geolocating based on IP addresses and cookies to automatically redirect
users to their applicable city version of yelp.com. For some reason, Yelp
was having issues getting results in Google. Upon checking the cache,
my co-workers saw that whenever Googlebot crawled Yelp, it was
automatically taken to the Mountain View, California, version of the site
(home of Google headquarters). D’oh! After my co-workers pointed this
out, this problem was quickly resolved and Yelp’s traffic skyrocketed.

site: The site command is used to limit a search query to a
specific site. This is extremely useful for diagnosing indexing
problems. I generally start by using the site command alone
(site:techmeme.com). This simple query can tell you several important
things. First, it gives you an idea of the major sections of a
website. It also gives you an idea of how many pages are
indexed in Google. If you know that a given site has only
100 pages, and this query returns 100,000 results, you
know you have a duplicate content issue.

Additionally, it makes you aware of some of the
subdomains on the given site. This is extremely helpful for
understanding how Google thinks a site is organized.
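
The page-count comparison is simple arithmetic, sketched here in Python. The function names and the default tolerance of 2.0 are assumptions made for illustration; result counts reported by a search engine are estimates, so treat the ratio as a rough signal rather than a measurement.

```python
def index_bloat_ratio(known_pages, indexed_results):
    """How many indexed results exist per page you know the site really has."""
    if known_pages <= 0:
        raise ValueError("known_pages must be positive")
    return indexed_results / known_pages


def looks_like_duplicate_content(known_pages, indexed_results, tolerance=2.0):
    """Flag a site whose indexed count is far above its real page count.

    The tolerance is a hypothetical cutoff, not a published threshold.
    """
    return index_bloat_ratio(known_pages, indexed_results) > tolerance


# The example from the text: 100 real pages, 100,000 results for site:
print(index_bloat_ratio(100, 100_000))             # 1000.0
print(looks_like_duplicate_content(100, 100_000))  # True
```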

inurl: This command limits search results to those where the
query appears in the URL. This is most useful when combined
with the site command (site:www.seomoz.org inurl:"Rand Fishkin").

Most SEO professionals find this technique most useful for
identifying URL parameter–induced duplicate content
(site:www.example.com inurl:"sessionid"). I use this after I identify a
problematic parameter and I want to find all of its occurrences.
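
Once such a query has given you a list of URLs, a short script can show which of them collapse to the same page when the problematic parameter is removed. The Python sketch below uses only the standard library; strip_parameter and duplicate_groups are hypothetical names, and the sample URLs are invented.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_parameter(url, parameter):
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != parameter.lower()
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def duplicate_groups(urls, parameter):
    """Group URLs that become identical once the parameter is stripped."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_parameter(url, parameter)].append(url)
    # Only groups with more than one member indicate duplication.
    return {clean: found for clean, found in groups.items() if len(found) > 1}


urls = [
    "http://www.example.com/page?sessionid=abc",
    "http://www.example.com/page?sessionid=def",
    "http://www.example.com/other",
]
print(duplicate_groups(urls, "sessionid"))
```

Each key in the result is the clean URL, and its value lists the indexed variants that duplicate it.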

intitle: Similar to the inurl command, the intitle command limits
results to only those where the query is in the title tag. This can
be helpful for many things including piracy (intitle:"index of mp3"),
vanity searches (intitle:"danny dover"), and SEO-related things
like duplicate title tag detection (intitle:"my company: Best product
ever page").

+: The plus sign, when placed directly before a term, tells
Google to search for exactly that term, not synonyms. For
example, a search for ghw bush will return results that assume you
mean “George Herbert Walker Bush”. A search for +ghw bush,
however, will return results that assume you want specific
references to “GHW” in the results.

-: The minus sign is a tremendous aid to filtering queries, and it
can be used with specific query terms (cubs -chicago -baseball will
show you results for “cubs” that do not contain Chicago or
baseball) or in conjunction with specific operators discussed in
this section. Searching for "danny sullivan"
-site:searchengineland.com will return results about Danny Sullivan
that appear anywhere except for SearchEngineLand.com. This
operator works similarly to filter out title contents (music
-intitle:mp2) and URL contents (site:nytimes.com -inurl:pagemode=print
shows all indexed pages from nytimes.com that are not “print-friendly”
versions).

|: The pipe symbol represents an “OR” search and can be used
with regular query terms or with the commands listed in this
section, primarily when you’re looking for multiple items within a
given dataset. For example, site:example.com
inurl:sessionid|jsessionid will find indexed URLs from example.com
that contain either “sessionid” or “jsessionid”.

Similarly, site:seomoz.org danny|rand will return pages from
SEOmoz.org that contain either “danny” or “rand” in the copy.
(Pages that include both “danny” and “rand” will also be
included with this operator, so it’s a true “and/or” operator, not
an “exclusive or” operator.)

The search engine commands in Google must be started with a
lowercase letter or they won’t work properly.
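
To tie the operators in this section together, here is a hedged Python sketch of a query builder. The function name build_query and its parameter names are invented for this example; it simply joins the pieces with spaces, attaches the minus sign directly to whatever it excludes, and keeps every command lowercase.

```python
def build_query(terms=(), site=None, inurl_any=(), exclude_terms=(), exclude_inurl=()):
    """Assemble a search query from plain terms and lowercase commands."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if inurl_any:
        # The pipe means OR: match any one of the listed URL fragments.
        parts.append("inurl:" + "|".join(inurl_any))
    # The minus sign must touch the term or command it excludes.
    parts.extend(f"-{term}" for term in exclude_terms)
    parts.extend(f"-inurl:{value}" for value in exclude_inurl)
    return " ".join(parts)


print(build_query(terms=["cubs"], exclude_terms=["chicago", "baseball"]))
# cubs -chicago -baseball
print(build_query(site="example.com", inurl_any=["sessionid", "jsessionid"]))
# site:example.com inurl:sessionid|jsessionid
print(build_query(site="nytimes.com", exclude_inurl=["pagemode=print"]))
# site:nytimes.com -inurl:pagemode=print
```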