SEO Newsletter | Volume 66 | April 15, 2009
BruceClay.com.au

SEOToolSet® Training Comes to You

We're very pleased to present Bruce Clay, Inc.'s famous SEO training course with new East Coast style. That's right, we will now be holding our SEOToolSet Training in New York! All of Bruce Clay, Inc.'s normal SEO goodness, now with tastier bagels!

And as always, there are Bruce Clay, Inc.'s SoCal search engine optimisation training sessions. You can register for the standard SEOToolSet Training here in sunny Southern California. Our next class takes place May 11-13 for the Standard class, and May 14-15 for the Advanced, so make sure you sign up soon as space is limited.


FEATURE: Branding, SEO & The Vince Update

Over the past couple of months there has been a lot of discussion in the SEO world about Google's "Vince" update. Although the change occurred in the second half of January, the discussion in the wider SEO community did not begin until Aaron Wall posted a blog entry on February 25th about Google seemingly placing "heavy emphasis" on branding in their search results. Google, through Matt Cutts, confirmed that an algorithm change did occur. However, Matt stopped short of calling it an update, labeling it instead a "simple change".

Regardless of what you call it, a change did occur in January to Google's algorithm and it did affect the rankings for some queries. But what changed? Does Google prefer branded sites now? And most importantly, is there anything that you should do differently for SEO?


BACK TO BASICS: XML Sitemaps Defined, Part One: Pairing Traditional Site Maps with XML Sitemaps

At Bruce Clay, Inc., one of the things that we struggle with in our everyday client work is getting the pages that we consider important into the search engines. In order to provide the search engines the most complete look at what is in a site and give those pages the best chance to be indexed, it's necessary to create maps of your site's pages. There are two ways of doing this. The traditional way is to create an HTML site map but in recent years search engines have developed the XML Sitemap protocol to assist in spidering. Google in particular has embraced the format and created several variations that enable them to easily discover all manner of content varieties.

The first part of this two-part series merely outlines the basic differences between HTML site maps and XML Sitemaps and gets you started on creating your own XML Sitemaps. The second will tackle the variations that Google has created in order to better help you serve them content.


Hot Topics
Local Results Coming to Your SERPs

This month Google began showing local business results even when users had not specified a location. Queries that suggest a local intent, such as restaurants or dry cleaners, are now being served up with a local 10 pack. Google Software Engineer Jim Muller explained that the new feature is available to users across the world and that when no local modifier is included in the search terms, the local results are shown toward the middle of the page rather than in the top spots.

How It's Done

Google determines the location of a user by matching the IP address to a broad geographical location. Users can also specify their location by selecting "Change location" from the top right hand corner of the local business results. According to Muller, Google has targeted hundreds of terms to trigger the local results. He also explained that the results may display in a number of configurations - including groups of ten, groups of three, in a stand-alone format or without a map - with the display determined by what the algorithm finds most useful.

Matt McGee's observation of the local results indicates that triggering may depend on whether the term used is singular or plural. Furthermore, commercial terms are not alone in triggering local results.

How It Affects the Search Industry

As McGee points out, this development may lead to a change in searcher behavior. Where we previously saw a move toward longer search queries, we may see a return to shorter queries as generic terms bring up more relevant results. He also explains that because results are based on a searcher's IP, a site's ISP may have an effect on the order of local results.

In her post on Search Engine Guide, Miriam Ellis says that public awareness of local data will increase as the 10 pack appears more often. Spammers will also take advantage of the newly exposed platform in the shape of increased mapspam. And of course, those businesses ranking well in local results will see incredible growth in search traffic. As Greg Sterling points out, through this new feature Google has acknowledged that geomodifiers aren't required for a query to be inherently local.


Shuffles

There's a lot of movement in the search and Internet technology industries this month. Microsoft has reported that its Live Labs group has been cut in half with the remaining team focusing on search. Microsoft's Encyclopedia Encarta was also bid farewell.

At Yahoo, the loss of two toolbar deals threatens to cost the search engine up to 15 percent of search traffic. Time Warner may be considering the sale of AOL.

Google announced the launch of a venture capital fund, Google Ventures. Dennis Woodside was named as Google's senior vice president and president of American operations. Meanwhile, the company's president of Asia-Pacific and Latin America operations, Sukhinder Singh Cassidy, left to become the CEO in residence of venture capital firm Accel Partners.

Google's lead designer Doug Bowman left the search engine to join Twitter. And Web designer Matthew Inman left NextC, announcing his plans to launch an entertainment Web site.

This month, Facebook's membership climbed to 200 million. Following CFO Gideon Yu's departure, speculation spread on why senior management is short-lived at the organization. At the third-largest social network, hi5, 50 percent of the 100-person staff was laid off.

Wikia Search ended its user-generated search engine service this month after founder Jimmy Wales decided it would be years before it could be used by the public - an unprofitable trajectory in this economy. Video site Veoh replaced CEO Steve Mitgang with founder Dmitry Shapiro and laid off about a third of its staff. Silicon Valley's former bellwether Silicon Graphics Inc. was bought by Rackable Systems for just $25 million.

IBM and Sun Microsystems were in merger talks that failed this month. IBM also made headlines for its new strategy of sending employees overseas.


Sound Bytes

If you like what you've read in the SEO Newsletter, there's more Internet marketing expertise where that came from. Check out SEM Synergy every Wednesday at 3:00 p.m. Eastern and Noon Pacific on WebmasterRadio.fm. Bruce Clay and the other hosts discuss industry news, SEO tactics and marketing trends, while expert guests share their insights on methods, best practices and upcoming events. Check out the show schedule below for a look at recent shows and upcoming topics.

April 1
(Listen Now)

Interest-Based Ads

David Szetela

Interest-Based Search

April 8
(Listen Now)

The Vince Update

Jayme Westervelt

The Vince Update

April 15
(Coming Soon)

IM Spring Break

Danny Sullivan

Industry Conferences

April 22
(Coming Soon)

Social Media Links

Jordan Kasteler

Social Media Dos and Don'ts

April 29
(Coming Soon)

SEM Recession Trends

TBA

Value-Added Service

Got something to say? Contact the SEM Synergy team and share your thoughts, comments and questions. You might even hear your question answered on the show.


Shindigs

From April 21-23, ad:tech will take place in San Francisco, featuring a new SMX @ ad:tech track on the second day. SMX Advanced is scheduled for June 2-4 in Seattle.

PPC Summit will be held in Chicago April 22-23 and in New York City May 13-14. San Jose hosts eMetrics Summit May 4-7.

Forrester's Marketing Forum takes place in Orlando April 23-24.

Online Marketing Summit will be held at a number of cities across the country, including Boston, Philadelphia, Washington D.C., Dallas, and many more, throughout the month of May.

SEOToolSet Training courses are scheduled around the country. Next up:

  • April 28-30 (standard) in Long Island, NY
  • May 11-15 (standard and advanced) in Simi Valley, CA
  • June 9-11 (standard) in Long Island, NY

Attaboys

Giving business owners a voice, business review site Yelp will soon let business owners publicly respond to reviews. Also raising his voice, Jeremy Schoemaker of Shoemoney.com has filed suit against a Google employee who is wrongly using Schoemaker's trademark within AdWords copy after accessing his account to gather competitive data.

Microblogging site Twitter made several changes this month, among them: adding advertisements, adopting a new layout and including a "Save this Search" feature. Social news site Digg launched an improved search engine with filtering capabilities.

A new BlackBerry app store, BlackBerry App World, was unveiled this month, featuring free and paid apps for RIM devices. Mobile ad network AdMob launched Download Exchange, a program that lets developers promote their apps in exchange for running ads for other apps.

Improved analytics and reporting were added to the Google TV Ads and YouTube Insights services. YouTube, Picasa, Flickr and Yelp content can now be previewed in Gmail.

Google search results are now showing the local 10 pack for a number of broad, non-geo-specific queries. Google local news was also expanded to the UK, India and Canada and Google Suggest went international with 51 languages.

This month Google announced a partnership with the pharmacy chain CVS, allowing customers to import their prescription history into Google Health and contribute to the comprehensive pharmacy history being collected.

Google Image Search released a new feature that lets users filter results by color. Yahoo Image Search was updated this month with a friendlier interface displaying larger images in search results and a preview page that gives users a better idea of how the image is used on the page. An array of new filters for image search such as size and color were also released.

Yahoo's open-source search technology, BOSS, announced three new capabilities: Delicious content, additional languages and news sorting.

Ask.com has partnered with Autism Speaks in an effort to raise funds and awareness during April, Autism Awareness Month. The search engine also announced support for Sitemaps autodiscovery protocol.

Microsoft Virtual Earth added interactive capabilities that allow businesses to add images, video or text content to locations on the map. The company will also allow Windows 7 users to downgrade their operating system if discontent with the new OS.


Word on the Wire

New statistics were released over the last month regarding online advertising. According to one survey on social network ad engagement, 74 percent of young consumers reported clicking on social network ads infrequently and 36 percent said they didn't click on social network ads at all.

The number of online network viewers who reported they would welcome ads in exchange for free video content rose from 67 percent in 2006 to 80 percent this month. Sales of YouTube ads grew 50 percent, from placement on 6 percent of videos in 2008 to inclusion on 9 percent of videos this month.

Meanwhile, Google AdSense dropped its video ads program, citing poor earnings. Online TV viewing site Hulu has seen a growth in viewership and a decrease in advertisers. In a year-over-year analysis, online ads grew at the remarkably slow rate of 2.6 percent in Q4.

Twitter began preparing a revenue model based on charging for enhanced commercial accounts. According to comScore, Twitter's user demographics skew older, with the 45-54 year old age group making up the largest segment, followed by 25-34 year olds. It was reported that CEO Evan Williams wouldn't let go of the company's independence for less than $1 billion. Later, however, rumors surfaced that Google and Microsoft were fighting over a Twitter acquisition.

In an interview, Microsoft CEO Steve Ballmer said that the company's underdog status in the search arena allowed Microsoft to be more experimental than Google in developing new technologies and features. It was also reported that Microsoft planned to spend up to $100 million in order to take search market share from the two top-tier engines.

The release of Internet Explorer 8 was met with protest because, unlike previous versions of IE, it adopts Web standards. There were also reports that Yahoo and Microsoft were in discussions on a search and advertising partnership.

Google redesigned the Sitelinks shortcut feature as One-Line Sitelinks. Google Maps Street View is now available in 25 new cities in the Netherlands and the UK. And Google News officially revised its URL policy, no longer requiring that numbers be included in URLs. Eric Schmidt responded to publishers' cries for preferential treatment by explaining that trusted brands already see a rankings boost in Google News.

Privacy advocates are making noise about the dangers of cloud computing. In New Jersey, lawmakers are considering legislation which orders social networking sites to police for offensive posts. A European Union directive mandating the archival of Internet traffic for 12 months has been put into effect in the UK.



If you have any questions or comments on any of the articles above or if you would like to suggest topics for future search engine optimisation articles, please contact us at Bruce Clay Australia



FEATURE: Branding, SEO & The Vince Update

by Fernando Chavez, April 15, 2009

Over the past couple of months there has been a lot of discussion in the SEO world about Google's "Vince" update. Although the change occurred in the second half of January, the discussion in the wider SEO community did not begin until Aaron Wall posted a blog entry on February 25th about Google seemingly placing "heavy emphasis" on branding in their search results (http://www.seobook.com/google-branding). Google, through Matt Cutts, confirmed that an algorithm change did occur. However, Matt stopped short of calling it an update, labeling it instead a "simple change".

Regardless of what you call it, a change did occur in January to Google's algorithm and it did affect the rankings for some queries. But what changed? Does Google prefer branded sites now? And most importantly, is there anything that you should do differently for SEO?

Despite the ranking shifts that were pointed out in Aaron's original post, I don't feel that we should assume Google intentionally increased the rankings of branded sites. I think that the most reasonable explanation is that they made a change that was intended to eliminate the effectiveness of certain spam. At the same time, they may have increased the relevancy value of certain SEO factors that would have influenced big brands as well. The side effect of two such tweaks could very easily be an increase in rankings for branded sites for certain keywords.

Although I spent an entire weekend contemplating the issue, I came back to the office on Monday confident that something more logical (at least in my mind) was in play. Whenever I'm debating a Google algorithm issue, the question I always ask myself is what makes most sense from a search quality standpoint. In the case of branding and its potential effect on rankings, does putting an emphasis on brands improve Google's results? I would argue that it doesn't. While emphasizing certain brands might be more helpful for some searches, it certainly would not make the results better for every potential query out there. And Google will rarely make such an algorithm update unless they are confident that it will improve search results across the board.

Not an Update for Brands

I got validation for my initial conclusions while doing research for a segment that I recorded with Susan and Virginia on SEM Synergy (jump to the last six minutes). During my research, I came across a video that Matt Cutts made as part of the GoogleWebmasterHelp YouTube account that was created at the beginning of this year (http://www.youtube.com/user/GoogleWebmasterHelp). In the video, Matt answered a question from a user about the tweak mentioned in Aaron's blog. Watch it now if you haven't already.

As is typical of Matt and everyone at Google, he was not specific in his answer. However, he did say a few things that I think are noteworthy. Here are several quotes that I found particularly interesting (in order):

"We don't really think about brands. We think about words like trust, authority, reputation, PageRank, high-quality ... "

"And so the Google philosophy on search results has been the same pretty much forever."

"But it affects a relatively small number of queries ... it's not like it affects a ton of long-tail queries."

"I don't think of it as putting more weight on brands. We really don't think about 'brands' in search-quality that much."

"But it's not that we always try to return brands, we try to return whatever we think the best results are for users."

"And so what you should be doing doesn't change."

Conspiracy theorists will say that of course Google is not going to say that they do not think about brands. They wouldn't tell us what they did, right? Although I agree that Google is not going to say what they did, I've also found in my experience that they don't directly lie. They are often vague when they don't want to tell us something, but I've never seen evidence of them outright lying. The quotes above are fairly direct in my opinion, at least when it comes to branding.

Other Possible Explanations

So if Google didn't set out to improve the rankings of branded sites, what did they do exactly? I can think of two relatively simple tweaks that could have been made that would have improved the rankings of established brands:

  • They increased the relative value of link age and/or domain age. Well-known brands in general have been online much longer than a lot of other sites. Thus, many of the links going to those sites would be older, increasing their relative value compared to links to newer sites.
  • Google may have increased the value of text content that's on the linking page outside of the anchor text. They may have also slightly decreased the value of anchor text. It's also possible that they did both. The end result of any of these three scenarios would be that branded sites that had a lot of links from relevant pages but bad anchor text would have seen a bump.

Both of these tweaks would be pretty common and are quite possible. Another scenario that is slightly more far-fetched is that Google has now found a way to associate certain words that appear together often, even if there is no link going to any particular domain. For example, let's consider one of the examples mentioned by Aaron in his post. Radio Shack's Web site started ranking for the keyword "electronics" after this change was made. Interestingly, it now ranks #1 for that keyword.

The two possibilities I mentioned above would have helped Radio Shack. However, by doing some searches in Google you can see that "electronics" and "Radio Shack" have been used an extraordinarily large number of times on the same page. Take a look at the following Google queries:

"radio shack" electronics -site:radioshack.com (≅2.5 million results)
intitle:"radio shack" intitle:electronics -site:radioshack.com (≅4,000 results)

The first query returns all the pages in Google's index on which "radio shack" and electronics both appear. The second query returns all the pages in Google's index that have "radio shack" and the word electronics in their Title tags. The number of results for both of these queries is pretty high considering I eliminated pages from the Radio Shack Web site.

The interesting thing that I found when I reviewed a lot of the results from these queries was that many of them did not actually link to www.radioshack.com. This tells me that Google is possibly viewing these terms as related or synonymous in some way and thus the Radio Shack site is getting a bump for that query. I can't think of how it could be programmed algorithmically, so it's hard for me to say that's exactly what's going on. However, that would certainly explain a jump from nowhere in the top 10 to number 1 in the last few months. The ranking factors I mentioned would have had to have been turned up tremendously in terms of relevancy value to account for that type of jump.

Assuming this type of association is what is coming into play, it could explain why Matt would have said that the change "affects a relatively small number of queries ... it's not like it affects a ton of long-tail queries." It's much less likely that Radio Shack would appear so often with long-tail keywords. Thus, they wouldn't start ranking for any long-tail terms as a result of the change.

Change in SEO Strategy?

Regardless of what was actually changed by Google, the most important thing to do is determine if the change alters your SEO strategy. I always ask myself the following question when I learn about something Google tweaked: does this change what I do? Nine times out of ten, it does not.

There aren't too many things that will have a significant effect on your site's rankings. Increasing internal relevancy is as simple as tweaking Title tags and non-hyperlinked body content and combining that with optimal internal linking. Once you've finished that process, you'll want to extend upon your strategy by adding inbound links from related third-party sites. When it comes to the SEO for a particular keyword, there's not much else to it. The only thing that changes is how effective each of these things is. But regardless of how much Google discounts these things by tweaking their relative value, these core factors are still the shortest path to optimal rankings.

This recent update does not give me any more reason to recommend branding than I did before. Successfully creating a brand for yourself or your company should always be a goal because it is good marketing practice. However, SEO should never be your main purpose. Keep in mind that SEO benefits would only come from successful branding, not any attempt at branding that you throw out there. In my estimation, the only way branding can possibly affect your site's rankings for a particular query is if your brand becomes essentially synonymous with the search query. So much so that the keyword you are targeting appears on Web pages with your brand name throughout Google's index. But that is a long-term byproduct of successful branding that will happen naturally. You won't be able to make it happen.


For permission to reprint or reuse any materials, please contact us. To learn more about our authors, please visit the Bruce Clay Authors page. Copyright © 2009 Bruce Clay, Inc.

 



BACK TO BASICS: XML Sitemaps Defined

Part One: Pairing Traditional Site Maps with XML Sitemaps

By Bradley Leese, April 15, 2009

At Bruce Clay, Inc., one of the things that we strive to do in our everyday client work is get the pages that we consider important into the search engines. In order to provide the search engines the most complete look at what is in a site and give those pages the best chance to be indexed, it's necessary to create maps of your site's pages. There are two ways of doing this. The traditional way is to create an HTML site map, but in recent years search engines have developed the XML Sitemap protocol to assist in spidering. Google in particular has embraced the format and created several variations that enable them to easily discover all manner of content varieties. The first part of this two-part series merely outlines the basic differences between HTML site maps and XML Sitemaps and gets you started on creating your own XML Sitemaps. The second will tackle the variations that Google has created in order to better help you serve them content.

HTML Site Map Page(s)

A traditional HTML site map communicates to site users and search engine spiders how the site's information is organized. Essentially, the purpose of the site map is to document the relevance of the site's content. If, when attempting to match the site map to the structure of the site, a site owner finds errors or discovers that the information is confusing or illogically organized, the site needs to be reorganized in order to demonstrate clear subject expertise. HTML site maps are extremely important for usability. Your visitors will turn to your site map when they can't use or don't understand your navigation.

Take a moment to view Google's site map for a clearer understanding of the recommendations in their Webmaster Guidelines section.

http://www.google.com/sitemap.html

Snippet from Google HTML Site Map

Site Map Restrictions

HTML site maps' purpose was always to lead the search engines to identify and (hopefully) conclude that the site's navigation and content were proof alone that the site was worthy of high keyword rankings. There are many limitations with the HTML format, not least of which is the somewhat restrictive format that Google outlines in its webmaster guidelines:

Design and Content Guidelines
  • Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
  • Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages.
  • Create a useful, information-rich site, and write pages that clearly and accurately describe your content.
  • Think about the words users would type to find your pages, and make sure that your site actually includes those words within it.
  • Try to use text instead of images to display important names, content, or links. The Google crawler doesn't recognize text contained in images.
  • Make sure that your <title> elements and alt attributes are descriptive and accurate.
  • Check for broken links and correct HTML.
  • If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
  • Keep the links on a given page to a reasonable number (fewer than 100).

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769

Large sites will have trouble fitting their entire site within the confines of Google's Webmaster Guidelines as documented above and, of course, Bruce Clay writers have long offered solutions on how to combat these obvious issues. Following the design guidelines exactly leads to either not listing every page on your site in your site map, or creating nested site maps that may not be crawled entirely. Either way, most site maps are just pages of links with good anchor text but very little in the way of content.

The most important thing to know about using site maps successfully is that Google does not expect you to list every page on your site within your site map. Now large site owners will scoff at this statement with loud pronouncements that this is obvious and everyone should know that if they are an Internet marketer. However, this recommendation might give you pause. After all, wouldn't it be wise if you could document all the pages on your site despite their volume to verify that the search engines have access to site information? The answer is yes, and that is where the Sitemaps Protocol comes into play.

Sitemaps (XML)

XML Sitemaps - usually called just Sitemaps - are a way for you to give the search engines information about your site. There are three ways to point search engines to your XML Sitemap. You can use your robots.txt file, you can submit it directly to the engines using their submission forms or you can issue an HTTP request to the URL provided by the search engine. You can find how to build your Sitemap, and the full Sitemaps protocol at http://sitemaps.org.
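As a rough illustration of the first and third options, the sketch below builds the robots.txt line that advertises a Sitemap and a ping URL of the general form described at sitemaps.org. The host names www.example.com and searchengine.example are placeholders rather than real endpoints, and the function names are made up for this example; each search engine documents its own ping address.

```python
from urllib.parse import quote


def robots_txt_line(sitemap_url: str) -> str:
    """Return the robots.txt directive that points spiders at a Sitemap."""
    return f"Sitemap: {sitemap_url}"


def ping_url(engine_base: str, sitemap_url: str) -> str:
    """Build a ping request URL; the Sitemap location is URL-encoded."""
    encoded = quote(sitemap_url, safe="")
    return f"{engine_base.rstrip('/')}/ping?sitemap={encoded}"


if __name__ == "__main__":
    sitemap = "http://www.example.com/sitemap.xml"  # placeholder location
    print(robots_txt_line(sitemap))
    print(ping_url("http://searchengine.example", sitemap))  # placeholder engine
```

Requesting the resulting ping URL with any HTTP client would complete the third option; submitting through an engine's own form requires no code at all.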

Sitemaps (XML) Page(s):

  • Allows Google, Yahoo and MSN to locate all the site's content
  • Informs the search engine when content changes
  • Shows spiders deep content that is unable to be found in a normal crawl process
  • Tells the spider the date the content was last modified
  • Assigns the importance or priority of information
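To make those points concrete, here is a minimal sketch of how a Sitemap file could be generated. The element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the Sitemaps protocol; the function name, the sample URL and the sample values are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build Sitemap XML from dicts with 'loc' and optional
    'lastmod', 'changefreq' and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                # ElementTree escapes characters such as & in the text.
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {
            "loc": "http://www.example.com/",  # placeholder page
            "lastmod": "2009-04-15",           # date last modified
            "changefreq": "monthly",           # how often content changes
            "priority": 0.8,                   # relative importance, 0.0 to 1.0
        }
    ]))
```

Only loc is required for each entry; the other three tags are optional hints to the spider.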

In some cases, Sitemaps are helpful if your site has dynamic content or pages featuring technologies like AJAX or Flash that might not be easily found and crawled during a normal spidering process. While the search engines have improved by leaps and bounds in their ability to follow links in Flash, supplementing those links with an XML Sitemap can make your life easier.

An XML Sitemap is also useful if your site is new and has few links to it or if your site has a large archive of content pages that are not well linked to each other, or are not linked at all. Because Google and the other search engines discover new pages by following from link to link, poorly or under-linked pages may have a harder time getting spidered and indexed. An XML Sitemap provides those URLs to the search engines directly so that they can spider them and consider them for indexing.

Using a Sitemap provides additional site information to Google, which complements Google's normal methods of crawling. Sitemaps allow Google to crawl a Web site in a much more timely fashion. Google does state, however, that there is no guarantee that URLs from a Web site's Sitemap will end up in the Google index. Web sites are also never penalized for submitting Sitemaps.

Dynamic XML Sitemaps Do Not Replace Static Site Maps

After pointing out the differences between these two site map / Sitemap formats it may seem the traditional format is obsolete. This is, however, explicitly incorrect. Traditional site maps and XML Sitemaps work best when paired together, creating a stronger and fuller picture of your site for the search engines. In addition, the HTML version will be useful for your human visitors as well, since they might use the site map pages to navigate their way around the site if your global navigation is confusing or broken.

Sitemaps are for Spiders and Site Maps are for Silos

It is vital that this seemingly minor distinction is taken into consideration. XML Sitemaps do help prioritize which sections of the site are silos and which sections are supporting or sublevel content sections; however, they do not explicitly silo. Only site maps (HTML) succeed in this task when properly nested throughout the Web site. Consider that Google requires link text in order to clearly identify silos and subdirectory content.

Site maps (HTML) are cached by search engines, which means they'll show up in the search results, something that could be useful to your site in the long run.

http://74.125.95.132/search?q=cache:http://www.google.com/sitemap.html&hl=en&rlz=1C1GGLS_enUS291US305&strip=1

Google Sitemap

XML Sitemaps on the other hand are only a batch of links to be followed. They're not human readable (unless the human is particularly fond of reading code) and they won't pass any link equity. They are strictly for search engines. The upside is that because you don't have to worry about any pesky human eyes, the code can be extremely efficient, and things like font size and text content are not a concern.

http://www.google.com/hostednews/sitemap_index.xml

It's clear that the best way to build up your site is to make both a traditional HTML site map as well as an XML Sitemap. You can add your XML Sitemap through Google Webmaster Tools.

Google Webmaster Tools

Sitemap Guidelines

Sitemaps all adhere to the same general guidelines: a Sitemap may contain a list of URLs or a list of other Sitemaps. If a Sitemap does contain a list of other Sitemaps, it can be saved as a Sitemap index file using the XML format provided for that file type. For those with larger sites, be aware that an XML Sitemap index file cannot contain more than 1,000 Sitemaps.

There are also restrictions on the number of URLs and the file size of a Sitemap file. A Sitemap file cannot contain more than 50,000 URLs or be larger than 10MB when uncompressed. If a Sitemap has more than 50,000 URLs or is too large, it can be broken into several smaller Sitemaps. These limits make sure that the Web server is not overloaded by large files.
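One way to respect these limits is sketched below: split the URL list into chunks of at most 50,000 and list the resulting files in a Sitemap index. The limits are the ones quoted in this article; the function names and file locations are hypothetical, and the size check assumes that 10MB means 10 x 1024 x 1024 bytes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000        # limit quoted in the article
MAX_SITEMAPS_PER_INDEX = 1_000       # limit quoted in the article
MAX_BYTES = 10 * 1024 * 1024         # assumed reading of "10MB"


def split_urls(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit in one Sitemap."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


def fits_size_limit(xml_text):
    """Check the uncompressed size of a generated Sitemap."""
    return len(xml_text.encode("utf-8")) <= MAX_BYTES


def build_index(sitemap_urls):
    """Build a Sitemap index file listing the given Sitemap locations."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many Sitemaps for a single index file")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = location
    return ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    urls = [f"http://www.example.com/page{i}" for i in range(120_000)]
    chunks = split_urls(urls)
    # Hypothetical file names for the split Sitemaps.
    files = [f"http://www.example.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
    print(len(chunks), "Sitemaps needed")
    print(build_index(files))
```

A chunk that passes the URL count can still exceed the file size limit if its URLs are long, so both checks are worth running before a file is published.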

Just like the best practice for linking within your site, all URLs in your XML Sitemap must also be referred to the same way every time. If a site specifies its site location as http://www.peanutbutterville.com/, the URL list should not contain URLs that begin with the non-www version, http://peanutbutterville.com/. Likewise, if the site location is named as http://peanutbutterville.com/, the URL list should not contain URLs that begin with http://www.peanutbutterville.com/.
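A quick consistency check along these lines can catch mixed www and non-www URLs before the Sitemap is submitted. This is only a sketch: the function name is made up for the example, and it compares nothing more than the scheme and host of each URL against the declared site location.

```python
from urllib.parse import urlsplit


def inconsistent_urls(site_location, urls):
    """Return the URLs whose scheme or host differs from the site location."""
    base = urlsplit(site_location)
    expected = (base.scheme, base.netloc.lower())
    mismatched = []
    for url in urls:
        parts = urlsplit(url)
        if (parts.scheme, parts.netloc.lower()) != expected:
            mismatched.append(url)
    return mismatched


if __name__ == "__main__":
    site = "http://www.peanutbutterville.com/"
    candidates = [
        "http://www.peanutbutterville.com/jars",
        "http://peanutbutterville.com/jars",  # non-www version, flagged
    ]
    print(inconsistent_urls(site, candidates))
```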

Direct image URLs should also not be included in the Sitemaps as Google indexes the page the image appears on, not the image itself directly. If your URLs include session IDs make sure that you strip those out for the XML Sitemap. The Sitemap URL must also be readable by the Web server where the Sitemap is located, and may only contain ASCII characters. An XML Sitemap containing upper ASCII characters, certain control codes or special characters such as * and {} will receive an error and can't be added.
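The clean-up described above could look something like the following sketch. The list of session parameter names is an assumption, since every site names its session parameter differently, and the character check covers only the cases mentioned in this article: non-ASCII characters, control codes, spaces, and the special characters * and {}.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; adjust to whatever your own site uses.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def strip_session_ids(url):
    """Remove session ID query parameters from a URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def is_sitemap_safe(url):
    """True if the URL is printable ASCII with no spaces, *, { or }."""
    return all(32 < ord(ch) < 127 and ch not in "*{}" for ch in url)


if __name__ == "__main__":
    raw = "http://www.example.com/page?id=7&sid=abc123"  # placeholder URL
    clean = strip_session_ids(raw)
    print(clean, is_sitemap_safe(clean))
```

URLs that fail the check need to be escaped or corrected at the source rather than silently dropped, otherwise the pages they point to never reach the search engines.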

It is possible to create a specialized Sitemap for certain types of content. However, certain Sitemaps are only accepted by specific search engines. Next month, the second article in this series will be covering the following types of Sitemaps which are specific only to Google, so Yahoo! and Microsoft Live Search and Ask won't be able to read them:

