Techniques for Preventing Duplicate URLs in Your Website
By Scott Mitchell
Chances are, there are several different URLs that point to the same content on your website. For example, the URLs
http://yoursite.com
http://yoursite.com/default.aspx
http://www.yoursite.com
http://www.yoursite.com/default.aspx
are all likely valid URLs that result in the same content, namely the homepage for yoursite.com. While having four different URLs reference the same content may not seem like a big deal, it can directly impact your website's search engine placement and, consequently, its traffic. To a search engine, those four different URLs represent four different pages, even though they all produce the same content.
To understand how allowing duplicate URLs in your website can affect your search engine placement, first understand that search engines base a page's placement in the search results, in part, on how many other websites link to the page. Now, imagine that there are 1,000 web pages from other websites that link to your homepage. You might conclude, then, that a search engine would rank the importance of your homepage based on those 1,000 links. But consider what would happen if 25% of those links pointed to http://yoursite.com, 25% to http://yoursite.com/default.aspx, and so on. Rather than crediting your homepage with 1,000 inbound links, the search engine assumes there are only 250 links to http://yoursite.com, only 250 links to http://yoursite.com/default.aspx, and so on. In effect, redundant URLs can dilute your search engine ranking.
A key tenet of search engine optimization is URL normalization, or URL canonicalization. URL normalization is the process of eliminating duplicate URLs in your website. This article explores four different ways to implement URL normalization in your ASP.NET website. Read on to learn more!
First Things First: Deciding on a Canonical URL Format
Before we examine techniques for normalizing URLs, and certainly before such techniques can be implemented, we must first decide on a canonical URL format. Many websites use www.websitename.com as the canonical form. For example, if you type amazon.com into your browser's Address bar and hit Enter you'll see that the URL is changed to the www.amazon.com form.
By choosing to use www.websitename.com as the canonical form we are saying that we want to "replace" the URLs:
http://yoursite.com
http://yoursite.com/default.aspx
http://www.yoursite.com/default.aspx
with the canonical URL, http://www.yoursite.com.
But how do we go about "replacing" one URL with another? You can ensure that any internal links within your website refer to the canonical format, but what's to stop some other website from linking to one of the non-canonical forms? As we'll see in this article, you can't replace the non-canonical URLs; instead, you can issue permanent redirects from non-canonical URLs to the canonical form as well as include markup that gives search engines a hint as to the canonical form.
URL Normalization Using Permanent Redirects
Every ASP.NET developer is familiar with the Response.Redirect(url) method, which redirects a visitor from the page they requested to the specified url. Response.Redirect works by returning a 302 HTTP status code and a Location HTTP header to the client. The Location header indicates the URL to which the requested resource has moved. The 302 status code indicates that the resource being requested has temporarily moved to a new URL. A client - such as a search engine spider - that receives a 302 status will continue to try the original URL for future requests. There is an alternative status code, 301, that should be used if the resource has been moved permanently.
From an end user's perspective, 301 and 302 redirects behave the same - the user is redirected to the specified URL. However, when a search engine spider receives a 301 status it updates its index with the new URL. Therefore, if every request for a non-canonical URL is immediately answered with a permanent redirect to the same page in its canonical form, a search engine spider crawling our site will only maintain the canonical form in its index. That means that it doesn't matter if other websites link to our homepage using one of the non-canonical formats, because a permanent redirect will send the user to http://www.yoursite.com and, in the case of a search engine spider, instruct it to update its index to use the canonical form.
There are a number of different ways we can determine if the incoming URL is in a non-canonical form and issue a permanent redirect to the canonical form. This article explores three such techniques: using ASP.NET code; using IIS 7's URL Rewrite Module; and using ISAPI_Rewrite, a commercial URL rewriting product that works with IIS 7 and earlier versions.
Issuing Permanent Redirects From ASP.NET
Every time an incoming request is handled by the ASP.NET engine, it raises the BeginRequest event. You can execute code in response to this event by creating an HTTP Module or by creating the Application_BeginRequest event handler in Global.asax. The following code, written by Fredrik Normen and available at Redirect Permanent from a non-www to a www using ASP.NET 4.0, examines the incoming URL (Request.Url) to see whether it starts with www. If the URL does not start with www then a permanent redirect is issued to the same page, but with the www in the URL. (The download available at the end of this article offers a VB version of this code.)
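A minimal sketch of such a handler follows. It is written from the description above rather than copied from the original listing, and it assumes a Global.asax code-behind class named Global; the loopback check is an added convenience so that requests to localhost during development are left alone.

```csharp
// Global.asax.cs - a sketch under the assumptions stated above.
using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        Uri url = Request.Url;

        // Assumption: do not redirect local development requests
        // such as http://localhost:1234/.
        if (url.IsLoopback)
        {
            return;
        }

        // Authority is the host name plus any non-default port,
        // for example "yoursite.com" or "yoursite.com:8080".
        if (!url.Authority.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
        {
            // Same scheme, path and querystring; only the "www." is injected.
            string canonical = url.Scheme + "://www." + url.Authority + url.PathAndQuery;

            // Issues a 301 and ends the response. Requires ASP.NET 4.
            Response.RedirectPermanent(canonical);
        }
    }
}
```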
The above code makes extensive use of the Request.Url property, which returns a Uri object that has a host of properties, like PathAndQuery, that can be used to examine the incoming URL. In the example above, the Authority property is examined to see if the authority (www.yoursite.com) starts with "www". If not, the user is redirected to the same URL they were requesting, but with the "www" injected.
Finally, note the use of the
Response.RedirectPermanent method. This is a new method added to ASP.NET 4 that issues a 301 permanent redirect. This method
and its behavior are described more in an earlier article of mine, Search Engine Optimization Enhancements in ASP.NET
4. If you are not using ASP.NET 4 you will have to write a few more lines of code to issue a permanent redirect - see
Chris Love's blog entry, 301
Redirect ASP.NET. (Do not use
Response.Redirect as that issues a temporary redirect and won't cause the search engines to update their indexes.)
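For reference, a permanent redirect without Response.RedirectPermanent usually comes down to setting the status code and the Location header by hand. The helper below is a sketch along those lines; the class and method names are hypothetical, and it is not taken from the blog entry mentioned above.

```csharp
using System.Web;

// Hypothetical helper for ASP.NET 3.5 and earlier.
public static class RedirectHelper
{
    public static void PermanentRedirect(HttpResponse response, string url)
    {
        response.Clear();                           // discard any buffered output
        response.StatusCode = 301;                  // permanent, unlike the 302 from Response.Redirect
        response.StatusDescription = "Moved Permanently";
        response.AddHeader("Location", url);        // where the resource now lives
        response.End();                             // stop processing the current request
    }
}
```

From Application_BeginRequest this would be called as RedirectHelper.PermanentRedirect(Response, canonicalUrl), where canonicalUrl is whatever canonical form you have computed.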
The above code only tacks on the "www" if it is omitted - it does not drop the
default.aspx from the URL if someone were to visit
http://www.yoursite.com/default.aspx. This functionality can be added with a bit more code:
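One way this could look is sketched below. It is an illustration rather than the article's original listing. It assumes IIS 7's integrated pipeline, where a request for / still appears as / in BeginRequest; under the classic pipeline or IIS 6 the default document may surface as /default.aspx and cause a redirect loop, so test before deploying. Only GET requests are rewritten so that form postbacks to default.aspx keep working.

```csharp
// Global.asax.cs - a sketch under the assumptions stated above.
using System;
using System.Web;

public class Global : HttpApplication
{
    private const string DefaultPage = "/default.aspx";

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        Uri url = Request.Url;
        if (url.IsLoopback)
        {
            return; // assumption: leave local development requests alone
        }

        string authority = url.Authority;
        string path = url.AbsolutePath;
        bool changed = false;

        // Step 1: inject the "www." prefix if it is missing.
        if (!authority.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
        {
            authority = "www." + authority;
            changed = true;
        }

        // Step 2: strip a trailing default.aspx, keeping the folder's slash,
        // so "/default.aspx" becomes "/" and "/blog/default.aspx" becomes "/blog/".
        if (Request.HttpMethod == "GET" &&
            path.EndsWith(DefaultPage, StringComparison.OrdinalIgnoreCase))
        {
            path = path.Substring(0, path.Length - DefaultPage.Length + 1);
            changed = true;
        }

        if (changed)
        {
            // url.Query is either empty or starts with "?".
            Response.RedirectPermanent(url.Scheme + "://" + authority + path + url.Query);
        }
    }
}
```

Combining both checks into a single redirect means a visitor to http://yoursite.com/default.aspx is sent to the canonical URL in one hop rather than two.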
The above code (in VB) is available for download at the end of this article.
Rewriting URLs Into Canonical Form Using IIS 7's URL Rewrite Module
Shortly after releasing IIS 7, Microsoft created and released a free URL Rewrite Module. The URL Rewrite Module makes it easy to define URL rewriting rules in your Web.config file. To learn more about the URL Rewrite Module, including instructions on downloading and installing the module, please refer to the URL Rewrite Module documentation on www.IIS.net.
Presuming you have the URL Rewrite Module installed on your IIS 7 web server and that your website uses the integrated pipeline, all you need to do is add the following markup to your ASP.NET application's Web.config file:
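A configuration along these lines adds the missing "www". It is a sketch that uses the module's documented rule syntax with yoursite.com as a placeholder host name; merge the rewrite element into your existing system.webServer section rather than adding a second one.

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Canonical Host Name" stopProcessing="true">
          <!-- Match every requested path and capture it as {R:1}. -->
          <match url="(.*)" />
          <conditions>
            <!-- Only act when the host is exactly yoursite.com (no www). -->
            <add input="{HTTP_HOST}" pattern="^yoursite\.com$" />
          </conditions>
          <!-- Issue a 301 to the same path on the www host. -->
          <action type="Redirect" url="http://www.yoursite.com/{R:1}" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```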
The above configuration defines a single rule named Canonical Host Name. This rule examines the host name and if it matches the regular expression pattern ^yoursite\.com$ - which means, in English, that the host name is literally "yoursite.com" - then the user is permanently redirected to the same path on www.yoursite.com.
To strip default.aspx from the URL - that is, to normalize from http://www.yoursite.com/default.aspx to http://www.yoursite.com - add the following rule beneath the Canonical Host Name rule:
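A rule of roughly this shape does the job. Treat it as a sketch: the rule name is arbitrary, and the REQUEST_METHOD condition is an added precaution so that form postbacks to default.aspx are not turned into redirects.

```xml
<rule name="Default Document" stopProcessing="true">
  <!-- Match any path ending in default.aspx; {R:1} is the folder before it. -->
  <match url="(.*?)/?default\.aspx$" ignoreCase="true" />
  <conditions>
    <!-- Assumption: only redirect GET requests, leaving postbacks alone. -->
    <add input="{REQUEST_METHOD}" pattern="^GET$" />
  </conditions>
  <!-- Redirect to the folder URL, e.g. /default.aspx to /. -->
  <action type="Redirect" url="{R:1}/" redirectType="Permanent" />
</rule>
```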
Rewriting URLs Into Canonical Form Using ISAPI_Rewrite
Microsoft's URL Rewrite Module is a great choice if you are using IIS 7, but if you are using a previous version of IIS you're out of luck. What also makes things a bit more complicated is that IIS 6 and earlier have a more distinct boundary between IIS's pipeline and ASP.NET's. In IIS 6 (and earlier) requests for static resources like HTML pages, MP3s, PDFs, ZIPs, and such are not (by default) handled by ASP.NET. Consequently, any URL rewriting logic implemented at the ASP.NET layer - such as in the Application_BeginRequest event handler in Global.asax - will not ensure that the canonical URL form is used for static resources. Instead, an IIS-level solution needs to be applied.
ISAPI_Rewrite is a commercial URL rewriting engine for IIS that is quite similar to Apache's mod_rewrite URL rewriting engine. Instead of defining rewrite rules in Web.config using XML syntax, ISAPI_Rewrite rules are defined in a text file using single-line commands.
To get started with ISAPI_Rewrite, head over to the download page and download and install the appropriate package. There's both a freeware "Lite" version and a fully functional commercial version. The official documentation gives a detailed overview of ISAPI_Rewrite's syntax, but here's an example of how you would use it to redirect users permanently to the canonical form:
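The rules below are a sketch in the mod_rewrite-style syntax used by ISAPI_Rewrite 3, with yoursite.com as a placeholder. Earlier versions of the product use a different rule syntax, so check the documentation for the version you install before relying on this.

```apache
RewriteEngine on

# Only act when the Host header is exactly yoursite.com (no www).
RewriteCond %{HTTP:Host} ^yoursite\.com$ [NC]

# Permanently (301) redirect to the same path on the www host.
# The optional leading slash covers both global and per-directory rule files.
RewriteRule ^/?(.*)$ http://www.yoursite.com/$1 [R=301,L]
```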
Telling Search Engine Spiders Your Canonical Form In Markup
As we just saw, one way to enforce URL normalization is to implement permanent redirects from non-canonical URLs to canonical ones. This works well for the homepage and for adding (or dropping) the "www" from all requests to your site, but it doesn't address other issues. Consider a URL that may include querystring parameters that don't affect the content rendered on the page or only affect non-essential parts of the page. Take YouTube as an example. The canonical URL to a YouTube video is http://www.youtube.com/watch?v=videoId. The v querystring parameter is the key parameter here, as it specifies the video to display.
The YouTube video URL may optionally include additional URL parameters, such as a querystring parameter that specifies that the other videos in the same channel should be displayed to the right of the video player (which takes the form feature=channel).
The YouTube webmasters want search engines to index the canonical form of the URL, but they can't do a permanent redirect from http://www.youtube.com/watch?v=videoId&feature=channel to http://www.youtube.com/watch?v=videoId, otherwise the channel feature will never be enabled for any visitor.
Fortunately, there is a way to give a hint to a search engine that you have a canonical URL that should be used. To specify the canonical URL simply add a <link> element in the <head> portion of the web page. It works as follows: add the following markup to the <head> sections of those pages in your website that you want the search engines to consider all the same URL:
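In its general form the element looks like this, where the href value is a placeholder for whichever URL you have chosen as canonical:

```html
<link rel="canonical" href="http://www.yoursite.com/page.aspx" />
```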
In the case of YouTube, all video pages specify a
<link> element like so, regardless of whether the querystring includes just the videoId or
the videoId and other parameters:
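Using the article's videoId placeholder, the element would look something like this (illustrative markup, not copied from YouTube's pages):

```html
<link rel="canonical" href="http://www.youtube.com/watch?v=videoId" />
```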
Because this same <link> element shows up when visiting either http://www.youtube.com/watch?v=videoId or http://www.youtube.com/watch?v=videoId&feature=channel, the search engine spider will treat these two URLs as one and the same.
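In an ASP.NET Web Forms page the same element can be emitted from code so that every querystring variation reports one canonical URL. The sketch below assumes a hypothetical page named Watch.aspx whose <head> element is marked runat="server" and whose only significant querystring parameter is v; the other names are placeholders.

```csharp
using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

// Hypothetical code-behind class for Watch.aspx.
public partial class Watch : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Keep only the parameter that determines the content; drop the rest.
        string videoId = Server.UrlEncode(Request.QueryString["v"]);

        var canonical = new HtmlLink();
        canonical.Href = "http://www.yoursite.com/Watch.aspx?v=" + videoId;
        canonical.Attributes["rel"] = "canonical";

        // Header is null unless the page's <head> has runat="server".
        Header.Controls.Add(canonical);
    }
}
```

A request for Watch.aspx?v=abc&feature=channel then renders the same canonical link as Watch.aspx?v=abc, which is the behavior described above.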
For more information on how this
<link> element works be sure to read Specify Your Canonical.