Making Authenticated HTTP Requests from an ASP.NET Page
By Scott Mitchell
Introduction
Performing HTTP requests from a web page - a task commonly referred to as "screen scraping" - involves server-side code issuing an HTTP request to some other Web site, retrieving the returned results, and processing these results in some manner. For example, screen scraping is oftentimes used to grab data from another site, such as scraping the HTML from a Yahoo! Finance page to grab the current stock price for a particular stock symbol. Performing simple HTTP requests in ASP.NET requires just a few lines of code, thanks to the WebClient class. This class, found in the System.Net namespace, provides a small number of properties and methods useful for making simple HTTP requests. A previous 4Guys article of mine, Screen Scrapes in ASP.NET, illustrates how to use the WebClient class from an ASP.NET page.
RssFeed, a custom, compiled ASP.NET server control I created for displaying RSS feeds in an ASP.NET page, uses programmatic HTTP requests to grab the syndicated content from a specified URL. Recently a user of RssFeed asked me if RssFeed provided support for RSS feeds that required authentication. That is, this user wanted to display the contents of an RSS feed that was only accessible by providing authentication information, such as a username and password. While RssFeed itself didn't provide this functionality, the underlying classes used by RssFeed to access the remote RSS feed do, so I added this functionality. (For more information on displaying RSS content in your ASP.NET website, be sure to read A Custom ASP.NET Server Control for Displaying RSS Feeds.)
In this article we will first discuss the common authentication protocols used by web servers and then look at how to make programmatic HTTP requests to a resource that requires authentication. Read on to learn more!
A Look at Authentication Protocols
There are a number of standard techniques through which a web server can identify the user of an incoming request. The three most commonly used authentication protocols are:
- Basic authentication - when an unauthenticated request comes into the web server, the web server returns an HTTP 401 response, prompting the client for its credentials. The client re-requests the same resource, passing the username and password in a base-64 encoded HTTP header. (The base-64 encoding does not encrypt or protect the credentials; it merely ensures that the characters sent over the wire are in a format that won't conflict with any reserved characters.) Because the credentials are effectively sent over the wire in plain text, Basic authentication should only be used over SSL, which encrypts the entire contents of the HTTP request.
- Digest authentication - like Basic authentication, when an unauthenticated request comes into the web server, the web server returns an HTTP 401 response, prompting the client for its credentials. Along with this response, the web server sends back additional pieces of information, such as the realm and a nonce (a random string). The client then re-requests the resource, including an HTTP header that has the username in plain text and a hash of the password. The hash is salted by the server's nonce and, in the common qop=auth mode, by a request counter (the nonce count) and a client-generated nonce, plus other tidbits. (Hashing is the process of taking a plain-text input and converting it into a form that cannot be converted back into the plain-text form. The web server receiving the hashed password must already know the user's password, or a hash derived from it; it repeats the same hashing and ensures that its result matches the hashed version sent over the wire. The short of it is, a hashed password can be sent over an insecure channel without revealing the password itself. To learn more about the basics of hashing, see the Wikipedia hashing entry.) To learn more about Basic and Digest authentication, refer to RFC 2617.
- NTLM, or NT Challenge/Response, or Integrated Windows Authentication - NTLM avoids sending even a digest of the password. Instead, the server and client exchange three messages (negotiate, challenge, and authenticate), during which the client ends up hashing the server's nonce with their password. The client's username and this hashed nonce are then sent back to the server and verified. For more in-depth information on NTLM refer to Microsoft's NTLM documentation.
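To make the difference between the first two schemes concrete, here is a minimal sketch in Python (used purely for illustration; all function names are hypothetical). It builds a Basic Authorization header and computes an RFC 2617 Digest response for the common algorithm=MD5, qop=auth case. It is not a complete client: parsing the WWW-Authenticate challenge, the other Digest variants, and NTLM's message exchange are left out.

```python
import base64
import hashlib
import hmac


def basic_auth_header(username, password):
    """Hypothetical helper: value of the Authorization header for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth_header(value):
    """Anyone who sees the header can recover the password: base-64 is an encoding, not encryption."""
    scheme, token = value.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    username, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return username, password


def _md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, method, uri, nonce, nc, cnonce):
    """Hypothetical helper: Digest 'response' value for algorithm=MD5, qop=auth (RFC 2617).

    nonce comes from the server's 401 challenge; nc (nonce count) and cnonce
    (client nonce) are chosen by the client and sent alongside the response.
    """
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # depends on the password
    ha2 = _md5_hex(f"{method}:{uri}")                 # depends on the request
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")


def server_accepts(received, known_password, username, realm, method, uri, nonce, nc, cnonce):
    """The server recomputes the value from the password it already knows and compares."""
    expected = digest_response(username, known_password, realm, method, uri, nonce, nc, cnonce)
    return hmac.compare_digest(expected, received)
```

For example, basic_auth_header("Aladdin", "open sesame") returns "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the sample value from RFC 2617, and decode_basic_auth_header turns it straight back into the username and password. The Digest value, by contrast, changes with every nonce and never contains the password.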
As we just discussed, when a request comes in for a protected resource the web server sends back a 401 response to the client - typically your browser. This causes that familiar dialog box to pop up, prompting you for your username and password. (With NTLM, if you are logged on to the domain, this information is seamlessly sent to the web server, without requiring the end user to re-enter their credentials.) When making an HTTP request programmatically, there's no dialog box. Rather, we must instruct the appropriate classes to use a particular authentication scheme with particular credentials. We'll examine how to accomplish this shortly, after a quick look at the basics of making programmatic HTTP requests in .NET.
A Quick Primer on Making HTTP Requests from an ASP.NET Page
The .NET Framework provides a couple of classes for making programmatic HTTP requests, both of which can be found in the System.Net namespace. The first class, HttpWebRequest, provides a rich set of features for making an HTTP request. Using this class you can perform very simple HTTP requests, or you can configure its properties to handle more complex scenarios. For example, the HttpWebRequest class provides properties and methods that enable:
- Tunneling the request through a proxy,
- A timeout value - if the request does not return within a specified number of milliseconds, an exception is raised,
- Asynchronous HTTP requests - start the HTTP request on a separate thread and receive notification when the request completes,
- If-Modified-Since support, which enables the HTTP request to be smart enough to only download the complete content if it has changed since the last request was made,
- And others, exposed as further members of the HttpWebRequest class.
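The If-Modified-Since feature is an ordinary HTTP conditional GET, and the idea carries over to any HTTP client. As a rough illustration outside .NET (Python's urllib.request; the function name fetch_if_modified is hypothetical), the client sends the time of its last download and treats a 304 Not Modified reply as "nothing new". In .NET the corresponding knob is the HttpWebRequest.IfModifiedSince property.

```python
import email.utils
import urllib.error
import urllib.request


def fetch_if_modified(url, last_fetch_time, timeout=10.0):
    """Conditional GET (hypothetical helper).

    last_fetch_time is a POSIX timestamp of our previous successful download.
    Returns the new body as bytes, or None if the server replies 304 Not Modified.
    """
    request = urllib.request.Request(url)
    # HTTP dates are always expressed in GMT, e.g. "Fri, 02 Jan 1970 00:00:00 GMT".
    request.add_header("If-Modified-Since", email.utils.formatdate(last_fetch_time, usegmt=True))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib treats any non-2xx status it cannot redirect as an error,
        # so an unchanged resource arrives here with code 304.
        if err.code == 304:
            return None
        raise
```

The bandwidth saving comes entirely from the server: if it honors the header, an unchanged resource costs one short 304 reply instead of a full download.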
If all you need to do is make a simple HTTP request without needing to tunnel through a proxy, specify timeout values, or make asynchronous requests, the .NET Framework provides the WebClient class, which is designed to simplify the HTTP request process. Using the WebClient class requires fewer lines of code than the HttpWebRequest class and, in my opinion, the resulting code is more readable. (As you may have guessed, the WebClient class uses the HttpWebRequest class internally.) For more information on making HTTP requests with the WebClient class, be sure to read Screen Scrapes in ASP.NET.
The following snippets of code show how to use both the WebClient and HttpWebRequest classes to make a simple HTTP request. Both snippets result in saving the HTML of the requested web page into the string variable results.
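For comparison, here is a rough analogue of the same task in Python (used purely for illustration; it is not the .NET code the text describes, and download_string is a hypothetical name). It fetches a URL and returns the body as a string, much as WebClient.DownloadString does, with an optional timeout standing in for HttpWebRequest.Timeout.

```python
import urllib.request


def download_string(url, timeout=10.0):
    """Hypothetical helper: fetch url and return the response body as text.

    timeout is in seconds here, whereas HttpWebRequest.Timeout is in milliseconds.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Decode using the charset the server declared, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage (performs a real network request):
#   results = download_string("http://www.example.com/")
```

As with WebClient, the convenience comes at the cost of control; anything beyond a plain GET, such as custom headers, calls for building a urllib.request.Request explicitly, much as you would reach for HttpWebRequest in .NET.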
Making Authenticated HTTP Requests
Both the WebClient and HttpWebRequest classes make it easy to include authentication information in the request through their Credentials properties. The Credentials property accepts an object that implements ICredentials; both the CredentialCache class and the NetworkCredential class implement this interface. The CredentialCache class provides a store for credentials. You can add new NetworkCredential instances to this store, or use the CredentialCache.DefaultCredentials property to use the credentials of the currently logged on user. (If you are making an HTTP request from an ASP.NET page, you likely will not want to use the DefaultCredentials property unless you are using impersonation, since otherwise the "current user" is the account the ASP.NET worker process runs under rather than the visitor; furthermore, the DefaultCredentials property can only be used when authenticating against NTLM or Kerberos-based authentication schemes.)
The intent of the CredentialCache class is to store a set of credentials for the user. When a request is made to a resource, the CredentialCache can be interrogated and the appropriate credentials extracted based on the URL being requested and the authentication scheme in use. That is, the CredentialCache class can hold credentials for various websites and, when a request is made, hand back the ones that match the requested URL. This functionality may be useful in a desktop-based application, where the CredentialCache object persists for the duration of the program's execution, but with ASP.NET pages you'll typically want to create a new CredentialCache object each time an authenticated HTTP request needs to be made.
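Python's standard library has a rough counterpart to CredentialCache: urllib.request.HTTPPasswordMgrWithDefaultRealm, which stores credentials keyed by URL and hands them out for any URL underneath a stored prefix. The sketch below (build_credential_store and the example URLs are hypothetical) builds a fresh store per call, in line with the per-request advice above. One difference worth noting: unlike CredentialCache, this store is not keyed by authentication scheme; the scheme is decided by which handler is attached to the request.

```python
import urllib.request


def build_credential_store(entries):
    """Hypothetical helper: build a per-request store from {url_prefix: (username, password)}.

    Loosely analogous to filling a .NET CredentialCache: credentials are keyed by
    URL and looked up by prefix when a request is made.
    """
    store = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    for url_prefix, (username, password) in entries.items():
        # realm=None means "use these credentials whatever realm the server names".
        store.add_password(None, url_prefix, username, password)
    return store


store = build_credential_store({
    "http://feeds.example.com/private/": ("alice", "s3cret"),
})
print(store.find_user_password("Members", "http://feeds.example.com/private/rss.xml"))
# ('alice', 's3cret')
print(store.find_user_password("Members", "http://www.example.org/"))
# (None, None)
```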
The following code shows how to use the CredentialCache class and the WebClient's Credentials property to make a request to a URL that is protected via Basic authentication:
To authenticate using Digest, use "Digest" instead of "Basic" as the second input parameter to the Add() method.
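For comparison, here is a rough Python analogue of both the Basic and Digest cases, using the standard library's urllib.request (an illustration only, not the .NET code described above; fetch_with_credentials is a hypothetical name). HTTPPasswordMgrWithDefaultRealm plays the part of CredentialCache, and choosing HTTPBasicAuthHandler or HTTPDigestAuthHandler corresponds to passing "Basic" or "Digest" to Add(). As in the exchange described earlier, the handler waits for the server's 401 challenge and then re-requests the resource with credentials. Python's standard library has no NTLM handler, so the NTLM case has no counterpart here.

```python
import urllib.request


def fetch_with_credentials(url, username, password, scheme="basic", timeout=10.0):
    """Hypothetical helper: request a protected URL, answering the 401 challenge.

    scheme selects the handler: "basic" or "digest".
    """
    # A fresh credential store per call, as recommended above for ASP.NET pages.
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, username, password)  # None = any realm

    handler_classes = {
        "basic": urllib.request.HTTPBasicAuthHandler,
        "digest": urllib.request.HTTPDigestAuthHandler,
    }
    auth_handler = handler_classes[scheme.lower()](passwords)

    # ProxyHandler({}) ignores any proxy configured in the environment;
    # pass a {"http": "http://proxy:port"} dict instead to tunnel through one.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), auth_handler)
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Usage (performs a real network request; URL and credentials are placeholders):
#   results = fetch_with_credentials("https://example.com/private/feed.xml", "user", "pw")
```

If the credentials are wrong, the server keeps answering 401 and urllib raises urllib.error.HTTPError with code 401, so callers should be prepared to catch it.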
To authenticate against an NTLM scheme using the current user's logged on credentials, use the following code:
Supporting Protected RSS Feeds with RssFeed
The impetus for this article stemmed from a user requesting the ability to display protected RSS feeds through RssFeed. Just like regular web pages, RSS feeds can also be protected through any one of the common authentication schemes. Most desktop-based RSS aggregators provide support for protected RSS feeds by allowing the user to specify a username and password in the feed's properties dialog box. When requesting an RSS feed programmatically, however, we need to use the techniques discussed in this article.
Underneath the covers, RssFeed uses the HttpWebRequest class to programmatically access a remote RSS feed. The RssFeed control also exposes a Credentials property of type ICredentials. If this property is set, its value is assigned to the Credentials property of the internal HttpWebRequest instance. And that's all there is to it!
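The pass-through design described above is easy to mirror elsewhere. Below is a hypothetical Python sketch (FeedFetcher and its members are invented names, not RssFeed's actual API) of a component that owns its HTTP request and wires in caller-supplied credentials only when they are set. Attaching both the Basic and Digest handlers lets whichever scheme the server asks for be answered; urllib documents that Digest is tried first when both are present.

```python
import urllib.request
from typing import Optional


class FeedFetcher:
    """Hypothetical stand-in for a feed control: it owns the HTTP request
    internally and simply forwards a caller-supplied credential store to it."""

    def __init__(self, url: str, credentials: Optional[urllib.request.HTTPPasswordMgr] = None):
        self.url = url
        self.credentials = credentials  # plays the role of an ICredentials-typed property

    def _build_opener(self) -> urllib.request.OpenerDirector:
        handlers = []
        if self.credentials is not None:
            # Only when credentials were supplied are they wired into the request.
            handlers.append(urllib.request.HTTPDigestAuthHandler(self.credentials))
            handlers.append(urllib.request.HTTPBasicAuthHandler(self.credentials))
        return urllib.request.build_opener(*handlers)

    def fetch(self, timeout: float = 10.0) -> bytes:
        with self._build_opener().open(self.url, timeout=timeout) as response:
            return response.read()
```

Keeping the credentials as an optional, pluggable object means unprotected feeds need no extra configuration, while protected ones need only one extra assignment.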
Here's a snippet of code showing how to use RssFeed to display an authenticated RSS feed:
To download RssFeed visit the official RssFeed project page.
Happy Programming!