A Deeper Look at Performing HTTP Requests in an ASP.NET Page
By Scott Mitchell
Introduction
Performing HTTP requests from a Web page - a task commonly referred to as "screen scraping" - involves server-side code issuing an HTTP request to some other Web site, retrieving the returned results, and processing these results in some manner. For example, screen scraping is often used to grab data from another site, such as scraping the HTML from a Yahoo! Finance page to get the current price for a particular stock symbol. Another example of making an external HTTP request from an ASP.NET Web page is grabbing syndicated content from another site to display on your own, such as the latest blog entries from your favorite blogger, or the latest articles here on 4GuysFromRolla.com. (For more information on accessing and parsing RSS syndication feeds be sure to read A Custom ASP.NET Server Control for Displaying RSS Feeds.)
Performing simple HTTP requests in ASP.NET requires just a few lines of code, thanks to the WebClient class. This class, found in the System.Net namespace, provides a small number of properties and methods useful for making simple HTTP requests. A previous 4Guys article of mine, Screen Scrapes in ASP.NET, illustrates how to use the WebClient class from an ASP.NET page. The WebClient class is appropriate for simple HTTP requests, but it lacks a number of features. For example, if your Web server is behind a proxy, the WebClient class won't work. Rather, you'll need to use the more feature-rich HttpWebRequest class, which has capabilities for working with proxies, specifying timeouts, utilizing the If-Modified-Since HTTP header, and other more advanced functionality. In this article we'll look at the HttpWebRequest class and some of its more advanced functionality.
HttpWebRequest Basics
Before we look at some of the more advanced features of the HttpWebRequest class, let's first look at a simple example that scrapes HTML from a remote URL. To start, you need to create an HttpWebRequest instance using the WebRequest.Create(URL) method. (Create() is declared as returning the base WebRequest type, so cast the result to HttpWebRequest.) Once you have created an instance like this, you can retrieve the data from the specified URL by calling the GetResponse() method, which for an HTTP request returns an HttpWebResponse object (again typed as its base class, WebResponse). The HttpWebResponse class provides a GetResponseStream() method that returns a Stream to the underlying retrieved data.
When making a request it is important to place the code in a Try...Catch block, as a number of exceptions can occur. For example, if the remote Web server to which you are making the request is down, an exception will be thrown. The general code pattern for making and retrieving data via an HTTP request using the HttpWebRequest class is shown below:
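As a minimal sketch of this pattern (not necessarily the author's original listing), the following module creates the request, reads the response stream into a string, and catches WebException. It assumes VB.NET 2005 or later for the Using blocks. The names FetchPage and BasicRequestSketch and the example.com URL are placeholders chosen for illustration. It is written as a console module so it can be tried outside of a page.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module BasicRequestSketch
    ' Hypothetical helper: downloads the body at url as a string.
    ' Returns Nothing if the request fails with a WebException.
    Function FetchPage(ByVal url As String) As String
        Try
            ' WebRequest.Create returns the base WebRequest type; for an
            ' http:// or https:// URL the concrete type is HttpWebRequest.
            Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)

            ' GetResponse() sends the request and blocks until the response arrives.
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Using reader As New StreamReader(response.GetResponseStream())
                    Return reader.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            ' Status names the general cause (DNS failure, timeout, and so on).
            Console.WriteLine("Request failed: " & ex.Status.ToString())
            Return Nothing
        End Try
    End Function

    Sub Main()
        ' Placeholder URL; substitute the page you want to scrape.
        Dim html As String = FetchPage("http://www.example.com/")
        If html IsNot Nothing Then
            Console.WriteLine(html.Length.ToString() & " characters retrieved")
        End If
    End Sub
End Module
```

In an ASP.NET page the same Try...Catch body would typically sit in an event handler such as Page_Load, with the result assigned to a control rather than written to the console.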
(To use the code above in your ASP.NET pages, you'll need to import the System.Net and System.IO namespaces.)
Making an HTTP request with the HttpWebRequest object is slightly more involved than using the simpler WebClient class, but not terribly so. Essentially you create an HttpWebRequest object via the WebRequest.Create(URL) method and then issue the request and retrieve the remote data via the GetResponse() method. The resulting HttpWebResponse object provides access to the retrieved data as a Stream, so if you are expecting back string data, use the StreamReader class to read the contents of the returned Stream into a string, as shown in the code above.
If a WebException is thrown, the WebException instance received in the Catch block has a couple of properties to help determine the cause of the exception. One is the Status property, an enumeration of type WebExceptionStatus that spells out possible causes for the caught exception. For example, if you mistype the domain name of the server from which to get the data, the resulting Status value will be NameResolutionFailure; if a connection cannot be made to the specified URL, the Status property will hold the value ConnectFailure.
In addition to the Status property, the WebException class contains a Response property. This property is typed as WebResponse (for HTTP requests the object is an HttpWebResponse) and contains the response data from the failed request. In certain situations, this property can provide information about the cause of the exception. For example, the HttpWebResponse class has a StatusCode property that provides information about the status of the HTTP response. When requesting a URL using the If-Modified-Since HTTP header, if the requested resource has not been modified since the date specified, a WebException will be thrown and the resulting HttpWebResponse object will have a StatusCode value of NotModified. (We'll talk about the If-Modified-Since HTTP header in more detail later on in this article.)
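To make the two properties concrete, here is a small sketch of how a Catch block might turn a WebException into a readable message. DescribeFailure and the message strings are hypothetical names and wording chosen for illustration; the enumeration members are from System.Net. The ProtocolError branch reflects the case where the server did answer, but with an error status, which is when the Response property is populated.

```vbnet
Imports System
Imports System.Net

Module WebExceptionSketch
    ' Hypothetical helper: maps a WebException to a short explanation.
    Function DescribeFailure(ByVal ex As WebException) As String
        Select Case ex.Status
            Case WebExceptionStatus.NameResolutionFailure
                Return "The host name could not be resolved."
            Case WebExceptionStatus.ConnectFailure
                Return "A connection to the server could not be made."
            Case WebExceptionStatus.ProtocolError
                ' The server replied with an error status, so inspect the response.
                Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
                If errorResponse IsNot Nothing Then
                    Return "The server returned HTTP status " & CInt(errorResponse.StatusCode).ToString() & "."
                End If
                Return "The server returned a protocol error."
            Case Else
                Return "The request failed: " & ex.Status.ToString()
        End Select
    End Function

    Sub Main()
        ' Exceptions are constructed by hand here so the sketch runs without a network.
        Dim dnsFailure As New WebException("lookup failed", WebExceptionStatus.NameResolutionFailure)
        Console.WriteLine(DescribeFailure(dnsFailure))
    End Sub
End Module
```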
Now that we've looked at how to make a simple HTTP request using the HttpWebRequest class, let's look at how to take advantage of some of its more advanced features.
Specifying a Timeout
When making an HTTP request from an ASP.NET Web page, your page's response time becomes dependent on the response time of the remote server to which you are making the request. That is, if the remote request takes three seconds to complete, your page's response time can be no better than three seconds. Furthermore, since you cannot be certain that the remote server is even up and running, it is prudent when making an HTTP request to give up if no response has returned after a specified number of seconds. This can be accomplished by setting the Timeout property of the HttpWebRequest class, which specifies the number of milliseconds the request will wait for a response before bailing out. If this timeout duration is surpassed, a WebException is thrown, and the resulting WebException's Status property is set to Timeout.
For example, the following code waits at most one second for a request to a URL. If this timeout duration is exceeded, a message is displayed to the user that the request has taken too long.
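A sketch of such code might look like the following. It assumes VB.NET 2005 or later. CreateRequest and FetchWithTimeout, the message text, and the example.com URL are hypothetical. The message is returned as a string so that a page could assign it to a Label or similar control.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module TimeoutSketch
    ' Hypothetical helper: builds a request that gives up after timeoutMs milliseconds.
    Function CreateRequest(ByVal url As String, ByVal timeoutMs As Integer) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        ' Timeout is expressed in milliseconds and applies to GetResponse().
        request.Timeout = timeoutMs
        Return request
    End Function

    ' Returns the page content, or a message describing why it could not be retrieved.
    Function FetchWithTimeout(ByVal url As String, ByVal timeoutMs As Integer) As String
        Try
            Dim request As HttpWebRequest = CreateRequest(url, timeoutMs)
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Using reader As New StreamReader(response.GetResponseStream())
                    Return reader.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            If ex.Status = WebExceptionStatus.Timeout Then
                Return "The request has taken too long. Please try again later."
            End If
            Return "The request failed: " & ex.Status.ToString()
        End Try
    End Function

    Sub Main()
        ' Wait at most one second (1000 ms) for the placeholder URL to respond.
        Console.WriteLine(FetchWithTimeout("http://www.example.com/", 1000))
    End Sub
End Module
```

Note that Timeout covers the wait for GetResponse() to return; time spent reading from the response stream afterwards is governed separately by the ReadWriteTimeout property.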
Making a Request When Behind a Proxy
In certain scenarios you may find your Web server sitting behind a proxy server. A proxy server is a specialized server through which all Web requests are routed, and is often used to filter requests or to cache commonly requested content. If your Web server is sitting behind a proxy server you'll need to explicitly specify the proxy server information through the HttpWebRequest class's Proxy property.
The Proxy property must be set to an object that implements IWebProxy, such as an instance of the WebProxy class (also in the System.Net namespace). The WebProxy class has properties to specify the address of the proxy server along with credential information, if your proxy server requires authentication. For example, if your Web server needs to go through the proxy server at http://255.255.1.1:8080, and the proxy does not require that requests include credentials, you could specify that an HTTP request utilize the proxy server with the following code:
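One way to write this, offered as a sketch rather than the article's original listing, is shown below. CreateProxiedRequest and the example.com URL are hypothetical. The proxy address is the one used in the text. The second WebProxy constructor argument (bypass the proxy for local addresses) is an assumption about the desired behavior.

```vbnet
Imports System
Imports System.Net

Module ProxySketch
    ' Hypothetical helper: builds a request that is routed through the given proxy.
    Function CreateProxiedRequest(ByVal url As String, ByVal proxyAddress As String) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)

        ' True = do not use the proxy for local (intranet) addresses.
        Dim proxy As New WebProxy(proxyAddress, True)

        ' If the proxy required authentication, credentials could be supplied here:
        ' proxy.Credentials = New NetworkCredential("userName", "password")

        request.Proxy = proxy
        Return request
    End Function

    Sub Main()
        Dim request As HttpWebRequest = _
            CreateProxiedRequest("http://www.example.com/", "http://255.255.1.1:8080")
        Dim proxy As WebProxy = CType(request.Proxy, WebProxy)
        Console.WriteLine("Requests will be sent via " & proxy.Address.ToString())
    End Sub
End Module
```

From here the request is used exactly as before: call GetResponse() inside a Try...Catch block and read the returned stream.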
Only Downloading the Complete Data When Needed
When a browser visits a Web page, oftentimes the browser will save a cached version of the page on the Web surfer's hard drive. When the Web surfer visits the same page again, the browser, in sending the HTTP request, will add an HTTP header called If-Modified-Since, specifying the date and time of its cached version. The Web server that receives this HTTP request then determines if the content has changed since the specified date. If it has, the Web server sends back the complete content; if it has not changed, the Web server replies with an HTTP response with status code 304, indicating that the content has not been modified.
Both the server and Web surfer benefit from this interaction. If a document has not been modified since a specified date, the browser can quickly show the cached copy, since it need not wait for the data to be downloaded. The Web server benefits because its response consists merely of a 304 status and headers, without the contents of the requested resource, thereby reducing its bandwidth usage. These conditional gets are especially helpful in not overburdening RSS feeds, which are oftentimes requested hourly by hundreds or thousands of RSS aggregators, but only updated a couple of times a day, at most. (For more information on conditional gets, be sure to read: HTTP Conditional GETs for RSS Hackers.)
Conditional gets are only useful in a situation where you are caching the data locally. Assuming you are caching the data locally, in addition to caching the actual content received by the HTTP request, you'll also want to cache the date and time the content was cached. Then, to specify that a Web request using HttpWebRequest should use a conditional get, simply set the IfModifiedSince property to the date the content was cached. If the request returns a 304 status code, a WebException is thrown and the resulting exception's Response property's StatusCode is set to NotModified. The following code demonstrates using conditional gets:
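The sketch below illustrates the idea under a few assumptions: VB.NET 2005 or later, a hypothetical CachedPage class standing in for whatever cache you use (the ASP.NET data cache, a file, a database row), and hypothetical helper names CreateConditionalRequest and Refresh. It relies on the behavior described above, namely that a 304 response surfaces as a WebException.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ConditionalGetSketch
    ' Hypothetical cache entry: the content plus the time it was cached.
    Public Class CachedPage
        Public Content As String
        Public CachedAt As DateTime
    End Class

    ' Builds a request that carries an If-Modified-Since header.
    Function CreateConditionalRequest(ByVal url As String, ByVal cachedAt As DateTime) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.IfModifiedSince = cachedAt
        Return request
    End Function

    ' Returns the existing cache entry if the server answers 304,
    ' otherwise a new entry holding the freshly downloaded content.
    Function Refresh(ByVal url As String, ByVal cache As CachedPage) As CachedPage
        Try
            Dim request As HttpWebRequest = CreateConditionalRequest(url, cache.CachedAt)
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Using reader As New StreamReader(response.GetResponseStream())
                    Dim fresh As New CachedPage()
                    fresh.Content = reader.ReadToEnd()
                    fresh.CachedAt = DateTime.Now
                    Return fresh
                End Using
            End Using
        Catch ex As WebException
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse IsNot Nothing AndAlso errorResponse.StatusCode = HttpStatusCode.NotModified Then
                ' 304: nothing changed, so the cached copy is still good.
                Return cache
            End If
            Throw ' Any other failure is passed on to the caller.
        End Try
    End Function

    Sub Main()
        Dim cache As New CachedPage()
        cache.Content = "(previously downloaded content)"
        cache.CachedAt = DateTime.Now.AddHours(-1)

        ' Placeholder URL; a real caller would wrap this in its own Try...Catch.
        Dim current As CachedPage = Refresh("http://www.example.com/", cache)
        Console.WriteLine(If(current Is cache, "Cached copy is still current.", "Content was updated."))
    End Sub
End Module
```

Returning the original CachedPage object on a 304 lets the caller tell "unchanged" from "updated" with a simple reference comparison.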
Making Authenticated HTTP Requests
If you need to make a request to a protected HTTP resource, one that requires authentication, you can use the HttpWebRequest class's Credentials property. To learn more about this be sure to read Making Authenticated HTTP Requests from an ASP.NET Page.
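As a brief illustration of the Credentials property, the sketch below attaches a user name and password to a request. CreateAuthenticatedRequest, the URL, and the credential values are all hypothetical placeholders. Which authentication scheme is actually negotiated depends on the remote server.

```vbnet
Imports System
Imports System.Net

Module CredentialsSketch
    ' Hypothetical helper: builds a request that offers the given credentials.
    Function CreateAuthenticatedRequest(ByVal url As String, _
                                        ByVal userName As String, _
                                        ByVal password As String) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        ' NetworkCredential supplies the user name and password when the server challenges.
        request.Credentials = New NetworkCredential(userName, password)
        Return request
    End Function

    Sub Main()
        ' Placeholder values; never hard-code real passwords in source files.
        Dim request As HttpWebRequest = _
            CreateAuthenticatedRequest("http://www.example.com/protected/", "userName", "password")
        Console.WriteLine("Credentials attached: " & (request.Credentials IsNot Nothing).ToString())
    End Sub
End Module
```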
Conclusion
In this article we saw how to use the HttpWebRequest class to make an HTTP request from an ASP.NET Web page. In the case where very simple screen scraping needs to be done, I'd recommend using the WebClient class, as discussed in Screen Scrapes in ASP.NET. However, if you need fine control over the specific request - such as specifying a proxy server to use, indicating a timeout value, or utilizing the If-Modified-Since HTTP header - you'll need to use the HttpWebRequest class instead.
Happy Programming!