A Deeper Look at Performing HTTP Requests in an ASP.NET Page
By Scott Mitchell
Introduction
Performing HTTP requests from a Web page - a task commonly referred to as "screen scraping" - involves server-side code issuing
an HTTP request to some other Web site, retrieving the returned results, and processing these results in some manner. For example,
screen scraping is oftentimes used to grab data from another site, such as scraping the HTML from a Yahoo! Finance page to
grab the current stock price for a particular stock symbol. Another example of making an external HTTP request from an
ASP.NET Web page is grabbing syndicated content from another site to display on your own, such as the latest blog entries
from your favorite blogger, or the latest articles here on 4GuysFromRolla.com. (For more information
on accessing and parsing RSS syndication feeds be sure to read A Custom ASP.NET Server Control
for Displaying RSS Feeds.)
Performing simple HTTP requests in ASP.NET requires just a few lines of code, thanks to the WebClient class.
This class, found in the System.Net namespace, provides a small number of properties and methods useful for
making simple HTTP requests. A previous 4Guys article of mine, Screen
Scrapes in ASP.NET, illustrates how to use the WebClient class from an ASP.NET page. The WebClient
class is appropriate for simple HTTP requests, but it lacks a number of features. For example, if your Web server is behind
a proxy, the WebClient class won't work. Rather, you'll need to use the more feature-rich HttpWebRequest
class, which has capabilities for working with proxies, specifying timeouts, utilizing the If-Modified-Since HTTP header,
and other more advanced functionality. In this article we'll look at the HttpWebRequest class and some of
its more advanced functionality.
HttpWebRequest Basics
Before we look at some of the more advanced features of the HttpWebRequest class, let's first look at a simple
example that scrapes HTML from a remote URL. To start, you need to create an HttpWebRequest instance using
the WebRequest.Create(URL) method. Once you have created an instance like this, you can retrieve
the data from the specified URL by calling the GetResponse() method, which returns an
HttpWebResponse object. The HttpWebResponse class provides a GetResponseStream() method
that returns a Stream to the underlying retrieved data.
When making a request it is important to place the code in a Try...Catch block, as a number of exceptions can occur. For
example, if the remote Web server to which you are making the request is down, an exception will be thrown. The general code
pattern for making and retrieving data via an HTTP request using the HttpWebRequest class is shown below:
'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
Try
    'Get the data as an HttpWebResponse object
    Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
    'Convert the data into a string (assumes that you are requesting text)
    Dim sr As New StreamReader(resp.GetResponseStream())
    Dim results As String = sr.ReadToEnd()
    'Closing the reader also closes the underlying response stream
    sr.Close()
    '... work with results ...
Catch wex As WebException
    'Something went awry in the HTTP request!
End Try
(To use the code above in your ASP.NET pages, you'll need to import the System.Net and System.IO
namespaces.)
Making an HTTP request with the HttpWebRequest object is a bit more involved than using the simpler
WebClient class, but not terribly so. Essentially you create an HttpWebRequest object via the
WebRequest.Create(URL) method and then make the request and retrieve the remote data via the GetResponse()
method. The resulting HttpWebResponse object provides access to the retrieved data as a Stream, so if you
are expecting back string data, use the StreamReader class to read the contents of the returned Stream into a string, as
shown in the code above.
If a WebException is thrown, the WebException object caught in the Catch block
has a couple of properties to help determine the cause of the exception. One is the Status property, whose value
comes from the WebExceptionStatus enumeration, which spells out possible causes for the caught exception. For example, if you mistype the domain name of the
server from which to get the data, the resulting Status value will be NameResolutionFailure; if a
connection cannot be made to the specified URL, a ConnectFailure value will be present in the Status
property.
In addition to the Status property, the WebException class contains a Response property.
This property is of type WebResponse; for an HTTP request it holds an HttpWebResponse object with the response data
from the failed request, or Nothing if no response was received.
In certain situations, this property can provide information about the cause of the exception.
For example, the HttpWebResponse class has a StatusCode property that provides
information about the status of the HTTP response. When requesting a URL using the If-Modified-Since HTTP header,
if the requested resource has not been modified since the date specified, a WebException will be thrown
and the resulting HttpWebResponse object will have a StatusCode value of NotModified.
(We'll talk about the If-Modified-Since HTTP header in more detail later on in this article.)
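The Status and Response properties can be combined in a small helper that turns a caught WebException into a readable message. The following is a minimal sketch, not a definitive implementation: the module and function names (WebExceptionHelper, DescribeWebException) and the message wording are made up for illustration, and TryCast and IsNot assume Visual Basic 2005 or later.

```vbnet
Imports System.Net

Public Module WebExceptionHelper

    'Hypothetical helper: builds a short description of a failed request
    Public Function DescribeWebException(ByVal wex As WebException) As String
        Select Case wex.Status
            Case WebExceptionStatus.NameResolutionFailure
                Return "The host name could not be resolved."
            Case WebExceptionStatus.ConnectFailure
                Return "A connection to the remote server could not be made."
            Case WebExceptionStatus.Timeout
                Return "The request timed out."
            Case WebExceptionStatus.ProtocolError
                'The server replied, so Response should hold the HTTP response
                Dim resp As HttpWebResponse = TryCast(wex.Response, HttpWebResponse)
                If resp IsNot Nothing Then
                    Return "The server returned HTTP status " & CInt(resp.StatusCode).ToString() & "."
                End If
                Return "The server returned an error response."
            Case Else
                Return "The request failed: " & wex.Status.ToString()
        End Select
    End Function

End Module
```

Inside a Catch block you could then write, for example, Response.Write(DescribeWebException(wex)).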
Now that we've looked at how to make a simple HTTP request using the HttpWebRequest class, let's look at how
to take advantage of some of its more advanced features.
Specifying a Timeout
When making an HTTP request from an ASP.NET Web page, your page's response time becomes dependent on the response time of the
remote server to which you are making the request. That is, if the remote request takes three seconds to complete, your page's
response time can be no better than three seconds. Furthermore, since you cannot be certain that the remote server is even
up and running, it is prudent when making an HTTP request to give up if no response has returned after a specified number
of seconds. This can be accomplished by setting the Timeout property of the HttpWebRequest class,
which specifies the number of milliseconds the request will wait for a response before bailing out. If this timeout duration
is surpassed, a WebException is thrown, and the resulting WebException's Status
property is set to Timeout.
For example, the following code waits at most one second for a request to a URL. If this timeout duration is exceeded, a
message is displayed to the user that the request has taken too long.
'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
'Set the timeout to 1 second (or 1,000 milliseconds)
req.Timeout = 1000
Try
    'Get the data as an HttpWebResponse object
    Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
    '... work with results ...
Catch wex As WebException
    'Something went awry in the HTTP request! See if it was a timeout problem
    If wex.Status = WebExceptionStatus.Timeout Then
        Response.Write("The request has timed out!")
    Else
        Response.Write("There was some exception: " & wex.Message)
    End If
End Try
Making a Request When Behind a Proxy
In certain scenarios you may find your Web server sitting behind a proxy
server. A proxy server is a specialized server that all Web requests are routed through, and is often used to filter
requests or to cache commonly requested content. If your Web server is sitting behind a proxy server you'll need to explicitly
specify the proxy server information through the HttpWebRequest class's Proxy property.
The Proxy property must be set to an object that implements IWebProxy, which includes the WebProxy
class (also in the System.Net namespace). The WebProxy class has properties to specify the
address of the proxy server along with credential information, if your proxy server requires authentication. For example,
if your Web server needs to go through the proxy server at http://255.255.1.1:8080, and does not require that requests include
credentials, you could specify that an HTTP request utilize the proxy server with the following code:
'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
'Create the proxy class instance
Dim prxy As New WebProxy("http://255.255.1.1:8080")
'Specify that the HttpWebRequest should use the proxy server
req.Proxy = prxy
Try
    'Get the data as an HttpWebResponse object
    Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
    '... work with results ...
Catch wex As WebException
    '...
End Try
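If the proxy server does require authentication, the credential information mentioned above can be supplied through the WebProxy class's Credentials property. Here is a minimal sketch under stated assumptions: the module and function names (ProxyHelper, CreateAuthenticatedProxy) are hypothetical, and the address, user name, and password are placeholders you would replace with your own values.

```vbnet
Imports System.Net

Public Module ProxyHelper

    'Hypothetical helper: creates a WebProxy that sends credentials
    Public Function CreateAuthenticatedProxy(ByVal proxyAddress As String, _
                                             ByVal userName As String, _
                                             ByVal password As String) As WebProxy
        Dim prxy As New WebProxy(proxyAddress)
        'These credentials are presented to the proxy server, not to the remote site
        prxy.Credentials = New NetworkCredential(userName, password)
        Return prxy
    End Function

End Module
```

The returned object is assigned just like an unauthenticated proxy, for instance req.Proxy = CreateAuthenticatedProxy("http://proxy.example.com:8080", "user", "secret").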
Only Downloading the Complete Data When Needed
When a browser visits a Web page, oftentimes the browser will save a cached version of the page on the Web surfer's hard drive.
When the Web surfer visits the same page, the browser, in sending the HTTP request, will add an HTTP header called If-Modified-Since,
specifying the date and time of its cached version. The Web server that receives this HTTP request then determines if
the content has changed since the specified date. If it has, the Web server sends back the complete content; if it has not
changed, the Web server replies with an HTTP response with status code 304, indicating that the content has not been modified.
Both the server and Web surfer benefit from this interaction. If a document has not been modified since a specified date,
the browser can quickly show the cached copy, since it need not wait for the data to be downloaded. The Web server benefits
because its response consists merely of a 304 status code and headers, and does not contain the contents of the requested resource, thereby
reducing its bandwidth usage. These conditional gets are especially helpful in not overburdening sites that serve RSS feeds, which are
oftentimes requested hourly by hundreds or thousands of RSS aggregators, but only updated a couple of times a day, at most.
(For more information on conditional gets, be sure to read: HTTP Conditional
GETs for RSS Hackers.)
Conditional gets are useful only when you are caching the data locally. Assuming you are caching
the data locally, in addition to caching the actual content received by the HTTP request, you'll also want to cache the date
and time the content was cached. Then, to specify that a Web request using HttpWebRequest should use a conditional get,
simply set the IfModifiedSince property to the date and time the content was cached. If the request returns a 304 status code,
a WebException is thrown and the StatusCode of the exception's Response property
is set to NotModified. The following code demonstrates using conditional gets:
'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
'Specify the If-Modified-Since HTTP header value (use the local date/time)
req.IfModifiedSince = dateDataWasCached
Try
    'Get the data as an HttpWebResponse object
    Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
    '... work with results ...
    'Since the data has been modified, be sure to cache the new
    'data and update the cached date/time value!
Catch wex As WebException
    Dim respErr As HttpWebResponse = CType(wex.Response, HttpWebResponse)
    If Not respErr Is Nothing _
       AndAlso respErr.StatusCode = HttpStatusCode.NotModified Then
        'The data has not been modified
    End If
End Try
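To show how the cached content and its date/time might travel together, here is one possible sketch. It is illustrative only: the CachedResource class and its members are hypothetical names, the cache lives in an ordinary object rather than any particular caching facility, and the Using statement assumes Visual Basic 2005 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Net

'Hypothetical holder for a cached copy of a remote resource
Public Class CachedResource
    Public Content As String = Nothing
    Public DateCached As DateTime = DateTime.MinValue

    'Refreshes the cached copy from url; returns True if new data was downloaded
    Public Function Refresh(ByVal url As String) As Boolean
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        'Only send If-Modified-Since when there is something in the cache
        If Content IsNot Nothing Then req.IfModifiedSince = DateCached

        'Note the time before the request is sent
        Dim requestTime As DateTime = DateTime.Now
        Try
            Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
                Using sr As New StreamReader(resp.GetResponseStream())
                    Content = sr.ReadToEnd()
                End Using
            End Using
            DateCached = requestTime
            Return True
        Catch wex As WebException
            Dim respErr As HttpWebResponse = TryCast(wex.Response, HttpWebResponse)
            If respErr IsNot Nothing AndAlso respErr.StatusCode = HttpStatusCode.NotModified Then
                'The cached copy is still current
                respErr.Close()
                Return False
            End If
            'Any other failure is passed on to the caller
            Throw
        End Try
    End Function
End Class
```

Recording the time before the request is sent, rather than after the download finishes, avoids missing a change made while the download is in progress. Clock differences between your server and the remote one remain a limitation of this simple approach.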
Making Authenticated HTTP Requests
If you need to make a request to a protected HTTP resource, one that requires authentication, you can use the
HttpWebRequest class's Credentials property. To learn more about this be sure to read
Making Authenticated HTTP Requests from an ASP.NET Page.
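As a quick illustration, the Credentials property can be set to a NetworkCredential object before the request is made. This is only a sketch: the module and function names (AuthenticatedRequestHelper, CreateAuthenticatedRequest) are hypothetical, and the user name and password are placeholders.

```vbnet
Imports System.Net

Public Module AuthenticatedRequestHelper

    'Hypothetical helper: creates a request that carries a user name and password
    Public Function CreateAuthenticatedRequest(ByVal url As String, _
                                               ByVal userName As String, _
                                               ByVal password As String) As HttpWebRequest
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        'The credentials are used if the remote server asks for authentication
        req.Credentials = New NetworkCredential(userName, password)
        Return req
    End Function

End Module
```

The request is then made with GetResponse() inside a Try...Catch block, exactly as in the earlier examples.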
Conclusion
In this article we saw how to use the HttpWebRequest class to make an HTTP request from an ASP.NET Web page.
In the case where very simple screen scraping needs to be done, I'd recommend using the WebClient class, as
discussed in Screen Scrapes in ASP.NET. However, if you need fine control
over the specific request - such as specifying a proxy server to use, indicating a timeout value, or utilizing the If-Modified-Since
HTTP header - you'll need to use the HttpWebRequest class instead.