Published: Wednesday, December 22, 2004

A Deeper Look at Performing HTTP Requests in an ASP.NET Page

By Scott Mitchell


Introduction


Performing HTTP requests from a Web page - a task commonly referred to as "screen scraping" - involves server-side code issuing an HTTP request to some other Web site, retrieving the returned results, and processing these results in some manner. For example, screen scraping is often used to grab data from another site, such as scraping the HTML from a Yahoo! Finance page to get the current price for a particular stock symbol. Another example of making an external HTTP request from an ASP.NET Web page is grabbing syndicated content from another site to display on your own, such as the latest blog entries from your favorite blogger, or the latest articles here on 4GuysFromRolla.com. (For more information on accessing and parsing RSS syndication feeds be sure to read A Custom ASP.NET Server Control for Displaying RSS Feeds.)

Performing simple HTTP requests in ASP.NET requires just a few lines of code, thanks to the WebClient class. This class, found in the System.Net namespace, provides a small number of properties and methods useful for making simple HTTP requests. A previous 4Guys article of mine, Screen Scrapes in ASP.NET, illustrates how to use the WebClient class from an ASP.NET page. The WebClient class is appropriate for simple HTTP requests, but it lacks a number of features. For example, if your Web server is behind a proxy, the WebClient class won't work. Rather, you'll need to use the more feature-rich HttpWebRequest class, which has capabilities for working with proxies, specifying timeouts, utilizing the If-Modified-Since HTTP header, and other more advanced functionality. In this article we'll look at the HttpWebRequest class and some of its more advanced functionality.


HttpWebRequest Basics


Before we look at some of the more advanced features of the HttpWebRequest class, let's first look at a simple example that scrapes HTML from a remote URL. To start, you need to create an HttpWebRequest instance using the WebRequest.Create(URL) method. Once you have created an instance like this, you can retrieve the data from the specified URL by calling the GetResponse() method, which returns an HttpWebResponse object. The HttpWebResponse class provides a GetResponseStream() method that returns a Stream to the underlying retrieved data.

When making a request it is important to place the code in a Try...Catch block, as a number of exceptions can occur. For example, if the remote Web server to which you are making the request is down, an exception will be thrown. The general code pattern for making and retrieving data via an HTTP request using the HttpWebRequest class is shown below:

'Create the HttpWebRequest object (WebRequest.Create returns a WebRequest, so cast it)
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)

Try
  'Get the data as an HttpWebResponse object
  Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)

  'Convert the data into a string (assumes that you are requesting text)
  Dim sr As New StreamReader(resp.GetResponseStream())
  Dim results As String = sr.ReadToEnd()
  sr.Close()

  'Release the connection held by the response
  resp.Close()

  '... work with results ...
Catch wex As WebException
  'Something went awry in the HTTP request!
End Try

(To use the code above in your ASP.NET pages, you'll need to import the System.Net and System.IO namespaces.)

Making an HTTP request with the HttpWebRequest object is slightly more involved than using the simpler WebClient class, but not terribly so. Essentially you create an HttpWebRequest object via the WebRequest.Create(URL) method and then make the request and get the remote data via the GetResponse() method. The resulting HttpWebResponse object provides access to the retrieved data as a Stream, so if you are expecting back string data, use the StreamReader class to read the contents of the returned Stream into a string, as shown in the code above.

If a WebException is thrown, the WebException object received in the Catch block has a couple of properties to help determine the cause of the exception. One is the Status property, an enumeration of type WebExceptionStatus that spells out possible causes for the caught exception. For example, if you mistype the domain name of the server from which to get the data, the resulting Status value will be NameResolutionFailure; if a connection cannot be made to the specified URL, the Status property will hold a ConnectFailure value.

In addition to the Status property, the WebException class contains a Response property. This property is declared as type WebResponse; for HTTP requests it holds an HttpWebResponse object containing the response data from the failed request, and it is Nothing when no response was received at all (for example, on a timeout). In certain situations, this property can provide information about the cause of the exception. For example, the HttpWebResponse class has a StatusCode property that provides information about the status of the HTTP response. When requesting a URL using the If-Modified-Since HTTP header, if the requested resource has not been modified since the date specified, a WebException will be thrown and the resulting HttpWebResponse object will have a StatusCode value of NotModified. (We'll talk about the If-Modified-Since HTTP header in more detail later on in this article.)
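
As a rough sketch of how the Status and Response properties might be inspected together, the following helper turns a caught WebException into a short diagnostic message. The module and function names (WebExceptionHelper, DescribeWebException) and the message strings are made up for illustration; they are not part of the .NET Framework. The sketch also assumes a VB version that supports TryCast (VB 2005 or later).

```vbnet
Imports System
Imports System.Net

Module WebExceptionHelper
  'Hypothetical helper: builds a short description of a failed request
  Public Function DescribeWebException(ByVal wex As WebException) As String
    'If the server answered, the Response property holds its (error) response
    Dim resp As HttpWebResponse = TryCast(wex.Response, HttpWebResponse)
    If Not resp Is Nothing Then
      Return "Server responded with HTTP status " & _
             CInt(resp.StatusCode).ToString() & " (" & resp.StatusCode.ToString() & ")"
    End If

    'No response was received, so fall back on the WebExceptionStatus value
    Select Case wex.Status
      Case WebExceptionStatus.NameResolutionFailure
        Return "The host name could not be resolved"
      Case WebExceptionStatus.ConnectFailure
        Return "A connection to the remote server could not be made"
      Case WebExceptionStatus.Timeout
        Return "The request timed out"
      Case Else
        Return "The request failed: " & wex.Status.ToString()
    End Select
  End Function
End Module
```

In a Catch wex As WebException block you could then write something like Response.Write(DescribeWebException(wex)) rather than testing each property by hand.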

Now that we've looked at how to make a simple HTTP request using the HttpWebRequest class, let's look at how to take advantage of some of its more advanced features.

Specifying a Timeout


When making an HTTP request from an ASP.NET Web page, your page's response time becomes dependent on the response time of the remote server to which you are making the request. That is, if the remote request takes three seconds to complete, your page's response time can be no better than three seconds. Furthermore, since you cannot be certain that the remote server is even up and running, it is prudent when making an HTTP request to give up if no response has returned after a specified number of seconds. This can be accomplished by setting the Timeout property of the HttpWebRequest class, which specifies the number of milliseconds the request will wait for a response before bailing out. If this timeout duration is surpassed, a WebException exception is thrown, and the resulting WebException's Status property is set to Timeout.

For example, the following code waits at most one second for a request to a URL. If this timeout duration is exceeded, a message is displayed to the user that the request has taken too long.

'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)

'Set the timeout to 1 second (or 1,000 milliseconds)
req.Timeout = 1000

Try
  'Get the data as an HttpWebResponse object
  Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)

  '... work with results ...
Catch wex As WebException
  'Something went awry in the HTTP request!  See if it was a timeout problem
  If wex.Status = WebExceptionStatus.Timeout Then
    Response.Write("The request has timed out!")
  Else
    Response.Write("There was some exception: " & wex.Message)
  End If
End Try

Making a Request When Behind a Proxy


In certain scenarios you may find your Web server sitting behind a proxy server. A proxy server is a specialized server through which all Web requests are routed, and is often used to filter requests or to cache commonly requested content. If your Web server is sitting behind a proxy server you'll need to explicitly specify the proxy server information through the HttpWebRequest class's Proxy property.

The Proxy property must be set to an object that implements IWebProxy, which includes the WebProxy class (also in the System.Net namespace). The WebProxy class has properties to specify the address of the proxy server along with credential information, if your proxy server requires authentication. For example, if your Web server needs to go through the proxy server at http://255.255.1.1:8080, and does not require that requests include credentials, you could specify that an HTTP request utilize the proxy server with the following code:

'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)

'Create the proxy class instance
Dim prxy As New WebProxy("http://255.255.1.1:8080")

'Specify that the HttpWebRequest should use the proxy server
req.Proxy = prxy

Try
  'Get the data as an HttpWebResponse object
  Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)

  '... work with results ...
Catch wex As WebException
  '...
End Try
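
If the proxy server does require authentication, the credentials go on the WebProxy object itself. The following is a minimal sketch of that case, assuming the proxy accepts a plain user name and password; the helper name CreateProxiedRequest and its parameters are hypothetical and only serve the illustration.

```vbnet
Imports System
Imports System.Net

Module ProxyHelper
  'Hypothetical helper: creates a request routed through an authenticating proxy.
  'The proxy address, user name, and password are supplied by the caller.
  Public Function CreateProxiedRequest(ByVal url As String, _
                                       ByVal proxyAddress As String, _
                                       ByVal userName As String, _
                                       ByVal password As String) As HttpWebRequest
    Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)

    'The second argument asks that the proxy be bypassed for local addresses
    Dim prxy As New WebProxy(proxyAddress, True)

    'These credentials are sent to the proxy server, not to the remote Web site
    prxy.Credentials = New NetworkCredential(userName, password)

    req.Proxy = prxy
    Return req
  End Function
End Module
```

The returned request has not been sent yet; call its GetResponse() method inside a Try...Catch block, just as in the examples above.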

Only Downloading the Complete Data When Needed


When a browser visits a Web page, oftentimes the browser will save a cached version of the page on the Web surfer's hard drive. When the Web surfer visits the same page, the browser, in sending the HTTP request, will add an HTTP header called If-Modified-Since, specifying the date and time of its cached version. The Web server that receives this HTTP request then determines if the content has changed since the specified date. If it has, the Web server sends back the complete content; if it has not changed, the Web server replies with an HTTP response with status code 304, indicating that the content has not been modified.

Both the server and Web surfer benefit from this interaction. If a document has not been modified since a specified date, the browser can quickly show the cached copy, since it need not wait for the data to be downloaded. The Web server benefits because its response consists merely of the 304 status and headers, and does not contain the contents of the requested resource, thereby reducing its bandwidth usage. These conditional gets are especially helpful in not overburdening RSS feeds, which are oftentimes requested hourly by hundreds or thousands of RSS aggregators, but only updated a couple times a day, at most. (For more information on conditional gets, be sure to read: HTTP Conditional GETs for RSS Hackers.)

Conditional gets are useful only when you are caching the data locally. In that case, in addition to caching the actual content received by the HTTP request, you'll also want to cache the date and time the content was cached. Then, to specify that a Web request using HttpWebRequest should use a conditional get, simply set the IfModifiedSince property to the date the content was cached. If the request returns a 304 status code, a WebException is thrown and the StatusCode of the exception's Response property is set to NotModified. The following code demonstrates using conditional gets:

'Create the HttpWebRequest object
Dim req As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)

'Specify the If-Modified-Since HTTP header value (use the local date/time)
req.IfModifiedSince = dateDataWasCached

Try
  'Get the data as an HttpWebResponse object
  Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)

  '... work with results ...

  'Since the data has been modified, be sure to cache the new
  'data and update the cached date/time value!
Catch wex As WebException
  'wex.Response is Nothing when no response was received at all
  Dim respErr As HttpWebResponse = CType(wex.Response, HttpWebResponse)
  If Not respErr Is Nothing _
         AndAlso respErr.StatusCode = HttpStatusCode.NotModified Then
    'The data has not been modified
  End If
End Try
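
To tie the caching and the conditional get together, here is one possible sketch. It assumes the cached content and its timestamp are kept in a small object that the caller stores somewhere (for instance, in the ASP.NET data cache), and that a 304 response surfaces as a WebException as described above. The names CachedContent, IsNotModified, and GetContent are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ConditionalGetHelper
  'Hypothetical cache entry: the content plus the date/time it was cached
  Public Class CachedContent
    Public Content As String
    Public CachedOn As DateTime
  End Class

  'True when the exception carries an HTTP 304 (Not Modified) response
  Public Function IsNotModified(ByVal wex As WebException) As Boolean
    Dim resp As HttpWebResponse = TryCast(wex.Response, HttpWebResponse)
    Return (Not resp Is Nothing) AndAlso resp.StatusCode = HttpStatusCode.NotModified
  End Function

  'Returns a fresh entry if the resource has changed, otherwise the cached one.
  'Pass Nothing for cached on the first request.
  Public Function GetContent(ByVal url As String, ByVal cached As CachedContent) As CachedContent
    Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
    If Not cached Is Nothing Then req.IfModifiedSince = cached.CachedOn

    Try
      'Note the local time before the request so no update is missed
      Dim requestedAt As DateTime = DateTime.Now
      Dim resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)

      Dim sr As New StreamReader(resp.GetResponseStream())
      Dim fresh As New CachedContent()
      fresh.Content = sr.ReadToEnd()
      fresh.CachedOn = requestedAt
      sr.Close()
      resp.Close()
      Return fresh
    Catch wex As WebException
      'A 304 means the cached copy is still good; anything else is a real error
      If IsNotModified(wex) AndAlso Not cached Is Nothing Then Return cached
      Throw
    End Try
  End Function
End Module
```

A page might call GetContent(feedUrl, previousEntry), where both arguments are the caller's own values, and store whatever entry comes back for the next request.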

Making Authenticated HTTP Requests
If you need to make a request to a protected HTTP resource, one that requires authentication, you can use the HttpWebRequest class's Credentials property. To learn more about this be sure to read Making Authenticated HTTP Requests from an ASP.NET Page.
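
As a brief sketch of the idea, assuming the protected resource accepts a user name and password, the Credentials property can be set to a NetworkCredential instance before the request is made. The helper name CreateAuthenticatedRequest and its parameters are hypothetical.

```vbnet
Imports System
Imports System.Net

Module AuthenticatedRequestHelper
  'Hypothetical helper: creates a request that supplies a user name and password
  'when the remote server asks for authentication
  Public Function CreateAuthenticatedRequest(ByVal url As String, _
                                             ByVal userName As String, _
                                             ByVal password As String) As HttpWebRequest
    Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)

    'These credentials are for the remote Web site, not for a proxy server
    req.Credentials = New NetworkCredential(userName, password)

    Return req
  End Function
End Module
```

As with the other examples, the request is only sent once GetResponse() is called, which should happen inside a Try...Catch block.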

Conclusion


In this article we saw how to use the HttpWebRequest class to make an HTTP request from an ASP.NET Web page. In the case where very simple screen scraping needs to be done, I'd recommend using the WebClient class, as discussed in Screen Scrapes in ASP.NET. However, if you need fine control over the specific request - such as specifying a proxy server to use, indicating a timeout value, or utilizing the If-Modified-Since HTTP header - you'll need to use the HttpWebRequest class instead.

Happy Programming!
