When you think ASP, think...
Recent Articles
All Articles
ASP.NET Articles
ASPFAQs.com
Message Board
Related Web Technologies
User Tips!
Coding Tips
Search

Sections:
Book Reviews
Sample Chapters
Commonly Asked Message Board Questions
JavaScript Tutorials
MSDN Communities Hub
Official Docs
Security
Stump the SQL Guru!
Web Hosts
XML
Information:
Advertise
Feedback
Author an Article

ASP ASP.NET ASP FAQs Message Board Feedback
 
Print this Page!
Published: Wednesday, September 9, 2009

Querying a Multi-Tenant Data Architecture

By Scott Mitchell


Introduction


During my career as a consultant and web developer I've worked on a number of "software as a service" (SaaS) web applications. Such web applications are designed to be used by multiple customers, where each customer is typically a company that has dozens or hundreds of users (its employees or customers) that access the system. One such SaaS-style application that I've worked on for the past several years is a hosted application used by hospitals and clinics to manage patient information, doctor's appointments, billing, prescriptions, and so forth.

When developing a SaaS-style application you need to decide how to store each customer's data. The article Multi-Tenant Data Architecture examines different data architectures for SaaS applications. One option presented in the article is to store all customer data in a single database. This approach requires adding a CustomerID column to all of your tables to determine which data belongs to what customer. Moreover, whenever you query the database to display data in a web page you need to make sure to filter the data by the CustomerID of the currently logged on user. Another option is to use a separate database for each customer. This approach lessens the risk of a user somehow seeing or modifying data for a different customer and this level of isolation may be necessary for applications that store sensitive data, such as health care and financial applications. The hospital software SaaS application I work on uses such a separate database data architecture.

One downside to using separate databases is that it is harder to run a query against all customer data. For example, if you find an anomaly in the data for one customer - say, a patient record that has a street address specified but no city specified - it can be difficult to quickly determine whether the same anomaly exists in other databases. This article discusses different ways to query multiple databases and shows a simple, web-based tool I've created and routinely use for those SaaS-style web applications I work on that use separate databases to store different customer data. Read on to learn more!

- continued -

The Motivation Behind Using Separate Databases


Before we look at ways to query multiple databases, it behooves us to take a moment and discuss why one might choose to have a SaaS-style application use a data architecture where each customer's data is stored in a separate database. Such a data architecture certainly has its disadvantages:
  • Rolling out changes to the data model is more difficult because changes have to be applied to multiple databases. If you've added new functionality that requires changes to the data model - a new table, a new column in an existing table, new stored procedures, etc. - then when rolling out these changes you need to be certain that these modifications are applied to all databases. This is not terribly complex if you have a disciplined, systematic, tested, and automated approach to rolling out changes, but can lead to bugs and incomplete data models for certain customers if you are doing this manually.
  • Adding a new customer requires creating a new database. Anytime you add a new customer you need to create a new database that has the baseline database objects - the requisite tables, stored procedures, views, and so on. Likewise, any baseline data, such as the data in lookup tables, must be populated into this new database.
  • Querying across multiple customer data can be challenging.
These challenges are unique to using a separate databases data architecture. With a shared database, these challenges are either simplified or go away altogether.

The main benefit of using a separate database for each customer is data isolation. Having each customer's data in a separate database virtually eliminates the possibility that one customer may be able to inadvertently see or modify the data of another customer due to poor programming practices or a bug in the application. This level of isolation is attractive when storing sensitive data, such as medical records or financial information. For a much more in-depth look at the different data architecture options for SaaS applications, as well as tips to determine the best architecture for your scenario, refer to Ruminations on Multi-Tenant Data Architectures.

As aforementioned, one of the challenges of using a separate database for each customer is querying data across customers. The remainder of this article examines two ways to facilitate such queries. First, we'll look at using sp_MsForEachDb, an undocumented stored procedure in Microsoft SQL Server. Following that, we'll see how to build a web page that serves as a more attractive and functional user interface to the sp_MsForEachDb stored procedure.

Querying All Databases With sp_MsForEachDb


Microsoft SQL Server includes a number of undocumented stored procedures. One of these is sp_MsForEachDb, which runs a specified query against all databases on the database server. The syntax follows:

EXEC sp_MsForEachDb 'command to execute on all databases'

You can put a question mark (?) within the command to execute on all databases string to inject the name of the database the command is currently being executed on. For example, to see how many active patients exist in each database you could run the following query:

EXEC sp_MsForEachDb 'SELECT ''?'' AS DatabaseName, COUNT(*) AS ActivePatientCount FROM ?.dbo.Patients WHERE Active = 1'

Note that the question mark is used twice in the above command: to return the database name in the select list and to indicate the Patients table to query. If you omit the question mark from the table name (i.e., if you use FROM Patients rather than FROM ?.dbo.Patients) then you will get back a resultset The above statement might return results that look like the following:

DatabaseNameActivePatientCount
ST_ELSEWHERE1424
SACRED_HEART157
GENERAL_HOSPITAL941
COUNTY_GENERAL574

Keep in mind that sp_MsForEachDb returns a resultset for each database. In other words, the results above would not be returned as rows in a single resultset, but rather each would return a single row in a separate resultset. To put it another way, if you were to run the above query in SQL Server Management Studio you would see four separate "grids" in the output rather than a single grid with four rows.

While sp_MsForEachDb offers a quick and dirty way to run the same query across all databases, it has two major drawbacks:

  • To run sp_MsForEachDb you must connect to the database and type in your query, which usually means that you must have an account on the database server and be able to connect to the database using a program like SQL Server Management Studio. This may not be possible from a remote location, such as when visiting a client's site.
  • sp_MsForEachDb executes the specified command against all databases, including built-in databases like master and model. There's no way to query just a subset of databases.
To overcome these shortcomings I have a web page that I created that serves as a web-based front end for sp_MsForEachDb. In a nutshell, this page allows me to type in a query, select the databases to run it against, and see the results in a grid. The screen shot below shows this page in action.

From this web page a query can be dispatched to multiple databases.

The remainder of this article looks at how you can build such a page yourself.

Determining The Customer Databases


Unlike sp_MsForEachDb, the web page I've created allows the user to choose the specific set of databases to issue the query against. As the screen shot above shows, the customer databases are listed using a CheckBoxList and the query is dispatched to the checked databases. But how do I know what databases to list in this CheckBoxList? One approach would be to hard code the databases into the CheckBoxList, but that would require revisiting this page each time a new customer is added or whenever a customer cancels their service. Instead, we can write a query that will return the names of all databases that have a particular table:

EXEC sp_MSForEachDb 'IF EXISTS(SELECT 1 FROM ?.dbo.sysobjects WHERE NAME=''CommonTableName'' AND Type=''U'') SELECT ''?'' AS DBName'

This query uses sp_MSForEachDb to query each databases sysobjects table. The sysobjects table contains a record for every object in the database, including tables. The above query says, "If the current database has a table named CommonTableName then return the name of the database." Here, CommonTableName would be the name of one of the tables in the data model that is shared among all customer databases but is not found in those databases that you want to omit (such as master).

The following method shows how to work with the results of sp_MSForEachDb in your .NET application. Typically, when you issue a command to the database you can get back a DataReader and walk through each row in the resultset using the DataReader's Read method. However, because the above query uses sp_MSForEachDb it does not return a single resultset with multiple rows, but rather a series of resultsets, each with one row that has one column, the database name. You can still get the data back via a DataReader object, but to step through each resultset you need to use the DataReader's NextResult method, as the following code illustrates:

Public Shared Function GetCustomerDatabases() As List(Of String)
   Using myConnection As New SqlConnection(connectionString)
      myConnection.Open()
      
      Dim myCommand As New SqlCommand("EXEC sp_MSForEachDb 'IF EXISTS...", myConnection)
      
      Dim myReader As SqlDataReader = myCommand.ExecuteReader()
   
      Dim dbList As New List(Of String)
      Do
         If myReader.Read() Then
            dbList.Add(myReader("DBName").ToString())
         End If
      Loop While myReader.NextResult()

      myReader.Close()

      Return dbList
   End Using
End Function

The Do...Loop in the code above starts with the current resultset and reads in the first row via myReader.Read(). If a row was read in it adds this row to the dbList list. At the end of the loop the next resultset is loaded using myReader.NextResult(). If there is no more resultsets then NextResult returns False and the loop terminates.

The CheckBoxList in the ASP.NET page is populated by assigning the list of customer databases to the CheckBoxList's DataSource property and then calling the DataBind method:

cblCustomerDatabases.DataSource = GetCustomerDatabases()
cblCustomerDatabases.DataBind()

Issuing the Query to the Selected Customer Databases and Displaying the Results


Once the user has selected which databases to query, typed in their query, and clicked the "Execute Query" button, we need to dispatch the query to each of the selected databases and display the results. The following code shows the "Execute Query" Button's Click event handler. Note that each item in the CheckBoxList is enumerated and for each selected item the query is issued to the database and the results are returned in the form of a DataTable.

Protected Sub btnExecuteQuery_Click(ByVal sender As Object, ByVal e As EventArgs) Handles btnExecuteQuery.Click
   Dim query As String = txtQuery.Text.Trim()

   For Each cb As ListItem In cblCustomerDatabases.Items
      If cb.Selected Then
         Dim databaseName As String = cb.Value
         Dim results As New DataTable
         
         Using myConnection As New SqlConnection(ConstructConnectionString(databaseName))
            myConnection.Open()

            Dim myCommand As New SqlCommand(query, myConnection)
            Dim myAdapter As New SqlDataAdapter(myCommand)

            myAdapter.Fill(results)

            myConnection.Close()
         End Using

         DisplayResults(results, databaseName)
      End If
   Next
End Sub

To issue the query to a database you need to be able to form a connection string that references it. How you do this depends on your implementation. I've got a method called ConstructConnectionString(databaseName) that will generate a connection string to the specified database, and the result of this method is passed into the SqlConnection object's constructor.

After the query is issued and the results are stored in the results DataTable, the DisplayResults method is called, passing in the DataTable and the name of the database. The DisplayResults method is responsible for displaying the current database's results on the page. The results are displayed in a Table Web control named tblResults, whose rows and columns are programmatically added based on the data in the results DataTable.

The following methods - DisplayResults, AddDatabaseNameRow, AddHeaderRow, AddDataRows, and AddSpacerRow - show how to construct the rows in the table for a particular set of results from a particular database. I've removed some formatting-related code in these methods for brevity.

Private Sub DisplayResults(ByVal results As DataTable, ByVal databaseName As String)
   'Determine how many columns are in the resultset
   Dim columns As Integer = results.Columns.Count

   'Add the header row that displays the database name
   AddDatabaseNameRow(databaseName, columns, results.Rows.Count)

   'Add the header row
   AddHeaderRow(results)

   'Add the data
   AddDataRows(results)

   'Add a spacer row
   AddSpacerRow(columns)
End Sub

Private Sub AddDatabaseNameRow(ByVal databaseName As String, ByVal columns As Integer, ByVal rowCount As Integer)
   Dim dbNameRow As New TableRow()

   Dim dbNameCell As New TableCell()
   dbNameCell.Text = String.Format("{0} ({1} Records)", databaseName, rowCount)
   dbNameCell.ColumnSpan = columns

   dbNameRow.Cells.Add(dbNameCell)
   tblResults.Rows.Add(dbNameRow)
End Sub

Private Sub AddHeaderRow(ByVal results As DataTable)
   Dim dbHeaderRow As New TableRow()

   For Each column As DataColumn In results.Columns
      Dim dbHeaderCell As New TableCell()
      dbHeaderCell.Text = column.ColumnName
      dbHeaderRow.Cells.Add(dbHeaderCell)
   Next

   tblResults.Rows.Add(dbHeaderRow)
End Sub

Private Sub AddDataRows(ByVal results As DataTable)
   Dim count As Integer = 0

   For Each row As DataRow In results.Rows
      Dim dbDataRow As New TableRow()
      For Each column As DataColumn In results.Columns
         Dim dbDataCell As New TableCell()
         dbDataCell.Text = row(column.ColumnName).ToString()
         dbDataRow.Cells.Add(dbDataCell)
      Next

      tblResults.Rows.Add(dbDataRow)
   Next
End Sub

Private Sub AddSpacerRow(ByVal columns As Integer)
   Dim dbSpacerRow As New TableRow()

   Dim dbSpacerCell As New TableCell()
   dbSpacerCell.Text = " "
   dbSpacerCell.ColumnSpan = columns

   dbSpacerRow.Cells.Add(dbSpacerCell)
   tblResults.Rows.Add(dbSpacerRow)
End Sub

And... that's it! The above code issues the query to all of the selected databases and builds up the contents of a Table Web control based on the returned results. This web page provides a cleaner, easier to use interface for querying multiple databases than sp_MsForEachDb. With this page you don't need to worry about typing in the question mark character or having your query run against all databases, including non-customer databases. As long as you have access to this web page you can simply enter a query, select the databases, and you're off and running.

Caveats...


One of the advantages of having a separate database for each customer is data isolation - there's a much lesser chance that data will "leak" across customer boundaries, or that a buggy page or SQL statement will affect all customers. While having a web page that allows administrators to run a query against multiple databases doesn't remove this isolation or its benefits, it does lessen the boundary imposed by the separate databases. If this web page is not properly locked down, non-administrative users may be able to gain access and run queries - including INSERT, UPDATE, and DELETE queries - against all customer databases. And even if this page is configured to only permit administrators, it is possible that an administrator might accidentally (or willfully) run a query that damages the data in multiple databases.

To help mitigate these problems, I suggest only allowing those administrators who have unfettered access to the database to be able to access this page. And for a bit of extra assurance you can add a few lines of code in the "Execute Query" Button's Click event handler to ensure that no INSERT, UPDATE, or DELETE queries are executed.

Happy Programming!

  • By Scott Mitchell


    Further Reading


  • Multi-Tenant Data Architecture
  • Ruminations on Multi-Tenant Data Architectures
  • Running the Same Query Against Multiple Databases
  • SQL Server Undocumented Stored Procedures: sp_MsForEachTable And sp_MsForEachDb


  • ASP.NET [1.x] [2.0] | ASPMessageboard.com | ASPFAQs.com | Advertise | Feedback | Author an Article