Retrieving Data
To retrieve data efficiently, return only the data that you need, and choose the most efficient method of doing so. The guidelines in this section will help you to optimize system performance when retrieving data with .NET applications.
Retrieving Long Data
Unless it is necessary, applications should not request long data because retrieving long data across a network is slow and resource-intensive. Remember that when you use a DataSet, all data is retrieved from the data source, even if you never use it.
Although the best method is to exclude long data from the select list, some applications do not formulate the select list before sending the query to the ADO.NET data providers (that is, some applications send SELECT * FROM table_name ...). If the select list contains long data, most ADO.NET data providers must retrieve that data at fetch time even if the application does not bind the long data in the result set. When possible, try to implement a method that does not retrieve all columns of the table.
Most users don't want to see long data. If the user does want to see these result items, then the application can query the database again, specifying only the long columns in the select list. This method allows the average user to retrieve the result set without having to pay a high performance penalty for network traffic.
Consider a query such as "SELECT * FROM Employee WHERE ssid = '999-99-2222'". An application might want to retrieve only this employee's name and address. Remember, however, that an ADO.NET data provider cannot tell which result columns an application will read when the query is executed; it only knows that the application can request any of them. When the data provider processes the fetch request, it will most likely return one or more result rows across the network from the database server, and each row contains all of its column values, including an employee picture if the Employee table happens to contain such a column. Limiting the select list to the name and address columns decreases network traffic and makes the query run faster.
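The two-step approach described above (fetch the short columns first, then query again for the long column only on request) can be sketched as follows. This is a minimal illustration rather than a prescribed implementation: it uses the provider-neutral System.Data.Common base classes, and the column names (name, address, photo), the @ssid parameter marker, and the class and method names are all assumptions made for the example. Parameter marker syntax differs between data providers.

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class EmployeeLookup
{
    // Hypothetical column names: the text only says the table holds a
    // name, an address, and possibly a picture.
    public const string SummarySql =
        "SELECT name, address FROM Employee WHERE ssid = @ssid";
    public const string PhotoSql =
        "SELECT photo FROM Employee WHERE ssid = @ssid";

    // Creates a command for the given SQL with a single ssid parameter.
    // The "@ssid" marker is an assumption; some providers expect "?" instead.
    public static DbCommand CreateCommand(DbConnection conn, string sql, string ssid)
    {
        DbCommand cmd = conn.CreateCommand();
        cmd.CommandText = sql;
        DbParameter p = cmd.CreateParameter();
        p.ParameterName = "@ssid";
        p.DbType = DbType.String;
        p.Value = ssid;
        cmd.Parameters.Add(p);
        return cmd;
    }

    // Step 1: fetch only the short columns that every user needs.
    // Returns null when no employee matches.
    public static string[] GetNameAndAddress(DbConnection conn, string ssid)
    {
        using (DbCommand cmd = CreateCommand(conn, SummarySql, ssid))
        using (DbDataReader reader = cmd.ExecuteReader())
        {
            if (!reader.Read()) return null;
            return new[] { reader.GetString(0), reader.GetString(1) };
        }
    }

    // Step 2: fetch the long column only when the user asks for it.
    // Assumes the provider surfaces the binary column as byte[].
    public static byte[] GetPhoto(DbConnection conn, string ssid)
    {
        using (DbCommand cmd = CreateCommand(conn, PhotoSql, ssid))
        using (DbDataReader reader = cmd.ExecuteReader())
        {
            if (!reader.Read() || reader.IsDBNull(0)) return null;
            return (byte[])reader.GetValue(0);
        }
    }
}
```

An application would call GetNameAndAddress on an open connection for the normal display path and call GetPhoto only when the user explicitly requests the picture, so the typical request never carries the long data across the network.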
Reducing the Size of Data Retrieved
To reduce network traffic and improve performance, you can reduce the size of any data being retrieved to some manageable limit by using a database-specific command. For example, an Oracle data provider might let you limit the number of bytes of data the connection uses to fetch multiple rows. A Sybase data provider might let you limit the number of bytes of data that can be returned from a single IMAGE column in a result set. For example, with Sybase, you can issue "Set TEXTSIZE n" on any connection, where n sets the maximum number of bytes that will ever be returned to you from any TEXT or IMAGE column.
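As a rough sketch of the Sybase case just mentioned, the statement can be issued once on an open connection before any long data is fetched. The class and method names here are hypothetical, and the code uses the provider-neutral System.Data.Common base classes rather than any specific provider's types; whether and how a given provider honors the setting should be checked against its documentation.

```csharp
using System;
using System.Data.Common;
using System.Globalization;

public static class TextSizeLimiter
{
    // Builds the "SET TEXTSIZE n" statement quoted in the text.
    public static string BuildStatement(int maxBytes)
    {
        if (maxBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(maxBytes));
        return "SET TEXTSIZE " + maxBytes.ToString(CultureInfo.InvariantCulture);
    }

    // Applies the limit to an already open connection. The setting is
    // issued with ExecuteNonQuery because the statement returns no rows.
    public static void Apply(DbConnection openConnection, int maxBytes)
    {
        using (DbCommand cmd = openConnection.CreateCommand())
        {
            cmd.CommandText = BuildStatement(maxBytes);
            cmd.ExecuteNonQuery();
        }
    }
}
```

For example, calling TextSizeLimiter.Apply(conn, 4096) right after conn.Open() would cap each TEXT or IMAGE value returned on that connection at 4096 bytes, under the assumptions above.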
If the data provider allows you to define the packet size, use the smallest packet size that meets your needs.
In addition, be careful to return only the rows you need. If you return five rows when you only need two rows, performance is decreased, especially if the unnecessary rows include long data.
Especially when using a DataSet, be sure to use a Where clause with every Select statement to limit the amount of data that will be retrieved. Even when you use a Where clause, a Select statement that does not adequately restrict the request could return hundreds of rows of data. For example, if you want the complete row of data from the Employee table for each manager hired in recent years, you might be tempted to issue the following statement and then, in your application code, filter out the rows for employees who are not managers:
SELECT * FROM Employee WHERE hiredate > 2000
However, suppose the Employee table contains a photograph column. Retrieving all the extra rows could be extremely expensive. Let the database filter them for you and avoid having all the extra data that you don't need sent across the network. A better request further limits the data returned and improves performance:
SELECT * FROM Employee WHERE hiredate > 2000 AND job_title='Manager'
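Because a DataSet pulls every row that the query returns, the restrictive statement matters most when filling one. The following is a minimal sketch, assuming a provider that exposes a DbProviderFactory; the class and method names are hypothetical, and the SQL is the statement shown above, used unchanged.

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class ManagerLoader
{
    // The restrictive statement from the text: the database does the filtering.
    public const string ManagersSql =
        "SELECT * FROM Employee WHERE hiredate > 2000 AND job_title='Manager'";

    // Fills a DataSet with only the rows the application needs.
    // The factory and connection must come from the same data provider.
    public static DataSet LoadManagers(DbProviderFactory factory, DbConnection conn)
    {
        using (DbCommand cmd = conn.CreateCommand())
        using (DbDataAdapter adapter = factory.CreateDataAdapter())
        {
            cmd.CommandText = ManagersSql;
            adapter.SelectCommand = cmd;

            DataSet result = new DataSet();
            // Fill retrieves every row the query returns, so the Where
            // clause is the only thing limiting what crosses the network.
            adapter.Fill(result, "Employee");
            return result;
        }
    }
}
```

Filtering afterward in application code would produce the same rows in the end, but only after every non-manager row, photograph included, had already been sent across the network.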
Using Commands that Retrieve Little or No Data
Commands such as Update, Insert, and Delete do not return data. Use these commands with the ExecuteNonQuery method of the Command object. Although you can successfully execute these commands using the ExecuteReader method, the ADO.NET data provider will properly optimize the database access for Update, Insert, and Delete statements only through the ExecuteNonQuery method.
The following example shows how to insert a row into the Employee table using ExecuteNonQuery:
DBConn.Open();
DBTxn = DBConn.BeginTransaction();
// Set the Connection property of the Command object
DBCmd.Connection = DBConn;
// Set the text of the Command to the INSERT statement
DBCmd.CommandText = "INSERT INTO Employee VALUES (15,'HAYES','ADMIN',6, " +
 "'17-APR-2002',18000,NULL,4)";
// Set the transaction property of the Command object
DBCmd.Transaction = DBTxn;
// Execute the statement with ExecuteNonQuery, because we are not
// returning results
DBCmd.ExecuteNonQuery();
// Now commit the transaction
DBTxn.Commit();
 
// Close the connection
DBConn.Close();
Use the ExecuteScalar method of the Command object to return a single value, such as a sum or a count, from the database. The ExecuteScalar method returns only the value of the first column of the first row of the result set. Once again, you could use the ExecuteReader method to successfully execute such queries, but by using the ExecuteScalar method, you tell the ADO.NET data provider to optimize for a result set that consists of a single row and a single column. By doing so, the data provider can avoid a lot of overhead and improve performance. The following example shows how to retrieve the count of a group:
// Retrieve the number of employees who make more than $50000
// from the Employee table
 
// Open connection to Sybase database
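SybaseConnection  Conn;
// The connection string is split with "+" because a regular C# string
// literal cannot span lines
Conn = new SybaseConnection("host=bowhead;port=4100;User ID=test01;" +
   "Password=test01;Database Name=Accounting");
Conn.Open();
 
// Make a command object
// Note the leading space before WHERE so the concatenated SQL is valid
SybaseCommand  salCmd = new SybaseCommand("SELECT COUNT(sal) FROM Employee" +
   " WHERE sal>'50000'",Conn);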
 
try
{
    int count = (int)salCmd.ExecuteScalar();
}
catch (Exception ex)
{
    // Display any exceptions in a messagebox
    MessageBox.Show (ex.Message);
}
// Close the connection
Conn.Close();
Choosing the Right Data Type
Advances in processor technology brought significant improvements to the way that operations such as floating-point math are handled. However, when the active portion of your application does not fit into on-chip cache, retrieving and sending certain data types is still expensive. When you are working with data on a large scale, it is important to select the data type that can be processed most efficiently.
For example, integer data is processed faster than decimal data. Decimal data is defined according to internal database-specific formats. The data must be decoded and then converted, typically to a string. Note that all Oracle numeric types are actually decimals.
Processing time is shortest for character strings, followed by integers, which usually require some conversion or byte ordering.