7 Designing .NET Applications for Performance Optimization

Designing .NET Applications
The guidelines in this section will help you to optimize system performance when designing .NET applications.
Using Connection Pooling
Connecting to a database is the single slowest operation inside a data-centric application. That's why connection management is important to application performance. Optimize your application by connecting once and using multiple statement objects, instead of opening a new connection for each operation. Avoid connecting to a data source after establishing an initial connection.
Connection pooling lets you reuse connections. Closing connections does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network I/O needed to create a new connection. Connection pooling in ADO.NET is not provided by the core components of the .NET Framework. It must be implemented in the ADO.NET data provider itself.
Pre-allocate connections. Decide up front which connection strings your application requires. Remember that each unique connection string creates a new connection pool.
Once created, connection pools are not destroyed until the active process ends or the connection lifetime is exceeded. Maintenance of inactive or empty pools involves minimal system overhead.
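To illustrate, the following sketch opens and closes a connection twice with the same connection string; with pooling enabled, the second Open reuses the physical connection from the pool instead of creating a new one. The OracleConnection class (used later in this chapter) and the Pooling connection string option are shown for illustration only; option names vary by provider.
string connStr = "Connection String info;Pooling=true";

OracleConnection Conn1 = new OracleConnection(connStr);
Conn1.Open();    // Creates the pool and a physical connection
// ... do work
Conn1.Close();   // Returns the physical connection to the pool

OracleConnection Conn2 = new OracleConnection(connStr);
Conn2.Open();    // Reuses the pooled connection; no network I/O to reconnect
// ... do work
Conn2.Close();

// A different connection string would create a second, separate pool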
Connection and statement handling should be addressed before implementation. Time spent thoughtfully designing connection management improves both application performance and maintainability.
Opening and Closing Connections
Open connections just before they are needed. Opening them earlier than necessary decreases the number of connections available to other users and can increase the demand for resources.
To keep resources available, explicitly Close the connection as soon as it is no longer needed. If you wait for the garbage collector to implicitly clean up connections that go out of scope, the connections are not returned to the connection pool immediately, tying up resources that are not actually being used.
Close connections inside a finally block. Code in the finally block always runs, even if an exception occurs. This guarantees explicit closing of connections. For example:
// DBConn could be any provider connection; OracleConnection (used later
// in this chapter) is shown for illustration
OracleConnection DBConn = new OracleConnection("Connection String info");
try
{
   DBConn.Open();
   … // Do some other interesting work
}
catch (Exception ex)
{
   // Handle exceptions
}
finally
{
   // Close the connection
   if (DBConn != null)
      DBConn.Close();
}
If you are using connection pooling, opening and closing connections is not an expensive operation. Using the Close() method of the data provider's Connection object adds or returns the connection to the connection pool. Remember, however, that closing a connection automatically closes all DataReader objects that are associated with the connection.
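As an alternative to the explicit finally block, a C# using statement calls Dispose on the connection when the block exits, even if an exception is thrown; disposing an ADO.NET connection closes it and, with pooling, returns it to the pool. A minimal sketch, again using the illustrative OracleConnection class:
using (OracleConnection DBConn = new OracleConnection("Connection String info"))
{
   DBConn.Open();
   // ... Do some other interesting work
}  // Dispose closes the connection here, returning it to the pool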
Implementing Reauthentication
Typically, you can configure a connection pool to provide scalability for connections. In addition, to help minimize the number of connections required in a connection pool, you can switch the user associated with a connection to another user, a process known as reauthentication. For example, suppose you are using Kerberos authentication to authenticate users with their operating system user name and password.
To reduce the number of connections that must be created and managed, you can use reauthentication to have a single connection service multiple users. For example, suppose your connection pool contains a connection, Conn, that was established using the user ALLUSERS. You can have that connection service Users A, B, and C by switching the user associated with Conn to each of them in turn.
For more information about the data provider’s support for reauthentication, refer to the DataDirect Connect for ADO.NET User’s Guide.
Managing Commits in Transactions
Committing transactions is slow because of the disk input/output and, potentially, network input/output that a commit requires. Always start a transaction after connecting; otherwise, you are in autocommit mode.
What does a commit actually involve? The database server must flush back to disk every data page that contains updated or new data. This is usually a sequential write to a journal file, but it is nonetheless disk input/output. By default, autocommit is on when connecting to a data source. Autocommit mode usually impairs performance because of the significant amount of disk input/output that is needed to commit every operation.
Furthermore, some database servers do not provide an autocommit mode natively. For this type of server, the ADO.NET data provider must explicitly issue a Commit statement and a BeginTransaction for every operation sent to the server. In addition to the large amount of disk input/output that is required to support autocommit mode, a performance penalty is paid for up to three network requests for every statement that is issued by an application.
The following code fragment starts a transaction for Oracle:
OracleConnection MyConn = new OracleConnection("Connection String info");
MyConn.Open();

// Start a transaction
OracleTransaction TransId = MyConn.BeginTransaction();

// Enlist a command in the current transaction
OracleCommand OracleToDS = new OracleCommand();
OracleToDS.Connection = MyConn;
OracleToDS.Transaction = TransId;
...
// Continue on and do more useful work in the
// transaction, then commit it
TransId.Commit();
Although using transactions can help application performance, do not take this tip too far. Leaving transactions active too long can reduce throughput by holding locks on rows for extended periods, preventing other users from accessing those rows. Commit transactions at intervals that allow maximum concurrency, as the following sketch illustrates.
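This fragment, which continues the Oracle fragment above, commits after every 1,000 operations instead of once at the end. The batch size of 1,000 and the elided INSERT statement are illustrative only; tune the commit interval for your workload.
int rowCount = 5000;   // Illustrative row count

OracleTransaction BatchTran = MyConn.BeginTransaction();
OracleCommand InsertCmd = new OracleCommand("INSERT ...", MyConn);
InsertCmd.Transaction = BatchTran;

for (int i = 1; i <= rowCount; i++)
{
   // ... set parameter values for row i
   InsertCmd.ExecuteNonQuery();

   // Commit at intervals to release row locks and allow concurrency
   if (i % 1000 == 0)
   {
      BatchTran.Commit();
      BatchTran = MyConn.BeginTransaction();
      InsertCmd.Transaction = BatchTran;
   }
}
BatchTran.Commit();   // Commit any remaining work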
Choosing the Right Transaction Model
Many systems support distributed transactions; that is, transactions that span multiple connections. Distributed transactions are at least four times slower than normal transactions due to the logging and network input/output needed to communicate between all the components involved in the distributed transaction (the ADO.NET data provider, the transaction monitor, and the database system).
Use distributed transactions only when transactions must span multiple databases or multiple servers. Unless they are required, avoid distributed transactions; use local transactions whenever possible.
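When a transaction genuinely must span multiple resources, the .NET Framework's System.Transactions namespace provides the TransactionScope class. The following is a minimal sketch, assuming your data provider supports automatic enlistment in an ambient transaction (most ADO.NET providers do; check your provider's Enlist connection string option). A scope escalates to a distributed transaction only when the work spans more than one resource, which is exactly the case to avoid when a local transaction will do.
// Requires a reference to System.Transactions.dll
using System.Transactions;

using (TransactionScope scope = new TransactionScope())
{
   using (OracleConnection Conn = new OracleConnection("Connection String info"))
   {
      Conn.Open();   // Enlists in the ambient transaction
      // ... do work; opening a second connection to another server here
      // would escalate the scope to a distributed transaction
   }
   scope.Complete(); // Commit the transaction
}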
Using Commands Multiple Times
Choosing whether to use the Command.Prepare method can have a significant positive (or negative) effect on query execution performance. The Command.Prepare method tells the underlying data provider to optimize for multiple executions of statements that use parameter markers. Note that it is possible to Prepare any command regardless of which execution method is used (ExecuteReader, ExecuteNonQuery, or ExecuteScalar).
Consider the case where an ADO.NET data provider implements Command.Prepare by creating a stored procedure on the server that contains the prepared statement. Creating the stored procedure involves substantial overhead, but once it exists, executing the statement is inexpensive because the query is parsed and optimization paths are stored at procedure creation time. Applications that execute the same statement multiple times can benefit greatly from calling Command.Prepare and then executing that command repeatedly.
However, using Command.Prepare for a statement that is executed only once results in unnecessary overhead. Furthermore, applications that use Command.Prepare for large single execution query batches exhibit poor performance. Similarly, applications that either always use Command.Prepare or never use Command.Prepare do not perform as well as those that use a logical combination of prepared and unprepared statements.
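The following is a minimal sketch of this pattern, reusing the illustrative Sybase classes from the examples later in this section. The ? parameter marker, the Parameters.Add signature, and the employees table are assumptions for illustration; check your provider's documentation for its actual parameter marker syntax.
// Prepare once, then execute many times with different parameter values
SybaseCommand DBCmd = new SybaseCommand(
   "UPDATE employees SET salary = ? WHERE emp_id = ?", Conn);
DBCmd.Parameters.Add("param1", SybaseDbType.Int, 10, "");
DBCmd.Parameters.Add("param2", SybaseDbType.Int, 10, "");
DBCmd.Prepare();   // Tell the provider to optimize for repeated execution

DBCmd.Parameters[0].Value = 52000;
DBCmd.Parameters[1].Value = 12345;
DBCmd.ExecuteNonQuery();

DBCmd.Parameters[0].Value = 61000;
DBCmd.Parameters[1].Value = 67890;
DBCmd.ExecuteNonQuery();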
Using Statement Caching
A statement cache is a group of prepared statements or instances of Command objects that can be reused by an application. Using statement caching can improve application performance because the actions on the prepared statement are performed once even though the statement is reused multiple times over an application’s lifetime.
A statement cache is owned by a physical connection. After being executed, a prepared statement is placed in the statement cache and remains there until the connection is closed.
Caching all of the prepared statements that an application uses might appear to offer increased performance. However, this approach may come at a cost of database memory if you implement statement caching with connection pooling. In this case, each pooled connection has its own statement cache that may contain all of the prepared statements that are used by the application. All of these pooled prepared statements are also maintained in the database’s memory.
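Because a statement cache is simply a set of reusable Command objects, the idea can be illustrated at the application level: keep one prepared command per distinct SQL string and reuse it. This is a hypothetical sketch of the concept, not a feature of any particular provider; a real cache would be scoped to a single physical connection, as described above.
// Requires: using System.Collections.Generic;
Dictionary<string, SybaseCommand> stmtCache = new Dictionary<string, SybaseCommand>();

SybaseCommand GetCachedCommand(string sql, SybaseConnection conn)
{
   SybaseCommand cmd;
   if (!stmtCache.TryGetValue(sql, out cmd))
   {
      cmd = new SybaseCommand(sql, conn);
      cmd.Prepare();             // Prepared once...
      stmtCache.Add(sql, cmd);
   }
   return cmd;                   // ...and reused on every subsequent request
}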
For application programming contexts that use the ADO.NET Entity Framework, see Chapter 3, “Using Your Data Provider with the ADO.NET Entity Framework.”
Using Parameter Markers as Arguments to Stored Procedures
When calling stored procedures, always use parameter markers for the arguments instead of literal values.
ADO.NET data providers can call stored procedures on the database server either by executing the procedure the same way as any other SQL query, or by optimizing the execution by invoking a Remote Procedure Call (RPC) directly into the database server. When you execute the stored procedure as a SQL query, the database server parses the statement, validates the argument types, and converts the arguments into the correct data types.
Remember that SQL is always sent to the database server as a character string, for example, "getCustName (12345)". In this case, even though the application programmer might assume that the only argument to getCustName is an integer, the argument is actually passed inside a character string to the server. The database server parses the SQL query, consults database metadata to determine the parameter contract of the procedure, isolates the single argument value 12345, then converts the string '12345' into an integer value before finally executing the procedure as a SQL language event.
Invoking an RPC inside the database server avoids the overhead of using a SQL character string. Instead, an ADO.NET data provider constructs a network packet that contains the parameters in their native data type formats, and executes the procedure remotely.
To use stored procedures correctly, set the CommandText property of the Command object to the name of the stored procedure. Then, set the CommandType property of the command to StoredProcedure. Finally, pass the arguments to the stored procedure using parameter objects. Do not physically code the literal arguments into the CommandText.
Example 1
SybaseCommand DBCmd = new SybaseCommand("getCustName (12345)", Conn);
SybaseDataReader myDataReader;
myDataReader = DBCmd.ExecuteReader();
In this example, the stored procedure cannot be optimized to use a server-side RPC. The database server must treat the SQL request as a normal language event which includes parsing the statement, validating the argument types, and converting the arguments into the correct data types before executing the procedure.
Example 2
SybaseCommand DBCmd = new SybaseCommand("getCustName", Conn);
DBCmd.CommandType = CommandType.StoredProcedure;
DBCmd.Parameters.Add("param1", SybaseDbType.Int, 10, "").Value = 12345;
SybaseDataReader myDataReader;
myDataReader = DBCmd.ExecuteReader();
In this example, the stored procedure can be optimized to use a server-side RPC. Because the application avoids literal arguments and calls the procedure by specifying all arguments as parameters, the ADO.NET data provider can optimize the execution by invoking the stored procedure directly inside the database as an RPC. This example avoids SQL language processing on the database server and the execution time is greatly improved.
Choosing Between a DataSet and a DataReader
A critical choice when designing your application is whether to use a DataSet or a DataReader. If you need to retrieve many records rapidly, use a DataReader. The DataReader object is fast, returning a fire hose of read-only data from the server, one record at a time. In addition, retrieving results with a DataReader requires significantly less memory than creating a DataSet. The DataReader does not allow random fetching, nor does it allow for updating the data. However, ADO.NET data providers optimize their DataReaders for efficiently fetching large amounts of data.
In contrast, the DataSet object is a cache of disconnected data stored in memory on the client. In effect, it is a small database in itself. Because the DataSet contains all of the data that has been retrieved, you have more options in the way you can process the data. You can randomly choose records from within the DataSet and update, insert, and delete records at will. You can also manipulate relational data as XML. This flexibility provides some impressive functionality for any application, but comes with a high cost in memory consumption. In addition to keeping the entire result set in memory, the DataSet maintains both the original and the changed data, which leads to even higher memory usage. Do not use DataSets with very large result sets because the scalability of the application will be drastically reduced.
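The following sketch shows both approaches side by side, reusing the illustrative Sybase classes. SybaseDataAdapter is assumed to follow the standard ADO.NET DataAdapter pattern, and the employees table is hypothetical; substitute your provider's actual classes and schema.
// Requires: using System.Data;

// DataReader: forward-only and read-only; rows stream from the server
SybaseCommand DBCmd = new SybaseCommand("SELECT emp_id, name FROM employees", Conn);
using (SybaseDataReader rdr = DBCmd.ExecuteReader())
{
   while (rdr.Read())
   {
      // ... process one row at a time; minimal client memory
   }
}

// DataSet: a disconnected cache; the entire result is held in client memory
SybaseDataAdapter adapter =
   new SybaseDataAdapter("SELECT emp_id, name FROM employees", Conn);
DataSet ds = new DataSet();
adapter.Fill(ds);   // Random access, updates, and XML views are now possible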
Using Native Managed Providers
Bridges into unmanaged code, that is, code outside the .NET environment, adversely affect performance. Calling unmanaged code from managed code causes the CLR (Common Language Runtime) to make additional checks on calls to the unmanaged code, which impacts performance.
The .NET CLR is a very efficient and highly tuned environment. By using 100% managed code so that your .NET assemblies run inside the CLR, you can take advantage of the numerous built-in services to enhance the performance of your managed application and the productivity of your development staff. The CLR provides automatic memory management, so developers do not have to spend time debugging memory leaks. Automatic lifetime control of objects includes garbage collection, scalability features, and support for side-by-side versions. In addition, .NET Framework security enforces security restrictions on managed code that protect the code and data from being misused or damaged by other code. An administrator can define a security policy to grant or revoke permissions at the enterprise, machine, assembly, or user level.
However, many ADO.NET data provider architectures must bridge outside the CLR into native code to establish network communication with the database server. The overhead and processing that is required to enter this bridge is slow in the current version of the CLR.
Depending on your architecture, you may not realize that the underlying ADO.NET data provider is incurring this security risk and performance penalty. Be careful when choosing an ADO.NET data provider that advertises itself as a 100% or pure managed code data provider: if the "Managed Data Provider" requires unmanaged database clients or other unmanaged pieces, it is not a 100% managed data access solution. Only a few vendors produce true managed code providers that implement their entire stack as a managed component.