DB2 Server for VSE & VM: Database Administration


Appendix A. Estimating Your Dbspace Requirements

This appendix describes procedures and calculations you can follow to determine the amount of storage to allocate to your dbspaces. You must determine:

  1. The required size of each permanent dbspace, described in "Estimating Dbspace Size"
  2. The storage required to hold a working set of the data, described in "Estimating Internal Dbspace Size and DASD Needs for Sort Operations".


Estimating Dbspace Size

You need to estimate data storage requirements to establish dbspace sizes. The required size of a dbspace depends on the storage needed for the tables and indexes it will hold, and on the growth you expect.

Using these estimated values, you can calculate the required size of the dbspace by determining and adding the number of pages required for header pages, data pages, index pages, and a growth allowance:

The following formula shows how these values are used to calculate the number of dbspace pages needed for a set of tables:



   DBSPACE PAGES = HEADER PAGES + DATA PAGES + INDEX PAGES + ALLOWANCE

General Guidelines

For most dbspaces, it is sufficient to use the default value of 8 for the number of HEADER PAGES. You should also use the default of 33 percent for PCTINDEX, rather than estimate the number of index pages needed, unless you anticipate doing extensive indexing. This default reserves approximately one third of the total space for indexes. If you assume both defaults, 8 HEADER PAGES and 33 percent for PCTINDEX, the above formula becomes:

   DBSPACE PAGES = 8 + 1.50 x (DATA PAGES + ALLOWANCE)

The DBSPACE PAGES number derived must be rounded up to a multiple of 128. That is,

   REQUIRED DBSPACE PAGES = TRUNC [ (DBSPACE PAGES + 127 ) / 128 ] x 128

The TRUNC (truncate) function indicated here means: compute the value between the brackets [ ] and then use only the integer part of that value. For example, if the value calculated is 27.8, use 27.

You should allow from 50 to 200 percent for ALLOWANCE, depending on the nature of the tables to be stored. If the number of rows is relatively stable and you do not anticipate adding columns to tables (or even adding tables), an ALLOWANCE of 50 percent is safe. To allow for all forms of growth (inserting rows, adding columns, and adding tables), you should consider an ALLOWANCE of 200 percent (2 x DATA PAGES).
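
If you prefer to automate this arithmetic, the following Python sketch combines the default formula with the rounding rule. It is an illustration only; the function name and parameters are ours, not part of the product.

   import math

   def required_dbspace_pages(data_pages, allowance_pct=50, header_pages=8):
       """Estimate dbspace pages, assuming the default PCTINDEX of 33 percent.

       allowance_pct is the growth ALLOWANCE as a percentage of DATA PAGES
       (50 to 200, per the guidelines above).
       """
       allowance = data_pages * allowance_pct / 100.0
       # DBSPACE PAGES = 8 + 1.50 x (DATA PAGES + ALLOWANCE)
       dbspace_pages = header_pages + 1.50 * (data_pages + allowance)
       # Round up to a multiple of 128:
       # TRUNC [ (DBSPACE PAGES + 127) / 128 ] x 128
       return math.trunc((dbspace_pages + 127) / 128) * 128

   # 863 data pages with a 50 percent allowance -> 2048 pages
   print(required_dbspace_pages(863, 50))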

Notes:

  1. The ALLOWANCE in the above formulas corresponds to reserved, unused pages. These pages do not use real space in the storage pool, so dbspace allocations can be substantially greater than what the above algorithm indicates is necessary. Because dbspace pages do not occupy storage pool slots until they are loaded, there is little to be gained by making a table a tight fit in a dbspace.

  2. When a dbspace can no longer contain a table, you can change the parameters of the dbspace. See "Altering the Design of a Dbspace" for more information.

Estimating Storage for a Table

To estimate the amount of storage required for a table, consider the number of rows the table will hold and the average length of each stored row.

There are no firm guidelines for estimating the number of rows your table will have. However, for the purpose of determining dbspace requirements, it is wise to look ahead to the potential growth of the table in the foreseeable future. Consider the estimated size of the table 2 or 3 years from now, rather than its current size.

The length of a stored row can be estimated using Table 28. To complete the calculations of Table 28, you must know the type and length of all columns in your tables. If long-field columns are involved, you should first calculate the average length of long-field columns using Table 29.

Table 28. Form for Calculating the Average Row Length of a Stored Row

   COLUMN OVERHEAD
     The number of columns supporting nulls                          ___
     The number of VARCHAR(n) columns with n<=254                    ___
     The number of VARGRAPHIC(n) columns with n<=127                 ___
     The number (N) of LONG FIELDS (see note 1): N x 6               ___

   SUM OF COLUMN OVERHEAD FACTORS                                    ___

   COLUMN DATA STORAGE FACTORS (see note 2)
     INTEGER: 4                                                      ___
     SMALLINT: 2                                                     ___
     DECIMAL: TRUNC [PRECISION/2 + 1]                                ___
     FLOAT: 8 (see note 3)                                           ___
     FLOAT: 4 (see note 3)                                           ___
     CHAR(n): n                                                      ___
     GRAPHIC(n): n x 2 (see note 4)                                  ___
     DATE: 4                                                         ___
     TIME: 3                                                         ___
     TIMESTAMP: 10                                                   ___
     VARCHAR(n) with n<=254: average length                          ___
     VARGRAPHIC(n) with n<=127: average length x 2 (see note 4)      ___
     LONG FIELDS: calculated separately (see Table 29)               ___

   SUM OF COLUMN DATA STORAGE FACTORS                                ___

   ROW OVERHEAD FACTOR                                                 8

   AVERAGE LENGTH OF EACH STORED ROW (AVGROWLEN)                     ___

Notes:

  1. The following data types are long fields: VARCHAR(n) with n>254, VARGRAPHIC(n) with n>127, LONG VARCHAR, and LONG VARGRAPHIC.
  2. The factors indicated in the COLUMN DATA STORAGE FACTORS area are to be used for each column of the type specified. The sum of those factors goes on the line at the right. For example, if a table consists of 4 columns of DECIMAL data, calculate the factor for each DECIMAL column, add up those factors, and enter the sum on the line opposite DECIMAL.
  3. The value 8 for FLOAT is for double-precision floating point columns (FLOAT(n) where 22<=n<=53, or n is not specified). The value 4 for FLOAT is for single-precision floating point columns (FLOAT(n) where 1<=n<=21).
  4. Each graphic character occupies 2 bytes of storage. When you determine the average length of a GRAPHIC or VARGRAPHIC column in characters, multiply that number by 2 to get the number of bytes.

Column Overhead refers to descriptive information stored with an instance of the column. The overhead depends upon the characteristics of the column: each column that supports nulls adds 1 byte; each VARCHAR(n) column with n<=254 and each VARGRAPHIC(n) column with n<=127 adds 1 byte to record the actual length of the value; and each long-field column adds 6 bytes for a pointer to the separately stored long-field value.

Pointers to the long-field values are stored in a special internal format that involves 6 bytes of control information in the stored row (a 2-byte length value and a 4-byte tuple identifier (TID)).

Column Data Storage refers to the storage space occupied by the actual column values. The factors shown are numbers of bytes.

Row Overhead is a fixed overhead for each row in the table. It consists of a 6-byte row header and a 2-byte offset into the page, for a total of 8 bytes.
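
The Table 28 form translates directly into a small calculation. The following Python sketch is illustrative only (the 'FLOAT8'/'FLOAT4' type names and the dictionary layout are our own); it applies the column overhead, column data storage, and row overhead factors described above.

   import math

   def avg_row_length(columns):
       """AVGROWLEN per the Table 28 form.

       columns: list of dicts with keys 'type', 'nullable', and 'length'
       (average data length for VARCHAR/VARGRAPHIC, declared n for
       CHAR/GRAPHIC), or 'precision' for DECIMAL.
       """
       fixed = {'INTEGER': 4, 'SMALLINT': 2, 'FLOAT8': 8, 'FLOAT4': 4,
                'DATE': 4, 'TIME': 3, 'TIMESTAMP': 10}
       overhead = data = 0
       for c in columns:
           t = c['type']
           if c.get('nullable'):
               overhead += 1                   # 1 byte per nullable column
           if t in fixed:
               data += fixed[t]
           elif t == 'DECIMAL':
               data += math.trunc(c['precision'] / 2 + 1)
           elif t == 'CHAR':
               data += c['length']
           elif t == 'GRAPHIC':
               data += c['length'] * 2         # 2 bytes per graphic character
           elif t == 'VARCHAR':                # n <= 254
               overhead += 1                   # 1-byte actual-length field
               data += c['length']
           elif t == 'VARGRAPHIC':             # n <= 127
               overhead += 1
               data += c['length'] * 2
           elif t == 'LONG':                   # value stored separately
               overhead += 6                   # 6-byte pointer in the row
       return overhead + data + 8              # 8-byte row overhead

   # Example 1 below: one nullable CHAR(100) column -> 1 + 100 + 8 = 109
   print(avg_row_length([{'type': 'CHAR', 'length': 100, 'nullable': True}]))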

Table 29. Formula for Calculating the Average Length of a Long-Field Column

   LONG-FIELD VALUE OVERHEAD = TRUNC [ (average length + 3999) / 4000 ] x 20
   LONG-FIELD VALUE STORAGE  = TRUNC [ (average length + 249) / 250 ] x 250

   AVERAGE LENGTH OF EACH STORED LONG FIELD (AVGCOLLEN) =
      LONG-FIELD VALUE OVERHEAD + LONG-FIELD VALUE STORAGE


Note: Use the above formula to calculate the average length of a stored long field. The calculation must be done for each long-field column of a table. The following data types are long fields: VARCHAR(n) with n>254, VARGRAPHIC(n) with n>127, LONG VARCHAR, and LONG VARGRAPHIC.

Graphic characters occupy 2 bytes of storage. When you determine the average length of a GRAPHIC or VARGRAPHIC column in characters, multiply that number by 2 to get the number of bytes.

Long-Field Value Overhead

The value of a LONG FIELD is stored separately from the rest of the stored row. The value is stored as a chain of entries in an internal table. Each entry of the internal table is composed of 16 columns of 250 bytes each (some potentially null). Each record has 20 bytes of overhead (a 2-byte offset, a 6-byte row header, and a 12-byte unary link pointer chain). Thus, the overhead for a LONG-FIELD value depends on the actual length of the data and includes 20 bytes for each 4000-byte (16 columns of 250 bytes) entry required to store the value.

Long-Field Value Storage

Long-field values are stored in increments of 250 bytes. Each increment is one fixed-length 250-byte column value in the internal table. For example, a 10-byte long-field value occupies 250 bytes of storage, plus the long-field-value overhead of 20 bytes. A 248-byte long-field value occupies 250 bytes, a 260-byte long-field value occupies 500 bytes of storage, and so on.
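
As a quick check of the Table 29 arithmetic, here is a minimal Python sketch (the function name is ours):

   import math

   def avg_long_field_length(avg_len):
       """AVGCOLLEN per Table 29, for one long-field column."""
       overhead = math.trunc((avg_len + 3999) / 4000) * 20   # 20 bytes per 4000-byte entry
       storage = math.trunc((avg_len + 249) / 250) * 250     # 250-byte increments
       return overhead + storage

   print(avg_long_field_length(10))    # 250 + 20 = 270
   print(avg_long_field_length(260))   # 500 + 20 = 520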

Estimating the Number of Header Pages

The number of header pages for a dbspace can be established on the SQL ACQUIRE DBSPACE statement. In general, you should use the default value of 8 for this option.

A more precise estimate of the number of header pages follows. It is more complex than the general guideline above, but will help if you require a more accurate estimate.

The header pages contain information about the objects defined in a dbspace. Each object defined in the dbspace, such as a table or an index, is recorded in the header pages via a control row. For more information on the types of objects that can be defined in a dbspace and the types of control rows that are inserted in the header pages, see the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.

To estimate the number of header pages required:

  1. Calculate the number of bytes required by the objects defined in the dbspace, using the byte counts shown in the example below: 24 bytes for the dbspace itself, 32 + (2 x number of columns) bytes for each table, 20 + (2 x number of key columns) bytes for each index, and 42 bytes for each long-field column.
  2. Divide the total number of bytes required by 4080.
  3. Round the result to the next highest integer.
  4. The result is the number of header pages required for the dbspace.

For example, assume you are planning to acquire a dbspace that will contain 3 tables. Table A has 10 columns, 2 indexes each defined on a single column, and no long-field columns. Table B has 14 columns, 1 index containing 3 columns, and no long-field columns. Table C contains 3 columns, 1 index containing 1 column, and 2 long-field columns. The estimated number of header pages for this dbspace is as follows:

DBSPACE:                 24
 
Table A:                 32 + (2 x 10)    < table
                         20 + (2 x 1)     < index 1
                         20 + (2 x 1)     < index 2
 
Table B:                 32 + (2 x 14)    < table
                         20 + (2 x 3)     < index
 
Table C:                 32 + (2 x 3)     < table
                         20 + (2 x 1)     < index
                         84               < long-field columns
                         =============
Total:                   350 bytes
 
Divide by 4080           1 header page required.
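
A minimal Python sketch of this estimate, using the byte counts from the example above (the function interface is ours), follows:

   import math

   def header_pages(tables, long_field_cols=0):
       """Estimate header pages for a dbspace.

       tables: list of (number_of_columns, [key_columns_per_index, ...]) pairs.
       """
       total = 24                              # control row for the dbspace itself
       for ncols, indexes in tables:
           total += 32 + 2 * ncols             # control row for each table
           for keycols in indexes:
               total += 20 + 2 * keycols       # control row for each index
       total += 42 * long_field_cols           # each long-field column
       return math.ceil(total / 4080)          # round up to whole pages

   # Tables A, B, and C from the example: 350 bytes -> 1 header page
   print(header_pages([(10, [1, 1]), (14, [3]), (3, [1])], long_field_cols=2))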
 

Estimating the Number of Data Pages

The number of data pages required to store a table depends on whether the rows in the table are of fixed or variable length. The next section describes a method for calculating the pages required for storing tables with fixed length rows. For tables with variable length rows (rows with VARCHAR or VARGRAPHIC data), refer to "Estimating Data Pages for a Table with Variable Length Rows".
Note: Long-field columns do not produce variable length rows because the long-field values are stored separately.

Pages Required for Storing Tables with Fixed Length Rows

The number of data pages required to hold the tables can be estimated after determining the average row length (AVGROWLEN) for each table and the effective page size (EPS), which is based on the PCTFREE setting in effect when the pages are loaded.

The number of data pages required is estimated as follows:

  1. Determine the average row length (AVGROWLEN) as in Table 28.
  2. Determine the free space requirement (PCTFREE). Use a whole number for PCTFREE. That is, if the percent free is 10, use 10 for PCTFREE, not 0.10.
  3. Calculate:
         40 x PCTFREE + AVGROWLEN
    
  4. Use the number calculated in step 3 to find the corresponding EPS in the following table:

    Table 30. Effective Page Size Based on Free Space Requirement

       40 x PCTFREE + AVGROWLEN     EPS (Effective Page Size)
              8-17                  4065 + AVGROWLEN
             18-32                  4050 + AVGROWLEN
             33-52                  4030 + AVGROWLEN
            53-102                  3980 + AVGROWLEN
           103-252                  3830 + AVGROWLEN
           253-502                  3580 + AVGROWLEN
          503-1002                  3080 + AVGROWLEN
         1003-2002                  2080 + AVGROWLEN
         2003-4020                    62 + AVGROWLEN
         4021-4080                     2 + AVGROWLEN
         4081 and up                See note below.

    Note: For the case where (40 x PCTFREE + AVGROWLEN) >= 4081:
    If AVGROWLEN <= 4080, the number of rows per page is 1 and EPS = AVGROWLEN.
    If AVGROWLEN > 4080, the row size exceeds DB2 limits. Reduce your row size and recalculate.

  5. Calculate the number of rows per page:
       Rows per Page = MINIMUM (256, TRUNC [ EPS/AVGROWLEN ] )
    
  6. Calculate the number of data pages required as follows:

    If the average long-field length is <= 4020, then:

       REQUIRED     Number of Rows   Number of Rows x Number of Long Fields
       DATA      =  -------------- + --------------------------------------
       PAGES        Rows per Page     4020 / Average Long-Field Length
    
    Note: When evaluating the expression, truncate the denominator 4020/average long-field length to the nearest integer, and round up the results of both division expressions to the nearest integer before adding them.

    If the average long-field length is >4020:

       REQUIRED     Number of Rows
       DATA      =  -------------- + Number of Rows x Number of Long Fields
       PAGES        Rows per Page
     
                      Average Long-Field Length
                    x -------------------------
                                4020
    
    Note: Round the results of both division terms up to the nearest integer before evaluating the expression.

If you are loading tables separately, calculate the number of pages required for each table separately. If you are loading the tables in an interleaved fashion, use the longest AVGROWLEN of all the tables in determining the Effective Page Size.

Notes:

  1. Storage for long-field columns (LONG VARCHAR, LONG VARGRAPHIC, VARCHAR(n) with n>254, VARGRAPHIC(n) with n>127) must be calculated apart from the rest of the row. AVGROWLEN includes six bytes for each long-field value; storage for the actual long-field value is calculated separately.

  2. If you have already established a database and are defining a new dbspace, you can get an estimate of the data pages required by modeling the dbspace. That is, create the tables in a test dbspace and load a sample of the data. Then you can issue queries against SYSTEM.SYSDBSPACES and SYSTEM.SYSCATALOG to find out how many pages the data sample required. The page requirements for the real tables will be proportionately higher. When modeling data, avoid using nulls in the sample, because nulls tend to produce low estimates.
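
The following Python sketch pulls steps 1 through 6 and Table 30 together. The function names are ours, and the EPS table is transcribed from Table 30; treat it as a sketch of the calculation, not part of the product.

   import math

   # Table 30 as (upper bound of 40 x PCTFREE + AVGROWLEN, EPS constant) pairs
   EPS_TABLE = [(17, 4065), (32, 4050), (52, 4030), (102, 3980), (252, 3830),
                (502, 3580), (1002, 3080), (2002, 2080), (4020, 62), (4080, 2)]

   def effective_page_size(avgrowlen, pctfree):
       x = 40 * pctfree + avgrowlen
       for upper, const in EPS_TABLE:
           if x <= upper:
               return const + avgrowlen
       if avgrowlen <= 4080:
           return avgrowlen                    # one row per page
       raise ValueError('row size exceeds DB2 limits')

   def data_pages(nrows, avgrowlen, pctfree, n_long=0, avg_long_len=0):
       eps = effective_page_size(avgrowlen, pctfree)
       rows_per_page = min(256, math.trunc(eps / avgrowlen))
       pages = math.ceil(nrows / rows_per_page)
       if n_long and avg_long_len <= 4020:
           pages += math.ceil(nrows * n_long / math.trunc(4020 / avg_long_len))
       elif n_long:
           pages += nrows * n_long * math.ceil(avg_long_len / 4020)
       return pages

   print(data_pages(25000, 109, 10))           # Example 1 below: 863
   print(data_pages(25000, 146, 10, 1, 270))   # Example 3 below: 2923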

Examples of Estimating the Number of Data Pages

Example 1

The example work sheet shown in Table 31 is for a table that has just one CHAR(100) column supporting nulls.

Table 31. Example 1 -- Calculating the Average Row Length

   COLUMN OVERHEAD
     The number of columns supporting nulls                            1
     The number of VARCHAR(n) columns with n<=254                      0
     The number of VARGRAPHIC(n) columns with n<=127                   0
     The number (N) of long fields: N x 6                              0

   SUM OF COLUMN OVERHEAD FACTORS                                      1

   COLUMN DATA STORAGE FACTORS
     INTEGER: 4                                                        0
     SMALLINT: 2                                                       0
     DECIMAL: TRUNC [PRECISION/2 + 1]                                  0
     FLOAT: 8 (double precision)                                       0
     FLOAT: 4 (single precision)                                       0
     CHAR(n): n                                                      100
     GRAPHIC(n): n x 2                                                 0
     DATE: 4                                                           0
     TIME: 3                                                           0
     TIMESTAMP: 10                                                     0
     VARCHAR(n): average length                                        0
     VARGRAPHIC(n): average length x 2                                 0
     Long fields: calculated separately (see Table 29)                 0

   SUM OF COLUMN DATA STORAGE FACTORS                                100

   ROW OVERHEAD FACTOR                                                 8

   AVERAGE LENGTH OF EACH STORED ROW                                 109

The number of DATA PAGES required to load 25000 rows into this table in a dbspace defined to have 10 percent free space is:

  1. Determine the Average Row Length.

    AVGROWLEN = 109

  2. Determine the Free Space Requirement.

    PCTFREE = 10

  3. Calculate 40 x PCTFREE + AVGROWLEN.

    40 x 10 + 109 = 509

  4. From Table 30, determine the Effective Page Size (EPS), using the number calculated in step 3 to find the corresponding EPS.

    503 - 1002 ..... 3080 + AVGROWLEN

    EPS = 3080 + 109 = 3189

  5. Calculate the number of rows per page.

    Rows per Page = MINIMUM (256, TRUNC [ 3189/109 ] ) = 29

  6. The number of data pages required is:
       Number of Rows   25000
       -------------- = ----- = 862.1, rounded up to 863
       Rows per Page      29

     The table has no long-field columns, so the long-field term of the formula is zero.

Example 2

The example work sheet shown in Table 32 is for a table with the mix of column types recorded in the work sheet, including 11 columns that support nulls, 4 short VARCHAR columns, and 1 short VARGRAPHIC column.


Table 32. Example 2 -- Calculating the Average Row Length of a Stored Row

   COLUMN OVERHEAD
     The number of columns supporting nulls                           11
     The number of VARCHAR(n) columns with n<=254                      4
     The number of VARGRAPHIC(n) columns with n<=127                   1
     The number (N) of long fields: N x 6                              0

   SUM OF COLUMN OVERHEAD FACTORS                                     16

   COLUMN DATA STORAGE FACTORS
     INTEGER: 4                                                        4
     SMALLINT: 2                                                       2
     DECIMAL: TRUNC [PRECISION/2 + 1]                                 13
     FLOAT: 8 (double precision)                                       0
     FLOAT: 4 (single precision)                                       0
     CHAR(n): n                                                        7
     GRAPHIC(n): n x 2                                                20
     DATE: 4                                                           4
     TIME: 3                                                           3
     TIMESTAMP: 10                                                     0
     VARCHAR(n): average length                                       60
     VARGRAPHIC(n): average length x 2                                24
     Long fields: calculated separately                                0

   SUM OF COLUMN DATA STORAGE FACTORS                                137

   ROW OVERHEAD FACTOR                                                 8

   AVERAGE LENGTH OF EACH STORED ROW                                 161

The number of DATA PAGES required to load 600 rows into this table in a dbspace defined to have 15 percent free space is:

  1. Determine the Average Row Length.

    AVGROWLEN = 161

  2. Determine the Free Space Requirement.

    PCTFREE = 15

  3. Calculate 40 x PCTFREE + AVGROWLEN.

    40 x 15 + 161 = 761

  4. From Table 30, determine the Effective Page Size (EPS), using the number calculated in step 3 to find the corresponding EPS.

    503 - 1002 ..... 3080 + AVGROWLEN

    EPS = 3080 + 161 = 3241

  5. Calculate the number of rows per page.

    Rows per Page = MINIMUM (256, TRUNC [ 3241/161 ] ) = 20

  6. The number of data pages required is:
       Number of Rows   600
       -------------- = --- = 30
       Rows per Page     20

     The table has no long-field columns, so the long-field term of the formula is zero.

Usually, you store a table this small in a dbspace with other tables. If a dbspace has more than one table, the total number of DATA PAGES required for the dbspace is the sum of the data page requirements of all the tables in the dbspace.

Example 3

The example work sheets shown in Table 33 and Table 34 are for a table with the mix of column types recorded in the work sheets, including 12 columns that support nulls and one LONG VARCHAR column.


Table 33. Example 3 -- Calculating the Average Row Length of a Stored Row

   COLUMN OVERHEAD
     The number of columns supporting nulls                           12
     The number of VARCHAR(n) columns with n<=254                      3
     The number of VARGRAPHIC(n) columns with n<=127                   1
     The number (N) of long fields: N x 6                              6

   SUM OF COLUMN OVERHEAD FACTORS                                     22

   COLUMN DATA STORAGE FACTORS
     INTEGER: 4                                                        4
     SMALLINT: 2                                                       2
     DECIMAL: TRUNC [PRECISION/2 + 1]                                 13
     FLOAT: 8 (double precision)                                       0
     FLOAT: 4 (single precision)                                       0
     CHAR(n): n                                                        7
     GRAPHIC(n): n x 2                                                20
     DATE: 4                                                           8
     TIME: 3                                                           0
     TIMESTAMP: 10                                                    10
     VARCHAR(n): average length                                       28
     VARGRAPHIC(n): average length x 2                                24
     Long fields: calculated separately (see Table 34)                 0

   SUM OF COLUMN DATA STORAGE FACTORS                                116

   ROW OVERHEAD FACTOR                                                 8

   AVERAGE LENGTH OF EACH STORED ROW                                 146


Table 34. Example 3 -- Calculating the Average LONG VARCHAR Stored Length

   LONG VARCHAR VALUE OVERHEAD
     The number (N) of LONG VARCHAR columns:
       N x (TRUNC [ (average length + 3999) / 4000 ] x 20)            20

   LONG VARCHAR VALUE STORAGE
     TRUNC [ (average length + 249) / 250 ] x 250                    250

   AVERAGE LENGTH OF EACH STORED LONG VARCHAR                        270

The number of DATA PAGES required to load 25000 rows into this table in a dbspace defined to have 10 percent free space is:

  1. Determine the Average Row Length.

    AVGROWLEN = 146

  2. Determine the Free Space Requirement.

    PCTFREE = 10

  3. Calculate 40 x PCTFREE + AVGROWLEN.

    40 x 10 + 146 = 546

  4. From Table 30, determine the Effective Page Size (EPS), using the number calculated in step 3 to find the corresponding EPS.

    503 - 1002 ..... 3080 + AVGROWLEN

    EPS = 3080 + 146 = 3226

  5. Calculate the number of rows per page.

    Rows per Page = MINIMUM (256, TRUNC [ 3226/146 ] ) = 22

  6. The number of data pages required is:
       Number of Rows   Number of Rows x Number of Long Fields
       -------------- + --------------------------------------
       Rows per Page       4020 / Average Long-Field Length

       25000   25000 x 1
       ----- + ---------  =  1137 + 1786  =  2923
         22       14

     (TRUNC [4020/270] = 14; the results of both divisions are rounded up before adding.)

Estimating Data Pages for a Table with Variable Length Rows

The following methods provide estimates for tables containing variable length data with data types VARCHAR and VARGRAPHIC. Tables with columns containing variable length data types result in rows of differing lengths that can be distributed throughout a dbspace in different ways depending upon the order in which data is loaded. The distribution of the variable length rows in the dbspace can significantly affect the number of data pages occupied by a table.

There are three different methods you can use to more accurately estimate the data page requirements for tables with variable length rows:

Modeling
An approach using a test dbspace containing a test table that holds a representative sample of the data. The accuracy of this estimate depends solely on how representative the sample is.

Worst case
An approach that provides an estimate of the number of data pages that will accommodate the table regardless of the order of the rows. This approach will overestimate the number of pages in many cases, but will always ensure that you have allocated enough pages.

Splitting
An approach requiring an approximation of the number of rows that fall within a range of row lengths. This may produce a more realistic number of pages than the worst case method but does not ensure that the table will fit.

Estimating Data Pages with Modeling

Establish a database before you begin modeling your data page requirements. Then, do the following:

  1. Acquire a test dbspace.
  2. Create the table in the dbspace and load a sampling of the data into it.
  3. Ensure that the statistics for the table are up-to-date. If they are not current, update them by performing a load (with statistics set on), or by performing an explicit UPDATE STATISTICS on the table.
  4. Get the NACTIVE value for this dbspace from the SYSDBSPACES catalog table. Since the dbspace contains only one table (and its companion internal table, if there are any long fields), the NACTIVE value indicates the number of data pages this table is currently using.
  5. Multiply the NACTIVE value by a factor representing the relationship between the actual table size and this test table size. The result is an estimate of the number of data pages the actual table requires.

When modeling your data page requirements in this way, remember that the accuracy of the estimate depends entirely on how representative the sample is; in particular, avoid using nulls in the sample, because nulls tend to produce low estimates.

Estimating Using the Worst Case Method

This method is the safest method to use; it will ensure that you have enough pages regardless of the distribution of the rows in the table. However, it may overestimate your requirements.

To use this method you need to know the maximum and average lengths of the columns in the table, the number of rows to be stored, and the PCTFREE value of the dbspace. Then do the following:

  1. Calculate the maximum row length (MAXROWLEN) by using the maximum length of the VARCHAR and VARGRAPHIC columns instead of the average length as shown in Table 28.
  2. Substituting MAXROWLEN for AVGROWLEN, perform steps 1 to 4 of the formula for estimating the number of pages as shown in Estimating the Number of Data Pages. This produces the Effective Page Size for the MAXROWLEN (denoted EPSmax).
  3. Estimate the average lengths of columns in your table and calculate the average row length (AVGROWLEN) as described in Table 28.
  4. Calculate the worst case estimate using the following formula:
                                               AVGROWLEN x Number of Rows
       Worst Case =  MINIMUM ( Number of rows, -------------------------- )
                                                 EPSmax - MAXROWLEN + 1
    

Example using the Worst Case Method:

Consider a 500,000 row table being loaded into a dbspace with PCTFREE=10. Assume the overall AVGROWLEN value is 50 bytes. Assume the calculated MAXROWLEN value is 110 bytes for this table.

Calculate the EPSmax value as follows:

    40 x PCTFREE + MAXROWLEN = 40 x 10 + 110 = 510

The corresponding EPSmax is 3190.

Substitute the values in the worst case formula:

                      50 x 500000
    MINIMUM (500000, --------------) = MINIMUM (500000, 8114.2)
                     3190 - 110 + 1

To store this table you need at most 8115 data pages.
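
A one-function Python sketch of the worst case method follows; it reuses the effective_page_size function from the earlier data-page sketch and is, again, illustrative only:

   import math

   def worst_case_pages(nrows, avgrowlen, maxrowlen, pctfree):
       eps_max = effective_page_size(maxrowlen, pctfree)   # from the Table 30 sketch
       pages = math.ceil(avgrowlen * nrows / (eps_max - maxrowlen + 1))
       return min(nrows, pages)

   print(worst_case_pages(500000, 50, 110, 10))   # the example above: 8115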

Estimating Using the Splitting Method

This method assumes that you can approximate the frequency of different ROWLENGTHs in the table to be stored. This method is as follows:

  1. Split the set of all rows into several ROWLENGTH groups and calculate page requirements for each group as if it were a separate table using the formula described in "Estimating the Number of Data Pages".
  2. Add the page requirements for the groups to estimate the total table page requirements.

Try several different groupings of rows, making sure that each group is large enough to cover several data pages. If groups of rows do not cover several data pages, the estimate could be too high.

Different groupings will give different results. Select the highest overall page estimate to ensure that your estimate includes a contingency.

Example Using the Splitting Method:

Consider a 2000 row table to be loaded into a dbspace with PCTFREE=0. Assume the overall AVGROWLEN value to be 1000 bytes. Assume, also, that 25 percent of the rows in the table are longer than 800 bytes, with an AVGROWLEN value = 2500 bytes. The remaining 75 percent of the rows are less than 800 bytes long, with an AVGROWLEN value = 500 bytes.

We consider two groups of rows for this calculation:

Group 1
with 1500 rows and AVGROWLEN = 500

Group 2
with 500 rows and AVGROWLEN = 2500.

Perform the calculations described in Estimating the Number of Data Pages treating each group as a table.

For Group 1, with AVGROWLEN = 500 and PCTFREE = 0, the EPS is 4080. Therefore we can fit 8 rows per page (4080/500 = 8.16) and we need 188 pages (1500/8 = 187.5) to store the 1500 rows.

For Group 2, with AVGROWLEN = 2500 and PCTFREE = 0, the EPS is 2562. Therefore we can fit 1 row per page (2562/2500 = 1.02) and we need 500 pages (500/1 = 500) to store the 500 rows.

Adding these two page requirements together gives an overall estimate of 688 data pages (188 + 500) to store the whole 2000 row table.

Compare this to the result if you used the formula for fixed length rows. Using only the formula described in Pages Required for Storing Tables with Fixed Length Rows and the overall AVGROWLEN of 1000, the EPS is 4080. Therefore we can fit 4 rows per page (4080/1000 = 4.08) and we need 500 pages (2000/4 = 500) to store all 2000 rows. This is considerably less than the 688 pages estimated. The real number of data pages required is likely between 500 and 688 depending on the order in which the rows are being stored in the dbspace.
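
In code, the splitting method is just the fixed-length calculation summed over the groups. This sketch reuses the data_pages function from the earlier data-page sketch:

   groups = [(1500, 500), (500, 2500)]      # (rows, AVGROWLEN) for each group
   total = sum(data_pages(nrows, avglen, pctfree=0) for nrows, avglen in groups)
   print(total)                             # 188 + 500 = 688, as in the example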

Estimating the Number of Index Pages

Generally speaking, you should take the default allocation for index pages in the dbspace (PCTINDEX=33). This means that the number of index pages is approximately DATA PAGES / 2, which leaves you considerable freedom to vary the indexing you do on the tables in the dbspace. Another way of looking at this is that if the number of index pages is more than half the number of data pages, you may be trying to support too many indexes on the tables in the dbspace. As a result, you may experience performance problems on INSERT, UPDATE, and DELETE operations.

However, if the data in the dbspace is used largely for read-only operations, you may want to create many different indexes. If this is the case, you may need more index pages than the default PCTINDEX value of 33 percent provides. For such read-only (or read-mostly) cases, you may want to do the detailed analysis of index page requirements to determine the appropriate PCTINDEX value, based on the size of the indexes you plan to support.

If the data in the dbspace is to have very few indexes with rather small key lengths, then you may want to do the detailed analysis of index page requirements to determine an appropriate PCTINDEX value that is smaller than the default.

The formula for calculating the appropriate PCTINDEX value is:

                            TOTAL INDEX PAGES
   PCTINDEX =  ----------------------------------------------- x 100
                HEADER PAGES + DATA PAGES + TOTAL INDEX PAGES

In this formula, TOTAL INDEX PAGES is the sum of the number of index pages required for each planned index. The multiplication by 100 expresses PCTINDEX as a percentage.

The next section provides the guidelines for estimating the number of index pages required for an index.

Estimating the Size of an Index

Index storage is allocated in pages. Each page contains data for only one index. The minimum storage required for any index is one page.

To estimate the amount of storage required for an index, you must consider the type of information in the index key and the amount of information in the table being indexed: the number of rows in the table, the number of distinct key values, the data types and lengths of the key columns, whether the key columns allow nulls, whether the index is unique, and the free space requirement (PCTFREE).

For fixed length unique key indexes, the following calculations for index size will be accurate. For variable length or non-unique indexes, the calculations may either overestimate or underestimate the size of an index.

Generally, the size may be overestimated if the actual key values are shorter than the estimated average lengths used in the calculations.

The size may be underestimated if the varying length keys are not evenly distributed. For example, if the ordering of the keys in the index is from shortest to longest, then the lengths are not evenly distributed and space will be underestimated.

To calculate the size of the index perform the following steps:

  1. Calculate the Effective Index Page Size

    The Effective Index Page Size (EIPS) is similar to the effective page size calculated for data pages. The formula for index pages differs for fixed length and variable length index keys.

    For fixed length index keys:

       EIPSmax = 4056 - (41 x PCTFREE)
    

    For variable length index keys:

    1. Calculate the maximum encoded length of each variable length column in the index (in bytes).

      For a short VARCHAR column, if it is the last column in the key,

         VARCOL(n) = maximum length of column
      

      Otherwise,

         VARCOL(n) = (INTEGER((max length of column - 1) / 4) + 1) * 5
      

      For a short VARGRAPHIC column, if it is the last column in the key,

         VARCOL(n) = 2*(maximum length of column)
      

      Otherwise,

         VARCOL(n) = (INTEGER((2*max length of column-1)/4)+1) * 5
      
    2. Calculate the maximum length of a key
         KEYLENmax = the sum of the lengths of fixed columns (in bytes)
                     + VARCOL(1) + ... + VARCOL(n)
                     + 1 for the length of the key
                     + 1 for each column that allows nulls
      
    3. Use this KEYLENmax value to calculate the maximum length of a cluster with 1 TID.
         CLUSTERmax = KEYLENmax
                      + 1 for number of TIDs
                      + 4 for the TID
      
    4. Use this CLUSTERmax value to calculate the minimum effective index page size for leaf pages.
         EIPSmin-leaf = 4056 - (41 x PCTFREE) - CLUSTERmax + 1
      
    5. Use the KEYLENmax value again to calculate the maximum length of a nonleaf pair.
         PAIRLENmax = KEYLENmax
                      + 3 for the page number
                      + 4 (if index is not unique)
      
    6. Use this PAIRLENmax value to calculate the minimum effective index page size for nonleaf pages.
         EIPSmin-nonleaf = 4056 - (41 x PCTFREE) - PAIRLENmax + 1
      
  2. Calculate the average number of rows per key value.

    The average number of rows identified in one cluster is:

          NUMBER_KEYS = Number of distinct keys
     
       ROWSPERCLUSTER = Number of rows in table
                        -----------------------
                            NUMBER_KEYS
     
    

    If ROWSPERCLUSTER is greater than 255, then the key must be duplicated. In this case, the following calculations must be done:

          NUMBER_KEYS = (TRUNC [1 + (ROWSPERCLUSTER/255) ]) x NUMBER_KEYS
     
                        Number of rows in table
       ROWSPERCLUSTER = -----------------------
                             NUMBER_KEYS
    
  3. Calculate the average length of a key value.
    1. Calculate the average encoded length of each variable length column in the index, if any, in bytes.
         VARCOLavg(n) = (1.25 x average length of column) + 3
      

      These numbers must be rounded up to integer values.

      Once again, when determining the length of graphic data, allow 2 bytes for each character.

    2. Calculate the average length of a key in the index.
      KEYLEN = the sum of the lengths of fixed columns (in bytes)
             + VARCOLavg(1) + ... + VARCOLavg(n)
             + 1 if there are any variable-length columns
             + 1 for each column that allows nulls
      
  4. Calculate the cluster size for the index, using the value of ROWSPERCLUSTER from step 2.
       CLUSTERSIZE = 1 + KEYLEN + (4 x ROWSPERCLUSTER)
    
  5. Calculate the number of keys that can be put on a leaf page, using the value of CLUSTERSIZE from step 4.
       #KEYSleaf = TRUNC [EIPS/CLUSTERSIZE]
    

where EIPS is EIPSmax for an index with fixed length keys or EIPSmin-leaf for an index with variable length keys.

  6. Calculate the number of leaf pages, using the values of NUMBER_KEYS from step 2 and #KEYSleaf from step 5:
       LEAF PAGES = TRUNC [1 + (NUMBER_KEYS/#KEYSleaf)]
    
  7. Calculate the length of a nonleaf page entry with the value of KEYLEN from step 3b.
       PAIRLEN = KEYLEN + 3
                        + 4 (if index is not unique)
    
  8. Use the value of PAIRLEN from step 7 to calculate the number of keys that can be put on a nonleaf page.
       #KEYSnonleaf = TRUNC [EIPS/PAIRLEN]
    

    where EIPS is EIPSmax for an index with fixed length keys or EIPSmin-nonleaf for an index with variable length keys.

  9. Calculate the number of nonleaf pages required at each level, using the values of LEAF PAGES from step 6 and #KEYSnonleaf from step 8.
       level = 1
       NONLEAF PAGES(level) =
            TRUNC [1 + (LEAF PAGES/#KEYSnonleaf)]
    

    While the number of nonleaf pages at the current level is greater than 1, do the following:

       level = level + 1
       NONLEAF PAGES(level) =
            TRUNC [1 + (NONLEAF PAGES(level-1)/#KEYSnonleaf)]
     
    
  10. Calculate the total number of index pages by adding the LEAF PAGES value from step 6 and the nonleaf pages for every level as calculated in step 9.
       INDEX PAGES = LEAF PAGES + NONLEAF PAGES(1) + ... + NONLEAF PAGES(n)
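
For the common case of a fixed-length key, steps 1 through 10 reduce to the following Python sketch. The function and its parameters are ours; variable-length keys would additionally need the EIPSmin-leaf and EIPSmin-nonleaf adjustments described above.

   import math

   def index_pages(nrows, keylen, pctfree, number_keys=None, unique=True):
       """Estimate index pages for a fixed-length key."""
       eips = 4056 - 41 * pctfree                          # step 1
       if number_keys is None:
           number_keys = nrows                             # unique key: 1 row per key
       rows_per_cluster = nrows / number_keys              # step 2
       if rows_per_cluster > 255:                          # keys must be duplicated
           number_keys *= math.trunc(1 + rows_per_cluster / 255)
           rows_per_cluster = nrows / number_keys
       clustersize = 1 + keylen + 4 * rows_per_cluster     # step 4
       keys_leaf = math.trunc(eips / clustersize)          # step 5
       leaf_pages = math.trunc(1 + number_keys / keys_leaf)   # step 6
       pairlen = keylen + 3 + (0 if unique else 4)         # step 7
       keys_nonleaf = math.trunc(eips / pairlen)           # step 8
       pages = leaf_pages
       level = math.trunc(1 + leaf_pages / keys_nonleaf)   # step 9, level 1
       pages += level
       while level > 1:                                    # keys_nonleaf is normally
           level = math.trunc(1 + level / keys_nonleaf)    # in the hundreds, so this
           pages += level                                  # converges quickly
       return pages                                        # step 10

   # A unique 10-byte key on 25000 rows, PCTFREE 10: 103 leaf + 1 nonleaf = 104 pages
   print(index_pages(25000, 10, 10))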
    

Estimating Internal Dbspace Size and DASD Needs for Sort Operations

Internal dbspaces are most commonly used as work areas for sorting data. It is helpful to predict the amount of space needed to perform a sort, in order to estimate how big your internal dbspaces should be.

This section discusses how much space is required to perform a particular sort. Since multiple users can perform sorts concurrently, it is more difficult to determine the maximum internal dbspace requirements for your database than for a given sort. This maximum depends both on the expected size of a sort and on how many sorts are expected to occur concurrently. You must also consider non-sort usage of internal dbspaces, such as holding materialized views or intermediate query results. Refer to the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual for more information about internal dbspace usage, including guidelines for determining the number and size of internal dbspaces for your database.

The size of internal dbspaces in a database is often dictated by the largest sort operation possible in that database, such as the sort needed to create an index on the largest table in the database.

When Do We Sort?

Sorting is performed whenever an operation requires that data be ordered or that duplicate values be eliminated, and no appropriate index exists that provides the necessary ordering. Even if an appropriate index exists, the Optimizer may decide not to use it.

In most cases, it is readily apparent where a sort can occur: ORDER BY, GROUP BY, SELECT DISTINCT, UNION (which eliminates duplicates), joins, and CREATE INDEX can all cause sorts.

You can use the EXPLAIN command if you are unsure whether or not a particular query performs a sort. If your query performs more than one of the above, then it may perform multiple sorts. If you UNION or join more than two tables, another sort may be performed for each additional table, since we UNION and join tables two at a time.

For further information on sorting, refer to the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.

Internal Dbspace Characteristics

The characteristics of an internal dbspace are different from those of a permanent dbspace. In particular, there are no free space classes for internal dbspace pages: data is always added at the end, so there is never a need to search for free space in which to store a row. This avoids the space wastage that can occur due to the granularity of free space classes (a row is stored on a page in an internal dbspace whenever the page has enough free bytes to hold the row).

You can see that internal dbspaces are much simpler than permanent dbspaces. Calculating the number of pages needed to hold a certain amount of data is also simpler.

Calculating Internal Dbspace Size Requirements

We will calculate the amount of space required to hold a copy of the working set of data during a sort. Specifically, we will calculate the size of the initial working set, since the working set can only get smaller due to the elimination of duplicate values. In building the initial working set, we retrieve a portion of the input data (enough to fill an internal sort buffer), sort it, and write the sorted portion to an internal dbspace. Duplicates may be eliminated during the sort. We will not consider the effects of duplicate elimination in these calculations, since these effects are dependent on the order in which data is encountered.

The following steps calculate the size of a sort row. The sort row is made up mostly of the columns by which we are ordering (that is, the sort key), plus any other columns that must appear in the result.

  1. Calculate the average length of a sort key.
    1. Calculate the average encoded length of each variable length ordering column (in bytes). The average length should not include trailing blanks (if any) since these blanks are not stored in the sort key.
         VARCOLavg(n) = (1.25 x average length of varying-length
                        ordering column n) + 3
      

      These numbers must be rounded up to integer values.

      The encoding of varying-length values incurs an overhead of approximately 25 percent.

    2. Calculate the average length of a sort key.
         SORTKEYLEN = the sum of the lengths of fixed-length ordering columns
                        (in bytes)
                      + VARCOLavg(1) + ... + VARCOLavg(n)
                      + 1 for each ordering column that allows nulls
      

      For index creation, the TID of the data row is part of the sort key/row. If the sort is for index creation:

         SORTKEYLEN = SORTKEYLEN + 4
      
  2. Calculate the average length of a sort row. We add overhead for the sort row header, plus add any non-ordering columns which must appear in the result. There are no non-ordering columns for index creation. Non-ordering columns are not encoded.
       SORTROWLEN = SORTKEYLEN
                    + 3 bytes (sort row header)
                    + the sum of the lengths of fixed-length non-ordering columns
                      (in bytes)
                    + the sum of the average lengths of varying-length non-ordering
                      columns (in bytes)
                    + 1 for each non-ordering column which allows nulls
    

    For cases other than index creation, where the sort key contains at least one varying-length column, there is additional overhead.

  3. Adjust for varying-length data.

    For varying length sort rows, the order in which rows are encountered and stored can affect the number of pages required. To account for this possibility, we can use a method similar to the Effective Index Page Size used in calculating the size of an index. Briefly, this method models the worst case where the last sort row we try to insert into a page is the largest possible sort row, and the page has one fewer bytes of free space available. This gives us our maximum space wastage per page, and will yield the upper bound on the number of pages we will use. To determine the Effective Internal Dbspace Page Size (EIDPS), do the following:

    1. For fixed-length data
         EIDPS = 4080
      
    2. For varying-length data, repeat the previous calculations to determine SORTROWLEN, substituting the maximum length of varying-length columns for the average length. This gives us MAX_SORTROWLEN.
         EIDPS = 4080 - (MAX_SORTROWLEN + 1)
      

The following steps will calculate the number of pages required to hold a copy of all sort rows. This is the minimum size of internal dbspace that is required to perform the sort.

  1. First calculate how many rows will fit on a page.
       ROWS_PER_PAGE = TRUNC [EIDPS/SORTROWLEN]
    
  2. Determine the number of sort rows.

    For index creation, the number of sort rows is the same as the number of rows in the table. For cases where only a subset of the rows in a table participates in a sort, then the number of participating rows must be estimated based on your knowledge of the query and the contents of the table.

       NROWS = number of rows expected to participate in the sort
    
  3. Compensate for effect of sort buffering

    A block of input rows is encoded and stored in a sort buffer. The contents of this buffer are then sorted and written out to pages of an internal dbspace. Because the buffer is large enough to fill several internal dbspace pages, and the space in the buffer is contiguous while each internal dbspace page has a header, we must account for this in determining the number of pages required.

    1. Calculate how many rows are in the block of rows that would fill the sort buffer.
         ROWS_PER_BLOCK = TRUNC [40948 / SORTROWLEN]
      
    2. Calculate how many internal dbspace pages would be filled by a block of rows.
         PAGES_PER_BLOCK = ROWS_PER_BLOCK / ROWS_PER_PAGE
      

      This number must be rounded up to an integer value.

    3. Calculate how many full blocks the expected number of sort rows would generate.
         FULL_BLOCKS = TRUNC [NROWS / ROWS_PER_BLOCK]
      
    4. Calculate how many rows would be in the last (not full) block.
         ROWS_LAST_BLOCK = NROWS - (ROWS_PER_BLOCK x FULL_BLOCKS)
      
  4. Finally, using all the information we have derived so far, calculate the number of pages required. We add one more page to account for the header page of the internal dbspace.
       NPAGES = (FULL_BLOCKS x PAGES_PER_BLOCK)
                + (ROWS_LAST_BLOCK / ROWS_PER_PAGE)    < round up
                + 1
    

    For a sort to be successful, the internal dbspace must be defined to have at least NPAGES pages.
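
These steps translate into the following Python sketch (the function name is ours; pass max_sortrowlen only for varying-length sort rows):

   import math

   def sort_pages(nrows, sortrowlen, max_sortrowlen=None):
       """Minimum internal dbspace pages needed to hold the sort working set."""
       if max_sortrowlen is None:
           eidps = 4080                                    # fixed-length sort rows
       else:
           eidps = 4080 - (max_sortrowlen + 1)             # varying-length sort rows
       rows_per_page = math.trunc(eidps / sortrowlen)
       rows_per_block = math.trunc(40948 / sortrowlen)     # sort buffer capacity
       pages_per_block = math.ceil(rows_per_block / rows_per_page)
       full_blocks = math.trunc(nrows / rows_per_block)
       rows_last_block = nrows - rows_per_block * full_blocks
       return (full_blocks * pages_per_block
               + math.ceil(rows_last_block / rows_per_page)
               + 1)                                        # internal dbspace header page

   # DASD requirement during the merge is two copies of the working set:
   # dasd_pages = 2 * sort_pages(...)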

Calculating Total Internal Dbspace and DASD Needs

So far we have calculated the size of the sort working set. After the initial working set has been created, we then merge all the sorted portions to yield a final sorted result. Multiple merge passes may occur before the final result is created. During this process, two copies of the working set exist, in two internal dbspaces. For successful completion of a sort, more than one internal dbspace must be available.

The final result may be smaller than intermediate results, due to such things as the elimination of the three byte sort row header, and the decoding of varying-length values in cases other than index creation. The amount of DASD required is reduced only in the case where a single merge pass is performed; that is, when one pass is made through the data from the initial working set to the final result. This only occurs on sorts which are sufficiently small, or where the data is already almost completely sorted.

When sorting for duplicate elimination, the merge process will remove duplicates. As with the duplicate elimination which occurred during sorting, it is difficult to predict the effect this will have. Note that, for calculating DASD requirements, we are only interested in the duplicate elimination which would occur during the first merge pass, since the second copy of the working set is created by this pass. The completion of the first merge pass is the point at which our peak DASD usage occurs.

We will not consider these cases, and calculate the amount of DASD required to perform the sort as:

   number of DASD pages = NPAGES x 2

For the sort to complete successfully, the storage pool to which the internal dbspaces are assigned must have sufficient DASD pages available.

