This appendix describes procedures and calculations you can follow to determine the amount of storage to allocate to your dbspaces.
You need to estimate data storage requirements to establish dbspace sizes. The required size of a dbspace depends on the tables and indexes it will hold and, for each table, on the number of rows and the average row length.
Using these estimated values, you can calculate the required size of the dbspace by determining and adding the number of pages required for header pages, data pages, and index pages, plus an allowance for growth.
Refer to "Estimating Storage for a Table". For each table the storage requirement is further described in:
Setting allowances and using defaults rather than estimating the number of header and index pages is discussed in "General Guidelines".
The following formula shows how these values are used to calculate the number of dbspace pages needed for a set of tables:
DBSPACE PAGES = HEADER PAGES + DATA PAGES + INDEX PAGES + ALLOWANCE
For most dbspaces, it is sufficient to use the default value of 8 for the number of HEADER PAGES. You should also use the default of 33 percent for PCTINDEX, rather than estimate the number of index pages needed, unless you anticipate doing extensive indexing. This default reserves approximately one third of the total space for indexes. If you assume both defaults, 8 HEADER PAGES and 33 percent for PCTINDEX, the above formula becomes:
DBSPACE PAGES = 8 + 1.50 x (DATA PAGES + ALLOWANCE)
The DBSPACE PAGES number derived must be rounded up to a multiple of 128. That is,
REQUIRED DBSPACE PAGES = TRUNC [ (DBSPACE PAGES + 127 ) / 128 ] x 128
The TRUNC (truncate) function indicated here means: compute the value between the brackets [ ] and then use only the integer part of that value. For example, if the value calculated is 27.8, use 27.
You should allow from 50 to 200 percent for ALLOWANCE, depending on the nature of the tables to be stored. If the number of rows is relatively stable and you do not anticipate adding columns to tables (or even adding tables), an ALLOWANCE of 50 percent is safe. To allow for all forms of growth (inserting rows, adding columns, and adding tables), you should consider an ALLOWANCE of 200 percent (2 x data pages).
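Taken together, the default-based formula and the rounding rule reduce to a small calculation. The following is a minimal Python sketch; the function name and the allowance parameter are illustrative, not part of the product:

```python
import math

def dbspace_pages(data_pages, allowance_pct=50):
    """Dbspace size using the defaults above: 8 header pages and
    PCTINDEX=33 (the 1.50 multiplier), plus a growth allowance."""
    allowance = data_pages * allowance_pct / 100.0
    pages = 8 + 1.50 * (data_pages + allowance)
    # Round up to a multiple of 128, as required for a dbspace.
    return math.ceil(pages / 128) * 128

print(dbspace_pages(1000))        # 8 + 1.5 x 1500 = 2258 -> 2304
print(dbspace_pages(1000, 200))   # 8 + 1.5 x 3000 = 4508 -> 4608
```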
To estimate the amount of storage required for a table, consider the number of rows it will hold and the average length of its stored rows.
Note: For tables with variable length rows, data page requirements depend upon the placement of the rows of different lengths on the pages. Some orderings require more data pages than others. When estimating storage for these tables, refer to "Estimating Data Pages for a Table with Variable Length Rows".
There are no guidelines for estimating the number of rows your table will have. However, for the purpose of determining dbspace requirements, it is wise to look ahead to potential growth of the table. Consider the estimated size of the table two or three years from now, rather than its current size.
The length of a stored row can be estimated using Table 28. To complete the calculations of Table 28, you must know the type and length of all columns in your tables. If long-field columns are involved, you should first calculate the average length of long-field columns using Table 29.
Table 28. Form for Calculating the Average Row Length of a Stored Row
| Factor | Value |
|---|---|
| COLUMN OVERHEAD (note 1) | |
| The number of columns supporting nulls | ___ |
| The number of VARCHAR(n) columns with n <= 254 | ___ |
| The number of VARGRAPHIC(n) columns with n <= 127 | ___ |
| The number (N) of LONG FIELDS: N x 6 (note 2) | ___ |
| SUM OF COLUMN OVERHEAD FACTORS | ___ |
| COLUMN DATA STORAGE FACTORS (note 3) | |
| INTEGER: 4 | ___ |
| SMALLINT: 2 | ___ |
| DECIMAL: TRUNC [PRECISION/2 + 1] | ___ |
| FLOAT: 8 (double precision) | ___ |
| FLOAT: 4 (single precision) | ___ |
| CHAR(n): n | ___ |
| GRAPHIC(n): n x 2 | ___ |
| DATE: 4 | ___ |
| TIME: 3 | ___ |
| TIMESTAMP: 10 | ___ |
| VARCHAR(n), n <= 254: average length | ___ |
| VARGRAPHIC(n), n <= 127: average length x 2 | ___ |
| LONG FIELDS: calculated separately (see Table 29) | ___ |
| SUM OF COLUMN DATA STORAGE FACTORS | ___ |
| ROW OVERHEAD FACTOR (note 4) | 8 |
| AVERAGE LENGTH OF EACH STORED ROW (AVGROWLEN) | ___ |
Notes:

1. Column Overhead refers to descriptive information stored with an instance of the column. The overhead depends upon the characteristics of the column, as reflected in the worksheet factors: 1 byte for each column supporting nulls, 1 byte for each short VARCHAR or VARGRAPHIC column, and 6 bytes for each long field.
2. Pointers to the long-field values are stored in a special internal format that involves 6 bytes of control information in the stored row (a 2-byte length value and a 4-byte tuple identifier (TID)).
3. Column Data Storage refers to the storage space occupied by the actual column values. The numbers shown are numbers of bytes.
4. Row Overhead is a fixed overhead for each row in the table. It consists of a 6-byte row header and a 2-byte offset into the page, for a total of 8 bytes.
Table 29. Formula for Calculating the Average Length of a Long-Field Column
The value of a LONG FIELD is stored separately from the rest of the stored row. The value is stored as a chain of entries in an internal table. Each entry of the internal table is composed of 16 columns of 250 bytes each (some potentially null). Each record has 20 bytes of overhead (a 2-byte offset, a 6-byte row header, and a 12-byte unary link pointer chain). Thus, the overhead for a LONG-FIELD value depends on the actual length of the data and includes 20 bytes for each 4000-byte (16 columns of 250 bytes) entry required to store the value.
Long-field values are stored in increments of 250 bytes. Each increment is one fixed-length 250-byte column value in the internal table. For example, a 10-byte long-field value occupies 250 bytes of storage, plus the long-field-value overhead of 20 bytes. A 248-byte long-field value occupies 250 bytes, a 260-byte long-field value occupies 500 bytes of storage, and so on.
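The two rules above (250-byte increments, plus 20 bytes of overhead per 4000-byte internal-table entry) reduce to a short calculation. The following is a minimal sketch with an illustrative function name:

```python
import math

def long_field_stored_length(avg_length):
    """Stored length of a long-field value: data in 250-byte increments,
    plus 20 bytes of overhead per 4000-byte internal-table entry."""
    data = math.ceil(avg_length / 250) * 250
    overhead = math.ceil(avg_length / 4000) * 20
    return data + overhead

print(long_field_stored_length(10))    # 250 + 20 = 270
print(long_field_stored_length(260))   # 500 + 20 = 520
print(long_field_stored_length(4100))  # 4250 + 40 = 4290
```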
The number of header pages for a dbspace can be established on the SQL ACQUIRE DBSPACE statement. In general, you should use the default value of 8 for this option.
A more precise estimate of the number of header pages follows. It is more complex than the general guidelines above, but will assist you in your calculations if you require a better estimation.
The header pages contain information about the objects defined in a dbspace. Each object defined in the dbspace, such as a table or an index, is recorded in the header pages via a control row. For more information on the types of objects that can be defined in a dbspace and the types of control rows that are inserted in the header pages, see the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.
To estimate the number of header pages required, total the control row storage for the dbspace and the objects in it, then divide by 4080 (the usable bytes in a page) and round up:

- 24 bytes for the dbspace itself
- 32 + (2 x number of columns) bytes for each table
- 20 + (2 x number of key columns) bytes for each index
- 42 bytes for each long-field column (the example below shows 84 bytes for two such columns)
For example, assume you are planning to acquire a dbspace that will contain 3 tables. Table A has 10 columns, 2 indexes each defined on a single column, and no long-field columns. Table B has 14 columns, 1 index containing 3 columns, and no long-field columns. Table C has 3 columns, 1 index containing 1 column, and 2 long-field columns. The estimated number of header pages for this dbspace is as follows:
DBSPACE:   24
Table A:   32 + (2 x 10)   <- table
           20 + (2 x 1)    <- index 1
           20 + (2 x 1)    <- index 2
Table B:   32 + (2 x 14)   <- table
           20 + (2 x 3)    <- index
Table C:   32 + (2 x 3)    <- table
           20 + (2 x 1)    <- index
           84              <- long-field columns
           =============
Total:     350 bytes

Dividing 350 by 4080 and rounding up shows that 1 header page is required.
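The per-object byte counts above can be folded into a small sketch. The function shape and names are illustrative assumptions based on the worked example:

```python
import math

def header_pages(tables, long_field_columns=0):
    """tables: list of (number_of_columns, [key_columns_per_index, ...]).
    Byte counts follow the worked example above."""
    total = 24                              # the dbspace itself
    for columns, indexes in tables:
        total += 32 + 2 * columns           # control row for the table
        for key_columns in indexes:
            total += 20 + 2 * key_columns   # control row for each index
    total += 42 * long_field_columns        # long-field columns
    return math.ceil(total / 4080)          # 4080 usable bytes per page

# Tables A, B, and C from the example: 350 bytes -> 1 header page.
print(header_pages([(10, [1, 1]), (14, [3]), (3, [1])], long_field_columns=2))
```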
The number of data pages required to store a table depends on whether the rows in the table are of fixed or variable length. The next section describes a method for calculating the pages required for storing tables with fixed length rows. For tables with variable length rows (rows with VARCHAR or VARGRAPHIC data), refer to "Estimating Data Pages for a Table with Variable Length Rows".
Note: Long-field columns do not produce variable length rows because the long-field values are stored separately.
The number of data pages required to hold the tables can be estimated after determining the average row length (AVGROWLEN) for each table and the effective page size (EPS), which is based on the PCTFREE setting at the time the pages are loaded.
The number of data pages required is estimated as follows. First, compute the value:

40 x PCTFREE + AVGROWLEN

Then use this value to find the effective page size (EPS) in Table 30.
Table 30. Effective Page Size Based on Free Space Requirement
| 40 x PCTFREE + AVGROWLEN | EPS (Effective Page Size) |
|---|---|
| 8-17 | 4065 + AVGROWLEN |
| 18-32 | 4050 + AVGROWLEN |
| 33-52 | 4030 + AVGROWLEN |
| 53-102 | 3980 + AVGROWLEN |
| 103-252 | 3830 + AVGROWLEN |
| 253-502 | 3580 + AVGROWLEN |
| 503-1002 | 3080 + AVGROWLEN |
| 1003-2002 | 2080 + AVGROWLEN |
| 2003-4020 | 62 + AVGROWLEN |
| 4021-4080 | 2 + AVGROWLEN |
| 4081 or more | See note below. |
Next, compute the number of rows that fit on a page:

Rows per Page = MINIMUM (256, TRUNC [ EPS/AVGROWLEN ] )
If the average long-field length is <= 4020, then:
REQUIRED      Number of Rows     Number of Rows x Number of Long Fields
DATA      =   --------------  +  --------------------------------------
PAGES         Rows per Page        4020 / Average Long-Field Length
Note: When evaluating the expression, truncate the denominator 4020/average long-field length to the nearest integer and round up the results of both division expressions to the nearest integer before adding them.
If the average long-field length is >4020:
REQUIRED      Number of Rows                                               Average Long-Field Length
DATA      =   --------------  +  Number of Rows x Number of Long Fields x  -------------------------
PAGES         Rows per Page                                                          4020
Note: Round the results of both division terms up to the nearest integer before evaluating the expression.
If you are loading tables separately, calculate the number of pages required for each table separately. If you are loading the tables in an interleaved fashion, use the longest AVGROWLEN of all the tables in determining the Effective Page Size.
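The complete data page estimation can be collected into one short sketch. This is illustrative only: the list of ranges transcribes Table 30, and the function names (effective_page_size, data_pages) are assumptions, not product interfaces:

```python
import math

# Transcription of Table 30: (upper bound of 40 x PCTFREE + AVGROWLEN, base).
EPS_TABLE = [(17, 4065), (32, 4050), (52, 4030), (102, 3980), (252, 3830),
             (502, 3580), (1002, 3080), (2002, 2080), (4020, 62), (4080, 2)]

def effective_page_size(avgrowlen, pctfree):
    value = 40 * pctfree + avgrowlen
    for upper, base in EPS_TABLE:
        if value <= upper:
            return base + avgrowlen
    raise ValueError("40 x PCTFREE + AVGROWLEN exceeds 4080")

def data_pages(nrows, avgrowlen, pctfree, n_long_fields=0, avg_long_len=0):
    eps = effective_page_size(avgrowlen, pctfree)
    rows_per_page = min(256, eps // avgrowlen)
    pages = math.ceil(nrows / rows_per_page)
    if n_long_fields:
        if avg_long_len <= 4020:
            # Truncate the denominator first, then round the division up.
            pages += math.ceil(nrows * n_long_fields / (4020 // avg_long_len))
        else:
            # Round the length ratio up before multiplying.
            pages += nrows * n_long_fields * math.ceil(avg_long_len / 4020)
    return pages
```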
The example work sheet shown in Table 31 is for a table that has just one CHAR(100) column supporting nulls.
Table 31. Example 1 -- Calculating the Average Row Length
| Factor | Value |
|---|---|
| COLUMN OVERHEAD | |
| The number of columns supporting nulls | 1 |
| The number of VARCHAR(n) columns with n <= 254 | 0 |
| The number of VARGRAPHIC(n) columns with n <= 127 | 0 |
| The number (N) of LONG FIELDS: N x 6 | 0 |
| SUM OF COLUMN OVERHEAD FACTORS | 1 |
| COLUMN DATA STORAGE FACTORS | |
| INTEGER: 4 | 0 |
| SMALLINT: 2 | 0 |
| DECIMAL: TRUNC [PRECISION/2 + 1] | 0 |
| FLOAT: 8 (double precision) | 0 |
| FLOAT: 4 (single precision) | 0 |
| CHAR(n): n | 100 |
| GRAPHIC(n): n x 2 | 0 |
| DATE: 4 | 0 |
| TIME: 3 | 0 |
| TIMESTAMP: 10 | 0 |
| VARCHAR(n): average length | 0 |
| VARGRAPHIC(n): average length x 2 | 0 |
| LONG FIELDS: calculated separately (see Table 29) | 0 |
| SUM OF COLUMN DATA STORAGE FACTORS | 100 |
| ROW OVERHEAD FACTOR | 8 |
| AVERAGE LENGTH OF EACH STORED ROW | 109 |
The number of DATA PAGES required to load 25000 rows into this table in a dbspace defined to have 10 percent free space is:
AVGROWLEN = 109
PCTFREE = 10
40 x 10 + 109 = 509
503 - 1002 ..... 3080 + AVGROWLEN
EPS = 3080 + 109 = 3189
Rows per Page = MINIMUM (256, TRUNC [ 3189/109 ] ) = 29
Number of Rows     Number of Rows x Number of Long Fields
--------------  +  --------------------------------------  =  863
Rows per Page        4020 / Average Long-Field Length

(25000/29 rounds up to 863; the table has no long fields, so the second term is 0.)
The example work sheet shown in Table 32 is for a table with a wider mix of column types, as reflected in the factors entered below.
Table 32. Example 2 -- Calculating the Average Row Length of a Stored Row
| Factor | Value |
|---|---|
| COLUMN OVERHEAD | |
| The number of columns supporting nulls | 11 |
| The number of VARCHAR(n) columns with n <= 254 | 4 |
| The number of VARGRAPHIC(n) columns with n <= 127 | 1 |
| The number (N) of LONG FIELDS: N x 6 | 0 |
| SUM OF COLUMN OVERHEAD FACTORS | 16 |
| COLUMN DATA STORAGE FACTORS | |
| INTEGER: 4 | 4 |
| SMALLINT: 2 | 2 |
| DECIMAL: TRUNC [PRECISION/2 + 1] | 13 |
| FLOAT: 8 (double precision) | 0 |
| FLOAT: 4 (single precision) | 0 |
| CHAR(n): n | 7 |
| GRAPHIC(n): n x 2 | 20 |
| DATE: 4 | 4 |
| TIME: 3 | 3 |
| TIMESTAMP: 10 | 0 |
| VARCHAR(n): average length | 60 |
| VARGRAPHIC(n): average length x 2 | 24 |
| LONG FIELDS: calculated separately (see Table 29) | 0 |
| SUM OF COLUMN DATA STORAGE FACTORS | 137 |
| ROW OVERHEAD FACTOR | 8 |
| AVERAGE LENGTH OF EACH STORED ROW | 161 |
The number of DATA PAGES required to load 600 rows into this table in a dbspace defined to have 15 percent free space is:
AVGROWLEN = 161
PCTFREE = 15
40 x 15 + 161 = 761
503 - 1002 ..... 3080 + AVGROWLEN
EPS = 3080 + 161 = 3241
Rows per Page = MINIMUM (256, TRUNC [ 3241/161 ] ) = 20
Number of Rows     Number of Rows x Number of Long Fields
--------------  +  --------------------------------------  =  30
Rows per Page        4020 / Average Long-Field Length

(600/20 = 30; the table has no long fields, so the second term is 0.)
Usually, you store a table this small in a dbspace with other tables. If a dbspace has more than one table, the total number of DATA PAGES required for the dbspace is the sum of the data page requirements of all the tables in the dbspace.
The example work sheets shown in Table 33 and Table 34 are for a table that includes a LONG VARCHAR column among a mix of other column types.
Table 33. Example 3 -- Calculating the Average Row Length of a Stored Row
| Factor | Value |
|---|---|
| COLUMN OVERHEAD | |
| The number of columns supporting nulls | 12 |
| The number of VARCHAR(n) columns with n <= 254 | 3 |
| The number of VARGRAPHIC(n) columns with n <= 127 | 1 |
| The number (N) of LONG FIELDS: N x 6 | 6 |
| SUM OF COLUMN OVERHEAD FACTORS | 22 |
| COLUMN DATA STORAGE FACTORS | |
| INTEGER: 4 | 4 |
| SMALLINT: 2 | 2 |
| DECIMAL: TRUNC [PRECISION/2 + 1] | 13 |
| FLOAT: 8 (double precision) | 0 |
| FLOAT: 4 (single precision) | 0 |
| CHAR(n): n | 7 |
| GRAPHIC(n): n x 2 | 20 |
| DATE: 4 | 8 |
| TIME: 3 | 0 |
| TIMESTAMP: 10 | 10 |
| VARCHAR(n): average length | 28 |
| VARGRAPHIC(n): average length x 2 | 24 |
| LONG FIELDS: calculated separately (see Table 34) | 0 |
| SUM OF COLUMN DATA STORAGE FACTORS | 116 |
| ROW OVERHEAD FACTOR | 8 |
| AVERAGE LENGTH OF EACH STORED ROW | 146 |
Table 34. Example 3 -- Calculating the Average LONG VARCHAR Stored Length
| Factor | Value |
|---|---|
| LONG VARCHAR VALUE OVERHEAD: the number (N) of LONG VARCHAR columns x (TRUNC [(average length + 3999) / 4000] x 20) | 20 |
| LONG VARCHAR VALUE STORAGE: TRUNC [(average length + 249) / 250] x 250 | 250 |
| AVERAGE LENGTH OF EACH STORED LONG VARCHAR | 270 |
The number of DATA PAGES required to load 25000 rows into this table in a dbspace defined to have 10 percent free space is:
AVGROWLEN = 146
PCTFREE = 10
40 x 10 + 146 = 546
503 - 1002 ..... 3080 + AVGROWLEN
EPS = 3080 + 146 = 3226
Rows per Page = MINIMUM (256, TRUNC [ 3226/146 ] ) = 22
Number of Rows     Number of Rows x Number of Long Fields
--------------  +  --------------------------------------  =  2923
Rows per Page        4020 / Average Long-Field Length

(25000/22 rounds up to 1137; the denominator 4020/270 truncates to 14, so the second term is 25000/14, which rounds up to 1786; 1137 + 1786 = 2923.)
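As a cross-check, the hypothetical data_pages sketch given earlier reproduces all three worked examples:

```python
print(data_pages(25000, 109, 10))                    # Example 1: 863
print(data_pages(600, 161, 15))                      # Example 2: 30
print(data_pages(25000, 146, 10, n_long_fields=1,
                 avg_long_len=270))                  # Example 3: 2923
```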
The following methods provide estimates for tables containing variable length data with data types VARCHAR and VARGRAPHIC. Tables with columns containing variable length data types result in rows of differing lengths that can be distributed throughout a dbspace in different ways depending upon the order in which data is loaded. The distribution of the variable length rows in the dbspace can significantly affect the number of data pages occupied by a table.
There are three methods you can use to estimate the data page requirements for tables with variable length rows more accurately: modeling the requirements in a test database, the worst case method, and the splitting method.
One approach is to model your data page requirements directly: establish a database, load it with a representative sample of your data, and observe the page usage. The two analytical methods described next avoid the need for an actual database.
The worst case method is the safest to use; it ensures that you have enough pages regardless of the distribution of the rows in the table. However, it may overestimate your requirements.
To use this method you need to know the average row length (AVGROWLEN), the maximum row length (MAXROWLEN), and the number of rows in the table. Compute 40 x PCTFREE + MAXROWLEN, use the result to find EPSmax in Table 30, and then apply the worst case formula:
                                        AVGROWLEN x Number of Rows
Worst Case = MINIMUM ( Number of Rows,  -------------------------- )
                                          EPSmax - MAXROWLEN + 1
Example using the Worst Case Method:
Consider a 500,000 row table being loaded into a dbspace with PCTFREE=10. Assume the overall AVGROWLEN value is 50 bytes. Assume the calculated MAXROWLEN value is 110 bytes for this table.
Calculate the EPSmax value as follows:
40 x PCTFREE + MAXROWLEN = 40 x 10 + 110 = 510
The corresponding EPSmax is 3190.
Substitute the values in the worst case formula:
                    50 x 500000
MINIMUM (500000, ----------------) = MINIMUM (500000, 8114.2)
                  3190 - 110 + 1
To store this table you need at most 8115 data pages.
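The following is a minimal sketch of the worst case method, reusing the hypothetical effective_page_size helper from the data page sketch above:

```python
import math

def worst_case_pages(nrows, avgrowlen, maxrowlen, pctfree):
    # EPSmax comes from Table 30, looked up with MAXROWLEN.
    eps_max = effective_page_size(maxrowlen, pctfree)
    pages = (avgrowlen * nrows) / (eps_max - maxrowlen + 1)
    return min(nrows, math.ceil(pages))

print(worst_case_pages(500000, 50, 110, 10))   # 8115, as above
```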
The splitting method assumes that you can approximate the frequency of different row lengths in the table to be stored. Split the rows into groups of similar length, estimate the data page requirement of each group as though it were a separate table with fixed length rows, and add the group estimates together.
Try several different groupings of rows, making sure that each group is large enough to cover several data pages. If groups of rows do not cover several data pages, the estimate could be too high.
Different groupings will give different results. Select the highest overall page estimate to ensure that your estimate includes a contingency.
Example Using the Splitting Method:
Consider a 2000 row table to be loaded into a dbspace with PCTFREE=0. Assume the overall AVGROWLEN value to be 1000 bytes. Assume, also, that 25 percent of the rows in the table are longer than 800 bytes, with an AVGROWLEN value = 2500 bytes. The remaining 75 percent of the rows are less than 800 bytes long, with an AVGROWLEN value = 500 bytes.
We consider two groups of rows for this calculation: the 500 rows (25 percent) with an AVGROWLEN of 2500 bytes, and the 1500 rows (75 percent) with an AVGROWLEN of 500 bytes.
Perform the calculations described in Estimating the Number of Data Pages, treating each group as a table. For the first group, 40 x 0 + 2500 = 2500 gives an EPS of 2562 (62 + 2500), so only 1 row fits per page and 500 pages are needed. For the second group, 40 x 0 + 500 = 500 gives an EPS of 4080 (3580 + 500), so 8 rows fit per page and 188 pages are needed (1500/8 rounded up).
Adding these two page requirements together gives an overall estimate of 688 data pages (188 + 500) to store the whole 2000 row table.
Compare this to the result if you used the formula for fixed length rows. Using only the formula described in Pages Required for Storing Tables with Fixed Length Rows and the overall AVGROWLEN of 1000, the EPS is 4080, so 4 rows fit per page (TRUNC [4080/1000] = 4) and 500 pages (2000/4) are needed to store all 2000 rows. This is considerably less than the 688 pages estimated by the splitting method. The real number of data pages required is likely to fall between 500 and 688, depending on the order in which the rows are stored in the dbspace.
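Under the same hypothetical names as the earlier sketches, the splitting method is a thin wrapper around the fixed-length calculation:

```python
def splitting_pages(groups, pctfree):
    """groups: list of (nrows, avgrowlen) pairs, each treated as a
    separate table with fixed length rows."""
    return sum(data_pages(nrows, avgrowlen, pctfree)
               for nrows, avgrowlen in groups)

print(splitting_pages([(500, 2500), (1500, 500)], pctfree=0))   # 688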
Generally speaking, you should take the default allocation for index pages in the dbspace (PCTINDEX=33). This means that the number of index pages is approximately DATA PAGES / 2. This leaves you considerable freedom to vary the indexing you do on the tables in the dbspace. Another way of looking at this is that if the number of index pages is more than half the number of data pages, you may be trying to support too many indexes on the tables in the dbspace. As a result, you may experience performance problems on INSERT, UPDATE, and DELETE operations.
However, if the data in the dbspace is largely used for read only operations, you may want to create a lot of different indexes. If this is the case, you may need more index pages than is provided for by the default PCTINDEX value of 33 percent. For such read only (or read mostly) cases, you may want to do the detailed analysis of index page requirements to determine the appropriate PCTINDEX value based on the size of indexes you plan on supporting.
If the data in the dbspace is to have very few indexes with rather small key lengths, then you may want to do the detailed analysis of index page requirements to determine an appropriate PCTINDEX value that is smaller than the default.
The formula for calculating the appropriate PCTINDEX value is:
                         TOTAL INDEX PAGES
PCTINDEX = -----------------------------------------------
           HEADER PAGES + DATA PAGES + TOTAL INDEX PAGES
In this formula, TOTAL INDEX PAGES is the sum of the number of index pages required for each planned index.
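For example (with hypothetical figures), if you plan indexes totaling 400 pages over 8 header pages and 1000 data pages, then PCTINDEX = 400 / (8 + 1000 + 400) = 0.28, so a PCTINDEX setting of about 28 percent would be appropriate.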
The next section provides the guidelines for estimating the number of index pages required for an index.
Index storage is allocated in pages. Each page contains data for only one index. The minimum storage required for any index is one page.
To estimate the amount of storage required for an index, you must consider the type of information in the index key and the amount of information in the table being indexed. The following table information must be considered when calculating the size of an index: the number of rows in the table, the number of distinct key values, whether the index is unique, and the data type and length of each key column.
For indexes that are not unique, the number of distinct key values may be less than the total number of rows in the table. Each entry in a leaf page of the index consists of a key value and one or more row pointers, called Tuple Identifiers (TIDs), for the rows having that key value.
For unique indexes, each entry contains just one TID.
These entries are called clusters.
For fixed length unique key indexes, the following calculations for index size will be accurate. For variable length or non-unique indexes, the calculations may either overestimate or underestimate the size of an index.
Generally, the size may be overestimated if:
The size may be underestimated if the varying length keys are not evenly distributed. For example, if the ordering of the keys in the index is from shortest to longest, then the lengths are not evenly distributed and space will be underestimated.
To calculate the size of the index, perform the following steps:
The Effective Index Page Size (EIPS) is similar to the effective page size calculated for data pages. The formula for index pages differs for fixed length and variable length index keys.
For fixed length index keys:
EIPSmax = 4056 - (41 x PCTFREE)
For variable length index keys:
For a short VARCHAR column, if it is the last column in the key,
VARCOL(n) = maximum length of column
Otherwise,
VARCOL(n) = (INTEGER((max length of column - 1) / 4) + 1) * 5
For a short VARGRAPHIC column, if it is the last column in the key,
VARCOL(n) = 2*(maximum length of column)
Otherwise,
VARCOL(n) = (INTEGER((2*max length of column-1)/4)+1) * 5
KEYLENmax = the sum of the lengths of fixed columns (in bytes)
            + VARCOL(1) + ... + VARCOL(n)
            + 1 for the length of the key
            + 1 for each column that allows nulls

CLUSTERmax = KEYLENmax
             + 1 for number of TIDs
             + 4 for the TID
EIPSminleaf = 4056 - (41 x PCTFREE) - CLUSTERmax + 1
PAIRLENmax = KEYLENmax + 3 for the page number + 4 (if index is not unique)
EIPSmin-nonleaf = 4056 - (41 x PCTFREE) - PAIRLENmax + 1
The average number of rows identified in one cluster is:
NUMBER_KEYS = Number of distinct keys

                 Number of rows in table
ROWSPERCLUSTER = -----------------------
                       NUMBER_KEYS
If ROWSPERCLUSTER is greater than 255, then the key must be duplicated. In this case, the following calculations must be done:
NUMBER_KEYS = (TRUNC [1 + (ROWSPERCLUSTER/255)]) x NUMBER_KEYS

                 Number of rows in table
ROWSPERCLUSTER = -----------------------
                       NUMBER_KEYS
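For example, with 1,000,000 rows and 2,000 distinct keys, ROWSPERCLUSTER is initially 500. Since 500 > 255, NUMBER_KEYS becomes TRUNC [1 + (500/255)] x 2000 = 2 x 2000 = 4000, and ROWSPERCLUSTER becomes 1,000,000 / 4000 = 250.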
VARCOLavg(n) = (1.25 x average length of column) + 3
These numbers must be rounded up to integer values.
Once again, when determining the length of graphic data, allow 2 bytes for each character.
KEYLEN = the sum of the lengths of fixed columns (in bytes)
         + VARCOLavg(1) + ... + VARCOLavg(n)
         + 1 if there are any variable-length columns
         + 1 for each column that allows nulls
CLUSTERSIZE = 1 + KEYLEN + (4 x ROWSPERCLUSTER)
#KEYSleaf = TRUNC [EIPS/CLUSTERSIZE]
where EIPS is EIPSmax for an index with fixed length keys or EIPSminleaf for an index with variable length keys.
LEAF PAGES = TRUNC [1 + (NUMBER_KEYS/#KEYSleaf)]
PAIRLEN = KEYLEN + 3 + 4 (if index is not unique)
#KEYSnonleaf = TRUNC [EIPS/PAIRLEN]
where EIPS is EIPSmax for an index with fixed length keys or EIPSmin-nonleaf for an index with variable length keys.
level = 1
NONLEAF PAGES(level) = TRUNC [1 + (LEAF PAGES/#KEYSnonleaf)]
While the number of nonleaf pages at the current level is greater than 1, do the following:
level = level + 1
NONLEAF PAGES(level) = TRUNC [1 + (NONLEAF PAGES(level-1)/#KEYSnonleaf)]
INDEX PAGES = LEAF PAGES + NONLEAF PAGES(1) + ... + NONLEAF PAGES(n)
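The leaf and nonleaf steps above can be combined into one sketch. The names are illustrative; you supply the EIPS values appropriate to your key type (EIPSmax for fixed length keys, EIPSminleaf and EIPSmin-nonleaf for variable length keys):

```python
import math

def index_pages(nrows, distinct_keys, keylen, eips_leaf, eips_nonleaf,
                unique=True):
    number_keys = distinct_keys
    rows_per_cluster = nrows / number_keys
    if rows_per_cluster > 255:
        # A cluster holds at most 255 TIDs, so keys must be duplicated.
        number_keys = math.trunc(1 + rows_per_cluster / 255) * number_keys
        rows_per_cluster = nrows / number_keys
    # Rounding ROWSPERCLUSTER up for fractional values is an assumption.
    clustersize = 1 + keylen + 4 * math.ceil(rows_per_cluster)
    keys_leaf = math.trunc(eips_leaf / clustersize)
    leaf_pages = math.trunc(1 + number_keys / keys_leaf)
    pairlen = keylen + 3 + (0 if unique else 4)
    keys_nonleaf = math.trunc(eips_nonleaf / pairlen)
    total = leaf_pages
    level_pages = math.trunc(1 + leaf_pages / keys_nonleaf)
    total += level_pages
    while level_pages > 1:   # repeat until a level fits on one page
        level_pages = math.trunc(1 + level_pages / keys_nonleaf)
        total += level_pages
    return total
```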
Internal dbspaces are most commonly used as work areas for sorting data. It is helpful to predict the amount of space needed to perform a sort, in order to estimate how big your internal dbspaces should be.
This section discusses how much space is required to perform a particular sort. Because multiple users can perform sorts concurrently, it is more difficult to determine the maximum internal dbspace requirement for your database than for a given sort. This maximum depends on both the expected size of a sort and the number of sorts expected to occur concurrently. You must also consider non-sort usage of internal dbspaces, such as holding materialized views or intermediate query results. Refer to the DB2 Server for VM System Administration or DB2 Server for VSE System Administration manual for more information about internal dbspace usage, including guidelines for determining the number and size of internal dbspaces for your database.
The size of internal dbspaces in a database is often dictated by the largest sort operation possible in that database, such as the sort needed to create an index on the largest table in the database.
Sorting is performed whenever an operation requires that data be ordered or that duplicate values be eliminated, and no appropriate index exists that provides the necessary ordering. Even if an appropriate index exists, the Optimizer may decide not to use it.
In most cases, it is readily apparent where a sort can occur: for example, when ordering results, eliminating duplicate values, performing UNION operations, joining tables, or creating an index.
You can use the EXPLAIN command if you are unsure whether a particular query performs a sort. If your query performs more than one of the above operations, it may perform multiple sorts. If you UNION or join more than two tables, another sort may be performed for each additional table, since UNION and join operations process tables two at a time.
For further information on sorting, refer to the DB2 Server for VSE & VM Diagnosis Guide and Reference manual.
The characteristics of an internal dbspace are different from those of a permanent dbspace:
There are no free space classes for internal dbspace pages: data is always added at the end, so there is never a need to search for free space in which to store a row. This avoids the space wastage that can occur due to the granularity of free space classes (a row is stored on a page in an internal dbspace whenever the page has enough free bytes to hold the row).
You can see that internal dbspaces are much simpler than permanent dbspaces. Calculating the number of pages needed to hold a certain amount of data is also simpler.
We will calculate the amount of space required to hold a copy of the working set of data during a sort. Specifically, we will calculate the size of the initial working set, since the working set can only get smaller due to the elimination of duplicate values. In building the initial working set, we retrieve a portion of the input data (enough to fill an internal sort buffer), sort it, and write the sorted portion to an internal dbspace. Duplicates may be eliminated during the sort. We will not consider the effects of duplicate elimination in these calculations, since these effects are dependent on the order in which data is encountered.
The following steps calculate the size of a sort row. The sort row is made up mostly of the columns by which we are ordering, that is the sort key, plus any other columns which must appear in the result.
VARCOLavg(n) = (1.25 x average length of varying-length ordering column n) + 3
These numbers must be rounded up to integer values.
The encoding of varying-length values incurs an overhead of approximately 25 percent.
SORTKEYLEN = the sum of the lengths of fixed-length ordering columns (in bytes)
             + VARCOLavg(1) + ... + VARCOLavg(n)
             + 1 for each ordering column that allows nulls
For index creation, the TID of the data row is part of the sort key/row. If the sort is for index creation:
SORTKEYLEN = SORTKEYLEN + 4
SORTROWLEN = SORTKEYLEN
             + 3 bytes (sort row header)
             + the sum of the lengths of fixed-length non-ordering columns (in bytes)
             + the sum of the average lengths of varying-length non-ordering columns (in bytes)
             + 1 for each non-ordering column which allows nulls
For cases other than index creation, where the sort key contains at least one varying-length column, there will be the following additional overhead:
SORTROWLEN = SORTROWLEN + 1
SORTROWLEN = SORTROWLEN + number of varying-length sort key columns
If the data does not contain trailing blanks, then this overhead is not incurred.
For varying length sort rows, the order in which rows are encountered and stored can affect the number of pages required. To account for this possibility, we can use a method similar to the Effective Index Page Size used in calculating the size of an index. Briefly, this method models the worst case where the last sort row we try to insert into a page is the largest possible sort row, and the page has one fewer byte of free space available. This gives us our maximum space wastage per page, and yields an upper bound on the number of pages we will use. To determine the Effective Internal Dbspace Page Size (EIDPS), do the following:
For fixed-length sort rows:

EIDPS = 4080

For varying-length sort rows, where MAX_SORTROWLEN is the maximum possible sort row length:

EIDPS = 4080 - (MAX_SORTROWLEN + 1)
The following steps will calculate the number of pages required to hold a copy of all sort rows. This is the minimum size of internal dbspace that is required to perform the sort.
ROWS_PER_PAGE = TRUNC [EIDPS/SORTROWLEN]
For index creation, the number of sort rows is the same as the number of rows in the table. For cases where only a subset of the rows in a table participates in a sort, then the number of participating rows must be estimated based on your knowledge of the query and the contents of the table.
NROWS = number of rows expected to participate in the sort
A block of input rows is encoded and stored in a sort buffer. The contents of this buffer are then sorted and written out to pages of an internal dbspace. Because the buffer is large enough to fill several internal dbspace pages, and the space in the buffer is contiguous while each internal dbspace page has a header, we must account for this in determining the number of pages required.
ROWS_PER_BLOCK = TRUNC [40948 / SORTROWLEN]
PAGES_PER_BLOCK = ROWS_PER_BLOCK / ROWS_PER_PAGE
This number must be rounded up to an integer value.
FULL_BLOCKS = TRUNC [NROWS / ROWS_PER_BLOCK]
ROWS_LAST_BLOCK = NROWS - (ROWS_PER_BLOCK x FULL_BLOCKS)
NPAGES = (FULL_BLOCKS x PAGES_PER_BLOCK)
         + (ROWS_LAST_BLOCK / ROWS_PER_PAGE)    <- round up
         + 1
For a sort to be successful, the internal dbspace must be defined to have at least NPAGES pages.
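The following is a minimal sketch of the NPAGES calculation, with illustrative names; pass MAX_SORTROWLEN only for varying-length sort rows:

```python
import math

def sort_npages(nrows, sortrowlen, max_sortrowlen=None):
    if max_sortrowlen is None:
        eidps = 4080                          # fixed-length sort rows
    else:
        eidps = 4080 - (max_sortrowlen + 1)   # varying-length sort rows
    rows_per_page = math.trunc(eidps / sortrowlen)
    rows_per_block = math.trunc(40948 / sortrowlen)   # sort buffer capacity
    pages_per_block = math.ceil(rows_per_block / rows_per_page)
    full_blocks = math.trunc(nrows / rows_per_block)
    rows_last_block = nrows - rows_per_block * full_blocks
    return (full_blocks * pages_per_block
            + math.ceil(rows_last_block / rows_per_page)
            + 1)

npages = sort_npages(1000000, 50)   # one million 50-byte sort rows
print(npages, "pages per copy;", npages * 2, "DASD pages while merging")
```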
So far we have calculated the size of the sort working set. After the initial working set has been created, we then merge all the sorted portions to yield a final sorted result. Multiple merge passes may occur before the final result is created. During this process, two copies of the working set exist, in two internal dbspaces. For successful completion of a sort, more than one internal dbspace must be available.
The final result may be smaller than intermediate results, due to such things as the elimination of the three-byte sort row header, and the decoding of varying-length values in cases other than index creation. The amount of DASD required is reduced only in the case where a single merge pass is performed; that is, when one pass is made through the data from the initial working set to the final result. This only occurs for sorts that are sufficiently small, or where the data is already almost completely sorted.
When sorting for duplicate elimination, the merge process will remove duplicates. As with the duplicate elimination which occurred during sorting, it is difficult to predict the effect this will have. Note that, for calculating DASD requirements, we are only interested in the duplicate elimination which would occur during the first merge pass, since the second copy of the working set is created by this pass. The completion of the first merge pass is the point at which our peak DASD usage occurs.
We will not consider these cases; instead, calculate the amount of DASD required to perform the sort as:
number of DASD pages = NPAGES x 2
For the sort to complete successfully, the storage pool to which the internal dbspaces are assigned must have sufficient DASD pages available.