Size the data warehouse
Use the following information to plan for sufficient storage capacity
for the Process Analyzer Engine to use.
Method for sizing the database
The Process Analyzer
uses three separate data storage entities: the data warehouse, the data
mart, and the OLAP cubes.
Data warehouse
The data collected and stored in the data warehouse are the workflow
events and the process, configuration, and resource data.
The configuration, process, and resource data do not consume much
space; they reach a plateau, after which the volume of new data
increases at a very slow rate. On the other hand, the events generate
a large volume of data. An event is generated when a workflow is
created or terminated, and multiple events are generated during the
processing of the steps of the workflow.
The majority of the generated events are information about workflow
step activities. For instance, over the life cycle of a step, the
Process Engine
generates a minimum of three events: one event to queue the work
object at a particular step, one to lock the work object, and one
to dispatch the work object after it has been completed. More events
may be generated, depending on the number of times that a work object
is locked, unlocked, reassigned, or delegated. Because the events
account for the most significant amount of data, the best way to
estimate the size of the database is from the expected number of
processed workflow steps; the number of processed workflow steps
translates relatively well to the number of events that are generated
and stored in the data warehouse.
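The relationship above can be sketched as a small calculation. This is an illustrative estimate only, not a formula from the product documentation: it assumes the stated minimum of three events per step (queue, lock, dispatch) and lets you add an allowance for extra lock, unlock, reassignment, or delegation events.

```python
def estimate_events(steps_processed, extra_events_per_step=0):
    """Lower-bound estimate of workflow events stored in the data warehouse.

    The Process Engine generates at least three events per step:
    one to queue the work object, one to lock it, and one to
    dispatch it after completion. Reassignments, delegations, and
    repeated locking add further events (extra_events_per_step is
    a rough allowance, not a documented figure).
    """
    MIN_EVENTS_PER_STEP = 3  # queue + lock + dispatch
    return steps_processed * (MIN_EVENTS_PER_STEP + extra_events_per_step)

# 250,000 processed steps generate at least 750,000 events
print(estimate_events(250_000))  # → 750000
```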
The Process Analyzer
data warehouse is named VMAEDW.
Data mart
As the VMAE Publishing Service processes the data warehouse, it
generates and stores facts in the various fact tables in the data
mart. The Publishing Service also imports process, configuration,
and user information from the data warehouse into the data mart. The
majority of the data stored in the data mart are facts, because facts
are generated from the events and, as mentioned in the previous
section, events make up the majority of the data collected in the
data warehouse. Because the number of facts in the data mart is
proportional to the number of events stored in the data warehouse, it
is logical to size the data mart by the number of workflow steps as
well.
The Process Analyzer
data mart is named VMAEDm.
OLAP cubes
The space used by the OLAP cubes is very small in comparison to the
data warehouse and the data mart, so their storage capacity is not a
concern. The final formula for calculating the storage space
requirement is roughly sufficient for storing the data warehouse, the
data mart, and the OLAP cubes as well.
The Process Analyzer OLAP database is named VMAE[_instance_name],
where instance_name is the name of a named instance of the Process
Analyzer database. (The name of a default instance of the database is
simply VMAE.)
Formula for calculating storage space requirements
You can use the following formula to calculate the storage space requirements
for the Process Analyzer
Engine
database based on the expected number of processed steps:
Storage space = (total number of expected workflow steps
processed over time) * 1.5 KB
If user-defined fields are included, then the storage space
requirement is:
Storage space = (total number of expected workflow steps processed
over time) * 1.5 KB + [(total number of user-defined fields * total
number of expected workflow steps processed over time) * 0.6 KB]
An example calculation
For an installation that processes 250,000 steps a day continuously for
365 days a year, the total number of processed steps is 91,250,000. The
total storage space needed is 137 gigabytes.
If ten user-defined fields are added, the storage space requirement is
684 gigabytes.
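The formulas and the example above can be expressed as a short calculation. This is a minimal sketch of the documented formulas; the helper name and the decimal KB-to-GB conversion (1 GB = 1,000,000 KB) are assumptions for illustration.

```python
def storage_space_kb(steps, user_defined_fields=0):
    """Storage estimate in KB for the Process Analyzer database.

    Implements: steps * 1.5 KB, plus 0.6 KB per user-defined
    field per step when user-defined fields are included.
    """
    base_kb = steps * 1.5
    udf_kb = user_defined_fields * steps * 0.6
    return base_kb + udf_kb

# 250,000 steps a day for 365 days = 91,250,000 processed steps
steps = 250_000 * 365

print(round(storage_space_kb(steps) / 1e6))      # → 137 (gigabytes)
print(round(storage_space_kb(steps, 10) / 1e6))  # → 684 (gigabytes)
```

The two printed figures match the worked example: roughly 137 GB with no user-defined fields, and roughly 684 GB with ten.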
Pruning the data warehouse and its effect on storage space requirements
Pruning the data warehouse and the data mart has a significant effect
on storage space requirements. See Pruning the Process Analyzer
database.