The runtime architecture for BIA reporting defines how the data flows from the transactional data source to the Reporting data sources and on to populate the reports. The data is moved from the Source database to the Staging database. From here it is moved to the Central Data Warehouse (CDW), and finally is pushed out to the Data Marts. Once the Data Marts are populated the reports are run.
Data Flow
The data is moved through the Reporting solution in a number of distinct steps.
- Step 1. Extract to Staging Database. The first step is the extract of the data of interest from the source database. The data is filtered on the LASTWRITTEN column in the source tables. Once the data of interest, e.g. all new entries in a particular table, is identified it is moved to the Staging area. The Staging area is a data storage area containing data from various data sources, it is essentially a copy of a subset of the source tables. This data movement is the first run of the ETL, and the data can be cleansed at this point to ensure there is no 'dirty' data that could lead to inaccurate reports.
- Step 2. Staging to CDW. After the required data is in the Staging area it is ready to be moved to the CDW. The CDW is the 'core' of the Reporting solution: it contains all the archived data stored in a normalized data structure. The physical location of this database is usually on the same database as the Staging database. The CDW is optimized for the efficient storage of large amounts of data. It does not prejudge how the data stored will be queried, it just stores the data that is required for analysis. It achieves this by serving all the current reporting needs and also attempting to capture the underlying business processes. Staging area data is not in the form that is required for reporting and has some gaps in it. Therefore when the data is moved from Staging to CDW any 'business logic' required is run on the data to fill in those gaps. This ensures it arrives in the CDW in a state that is useful for analysis.
- Step 3. Load Data Marts. Finally the data is moved from the CDW to the Data Marts, this is done by running another series of ETL processes. Data marts are de-normalized dimensional structures (Star schema). They are logical groupings of data that service a particular group of reports. ETL programs extract a subset of historical data from the CDW. This subset of data, a Data Mart, contains data that is tailored to and optimized for a specific reporting or analysis task.
An overview of the data flow between the main data sources is as follows.
- Source Databases to Staging Area
- Staging Area to Central Data Warehouse
- Business Logic applied from external business logic
- Central Data Warehouse to Datamarts
- Transformed to Dimensional
As is evident, there can be multiple data sources, only one Staging area, one CDW, and multiple Data Marts.