IBM Relationship Resolution Information Center, Version 4.2

Data quality check

As identity data comes into the system for processing, the pipeline checks the quality of the data to protect the integrity of the entity database. Each incoming identity record is tested for proper Universal Message Format (UMF) construction, required values, valid data types, and configured data source codes.
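The four checks named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the product's implementation: the element names (`SOURCE_CODE`, `RECORD_ID`, `BIRTH_DATE`), the required-field list, the date format, and the set of configured data source codes are all hypothetical and stand in for whatever a real UMF specification and system configuration define.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical configuration; a real deployment defines its own values.
CONFIGURED_SOURCES = {"CRM", "HR"}
REQUIRED_FIELDS = ("SOURCE_CODE", "RECORD_ID")


def check_record(xml_text):
    """Return a list of data quality problems found in one record."""
    # Check 1: the document must be well-formed before anything else runs.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"malformed document: {exc}"]

    problems = []

    # Check 2: required values must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not (root.findtext(field) or "").strip():
            problems.append(f"missing required value: {field}")

    # Check 3: typed values must parse as their expected data type.
    birth_date = root.findtext("BIRTH_DATE")
    if birth_date is not None:
        try:
            datetime.strptime(birth_date.strip(), "%Y-%m-%d")
        except ValueError:
            problems.append("invalid data type: BIRTH_DATE is not a date")

    # Check 4: the data source code must be one the system knows about.
    source = (root.findtext("SOURCE_CODE") or "").strip()
    if source and source not in CONFIGURED_SOURCES:
        problems.append(f"unconfigured data source code: {source}")

    return problems
```

In this sketch a malformed document short-circuits the remaining checks, because the later checks depend on a parsed record.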

As the pipeline checks data quality, it attempts to correct the problems it finds, provided that correction is possible and the system is configured to do so. To decide whether to correct a data quality problem, the system uses the configured data quality management (DQM) rules. DQM rules define which data quality defects on incoming identity records the system is allowed to correct, and which defects it can leave as-is while still processing the records.
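One way to picture DQM rules is as a lookup from defect type to configured action. The sketch below is illustrative only: the defect names, the `Action` enumeration, and the choice to reject records with defects that have no rule are assumptions made for this example, not documented product behavior.

```python
from enum import Enum


class Action(Enum):
    CORRECT = "correct"  # fix the defect, then process the record
    ACCEPT = "accept"    # leave the defect as-is, still process the record
    REJECT = "reject"    # do not process the record


# Hypothetical rule table mapping a defect type to the configured action.
DQM_RULES = {
    "untrimmed_whitespace": Action.CORRECT,
    "missing_middle_name": Action.ACCEPT,
}


def apply_rules(defects, rules=DQM_RULES):
    """Decide what happens to a record given its detected defects.

    Returns (process_record, defects_to_correct, defects_left_as_is).
    A defect with no rule is treated as REJECT (an assumption of this sketch).
    """
    to_correct, left_as_is = [], []
    for defect in defects:
        action = rules.get(defect, Action.REJECT)
        if action is Action.REJECT:
            return False, [], []
        if action is Action.CORRECT:
            to_correct.append(defect)
        else:
            left_as_is.append(defect)
    return True, to_correct, left_as_is
```

A record with one correctable defect and one acceptable defect is still processed, while a record with any defect that maps to `REJECT` is not.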

To review the data quality for a particular data source, view or print the Load Summary report. The Quality summary section gives insight into the overall data quality for that data source, or for a particular set of identity records loaded from it. Use this information to adjust your ETL process for that data source as necessary.

Standard logging and error handling record all data quality errors and corrections, including errors that the system could not or did not correct. Check the system logs frequently so that you are aware of data quality errors that pipeline processing did not correct. In most cases, you need to correct these errors yourself and then reload the corrected identity records into a pipeline for entity resolution processing.

Data quality check examples

If it is configured to do so, the system can automatically add an unrecognized code as a new code. The UMF_EXCEPT log shows the outcome in either case: the new codes that the system added, or the records that it rejected and did not process because it did not recognize a code and was not configured to add it as new.

The table below shows two examples of codes on incoming records that were not already configured in the system.

Table 1. Examples of two codes not configured in the system and the result of system processing

Code            Quality check        UMF_EXCEPT log
Addr_Type x     New code added       write to log
Num_Type xxx    New code rejected    write to log

  • In the first example, the system is configured to automatically add the new address type code.
  • In the second example, the system is not configured to automatically add the new number type code, so it rejects the record and does not process it for entity resolution.

In both cases, the system logs the action to the appropriate log file.
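The two outcomes in Table 1 can be modeled as a single check that consults a per-code-type auto-add setting. This is a sketch under assumptions: the function name, the `auto_add` set, the in-memory code table, and the logger name are hypothetical, and Python's standard `logging` module stands in for the UMF_EXCEPT log.

```python
import logging

# Stand-in for the UMF_EXCEPT log; the logger name is hypothetical.
log = logging.getLogger("umf_except_sketch")


def check_code(code_type, value, known_codes, auto_add):
    """Return True if the record may continue to entity resolution.

    known_codes maps a code type to the set of values already configured.
    auto_add is the set of code types for which new values may be added.
    """
    known = known_codes.setdefault(code_type, set())
    if value in known:
        return True
    if code_type in auto_add:
        # Configured to add: register the code and let the record through.
        known.add(value)
        log.warning("%s %s: new code added", code_type, value)
        return True
    # Not configured to add: reject the record.
    log.warning("%s %s: new code rejected", code_type, value)
    return False
```

With `auto_add = {"Addr_Type"}`, a call for `Addr_Type` `x` adds the code and returns `True`, while a call for `Num_Type` `xxx` returns `False`. Both calls write a log entry, matching the two rows of the table.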




Last updated: 2009