IM Relationship Resolution Information Center, Version 4.2

Name hygiene and standardization

During pipeline processing, names are cleansed and standardized to prepare the identity record for optimal entity resolution processing.

Pipeline processing provides the most accurate name information about entities for current, future, and historical use. As new or changed identity name data enters the system, it is compared against the product's name standardization dictionary, which lists root names and their known derivatives, to identify the root name. When the root name is identified, the system keeps both the root name and the original name for the incoming identity record.

For example, the following table shows possible derivatives of two root names, including variant spellings. The names on the left are all derivatives of the root name on the right.

Table 1. Examples of some possible derivatives for the root names of Richard and Mohammad

Derivatives                                    Root
---------------------------------------------  --------
Dick, Dickie, Ricardo, Rich, Richie, Rick,     Richard
Rickey, Ricki, Rickie, Ricky, Rikki, Ritchie
Mohamad, Mohammad, Mohamed, Mohammed           Mohammad
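
The root-name lookup described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the dictionary contents are limited to the Table 1 examples, and the names ROOT_NAMES, DERIVATIVE_TO_ROOT, StandardizedName, and standardize_name are hypothetical, introduced only for illustration. Case-insensitive matching is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-in for the name standardization dictionary,
# populated only with the derivatives listed in Table 1.
ROOT_NAMES = {
    "Richard": ["Dick", "Dickie", "Ricardo", "Rich", "Richie", "Rick",
                "Rickey", "Ricki", "Rickie", "Ricky", "Rikki", "Ritchie"],
    "Mohammad": ["Mohamad", "Mohammad", "Mohamed", "Mohammed"],
}

# Invert the dictionary so that each derivative points to its root name.
# Keys are case-folded so the lookup is case-insensitive (an assumption).
DERIVATIVE_TO_ROOT = {
    derivative.casefold(): root
    for root, derivatives in ROOT_NAMES.items()
    for derivative in derivatives
}
# A root name also resolves to itself.
for root in ROOT_NAMES:
    DERIVATIVE_TO_ROOT.setdefault(root.casefold(), root)


@dataclass(frozen=True)
class StandardizedName:
    original: str          # the name exactly as it arrived
    root: Optional[str]    # the identified root name, or None if unknown


def standardize_name(name: str) -> StandardizedName:
    """Look up the root name and keep the original alongside it."""
    key = name.strip().casefold()
    return StandardizedName(original=name, root=DERIVATIVE_TO_ROOT.get(key))
```

In this sketch, standardize_name("Ricky") keeps "Ricky" as the original name and records "Richard" as the root, while a name that is not in the dictionary keeps its original value and has no root.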

The name hygiene and standardization process also corrects misspellings where necessary. As with root names, the system keeps both the original spelling and any correction as part of the record. Most other systems (including ETL and database marketing tools) do not.
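
The idea of keeping the original spelling next to its correction can be sketched as follows. This topic does not describe how the product detects misspellings, so the sketch substitutes approximate string matching from Python's standard difflib module as a stand-in technique. The KNOWN_NAMES list, the NameRecord type, the correct_spelling function, and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical list of correctly spelled names (assumption for illustration).
KNOWN_NAMES = ["Richard", "Mohammad"]


@dataclass(frozen=True)
class NameRecord:
    original: str             # spelling as received, always preserved
    corrected: Optional[str]  # suggested correction, or None if not needed/found


def correct_spelling(name: str,
                     known_names: Sequence[str] = KNOWN_NAMES,
                     cutoff: float = 0.8) -> NameRecord:
    """Return the original name together with a correction, if one applies."""
    if name in known_names:
        # Already a known spelling, so nothing to correct.
        return NameRecord(original=name, corrected=None)
    # difflib.get_close_matches returns the best matches above the cutoff,
    # most similar first; n=1 asks for the single best candidate.
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return NameRecord(original=name, corrected=matches[0] if matches else None)
```

Because the record carries both fields, a later process can still search on the spelling that was originally supplied even after a correction has been applied.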

Name hygiene and standardization is an important step in increasing the confidence levels of entity resolution. This process is especially important because the average person uses as many as five different versions of his or her name for official and consumer purposes.



Last updated: 2009