During pipeline processing, names are cleansed and standardized to prepare the identity record for optimal entity resolution processing.
Pipeline processing provides the most accurate name information about entities for current, future, and historical use. As new or changed identity name data enters the system, it is compared against the product name standardization dictionary, which contains a list of root names and their known derivatives, to identify the root name. When the root name is identified, the system keeps both the root name and the original name for the incoming identity record.
For example, the following table shows two root names and their possible derivatives, including the various ways to spell each name. The names on the left are all derivatives of the root name on the right.
Derivatives | Root |
---|---|
Dick, Dickie, Ricardo, Rich, Richie, Rick, Rickey, Ricki, Rickie, Ricky, Rikki, Ritchie | Richard |
Mohamad, Mohammad, Mohamed, Mohammed | Mohammad |
The name hygiene and standardization process also corrects misspellings where necessary. Again, the system keeps both the original spelling and any corrections as part of the record. Most other systems (including ETL and database marketing tools) do not retain both.
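The root name lookup described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the `ROOT_NAMES` dictionary, the `StandardizedName` record, and the `standardize` function are hypothetical names introduced only for illustration, the dictionary holds just the entries from the example table, and the case-insensitive matching is an assumption. Misspelling correction is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical miniature dictionary: root name -> known derivatives.
# The entries come from the example table; the real dictionary is far larger.
ROOT_NAMES = {
    "Richard": [
        "Dick", "Dickie", "Ricardo", "Rich", "Richie", "Rick",
        "Rickey", "Ricki", "Rickie", "Ricky", "Rikki", "Ritchie",
    ],
    "Mohammad": ["Mohamad", "Mohammad", "Mohamed", "Mohammed"],
}

# Invert the dictionary so that each derivative (and each root) maps to its root.
# Keys are case-folded so the lookup ignores letter case (an assumption).
DERIVATIVE_TO_ROOT = {}
for root, derivatives in ROOT_NAMES.items():
    DERIVATIVE_TO_ROOT[root.casefold()] = root
    for derivative in derivatives:
        DERIVATIVE_TO_ROOT[derivative.casefold()] = root


@dataclass(frozen=True)
class StandardizedName:
    """Holds the name as received together with its root name, if one was found."""
    original: str
    root: Optional[str]


def standardize(name: str) -> StandardizedName:
    """Look up the root name for `name` while preserving the original value."""
    key = name.strip().casefold()
    return StandardizedName(original=name, root=DERIVATIVE_TO_ROOT.get(key))
```

In this sketch, `standardize("Ricky")` yields a record whose `original` is `"Ricky"` and whose `root` is `"Richard"`. A name that is not in the dictionary keeps its original value and gets a `root` of `None`, so no incoming data is lost.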
Name hygiene and standardization are important for increasing the confidence levels of entity resolution. This process is especially important because the average person uses as many as five different versions of their name for official and consumer purposes.