IBM Relationship Resolution Information Center, Version 4.2

Evaluating new data against existing data

If you have large data sets that are updated regularly, you can use the net change utility to identify records that were added, changed, or deleted. After you identify these records, you can save processing time by sending only the altered records to the pipeline.
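
The following minimal Python sketch shows what "added", "changed", and "deleted" mean when records are compared by a unique key. It is an illustration only, not how jnce.jar is implemented. It assumes that each record has already been reduced to its key and its full record text. The function name net_change and the sample records are hypothetical.

```python
def net_change(base, new):
    """Classify records as added, changed, or deleted (illustrative sketch).

    base and new map each record's unique key to its full record text
    (for example, one fixed-width line). Unchanged records are dropped.
    """
    added = {k: r for k, r in new.items() if k not in base}
    changed = {k: r for k, r in new.items() if k in base and base[k] != r}
    deleted = {k: r for k, r in base.items() if k not in new}
    return added, changed, deleted


# Hypothetical sample data: key "001" changed city, "002" disappeared, "003" is new.
base = {"001": "001Alice   Boston", "002": "002Bob     Denver"}
new = {"001": "001Alice   Austin", "003": "003Carol   Miami"}
added, changed, deleted = net_change(base, new)
print(sorted(added), sorted(changed), sorted(deleted))  # ['003'] ['001'] ['002']
```

Only the three classified groups would need further processing. Identical records are skipped entirely, and that is where the processing-time saving comes from.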

Before you begin:
Make sure that all source files are fixed-width files.
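
In a fixed-width file, every field starts and ends at the same character positions in every record. As a result, records typically share one length and fields can be extracted by position. The sketch below shows one way to check record lengths and split a record into fields before you run the utility. The layout (field names and column positions) and the names LAYOUT, RECORD_WIDTH, bad_lines, and parse_record are hypothetical. Substitute the layout of your own data.

```python
# Hypothetical record layout: (field, start, end), 0-based columns, end exclusive.
LAYOUT = [("id", 0, 8), ("name", 8, 28), ("city", 28, 43)]
RECORD_WIDTH = LAYOUT[-1][2]  # 43 characters per record in this layout


def bad_lines(lines, width=RECORD_WIDTH):
    """Return 1-based numbers of lines whose length differs from the record width."""
    return [n for n, line in enumerate(lines, start=1)
            if len(line.rstrip("\r\n")) != width]


def parse_record(line, layout=LAYOUT):
    """Split one fixed-width record into fields, trimming the space padding."""
    return {name: line[start:end].strip() for name, start, end in layout}


record = "00000042" + "Jane Smith".ljust(20) + "Boston".ljust(15)
print(bad_lines([record, "too short"]))  # [2]
print(parse_record(record))  # {'id': '00000042', 'name': 'Jane Smith', 'city': 'Boston'}
```

Only line terminators are stripped before the length check. Trailing spaces are kept, because they are padding that belongs to the last field.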
Procedure:
  1. Sort the incoming data file by the key that uniquely identifies each record. This step is critical to getting correct results.
  2. Run the net change utility:
    java -server -jar jnce.jar --cfg-file=filename.ini
     --base-file=filename.base --new-file=inputfile
     --out-root=filename
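    Note: This command line is wrapped; enter it as a single line. In the command, filename.ini is the configuration file, filename.base is the existing base file, inputfile is the sorted incoming data file, and filename is the root name for the output files.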
  3. If an error occurs during this step:
    1. Correct the cause of the error.
    2. Delete the .diff and .merge files.
    3. Run the net change utility again.
  4. Verify that the results are reasonable:
    1. Visually inspect the data.
    2. Check the number of records. For example, if you ran 1 million records through the net change utility and expected 100,000 records in the results, make sure that the actual number of records matches that expected number.
  5. Archive the old base file, and then rename the new .merge file so that it becomes the base file for the next run.
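
Step 1 stresses sorting by the unique key. A common reason that a comparison tool needs key-sorted input is that it can then walk the old and new files together in a single pass, much like the merge step of merge sort. The sketch below shows that general technique. It is an assumption about why sorting matters and is not taken from jnce.jar, whose internal algorithm this topic does not describe. KEY_WIDTH, key_of, and sorted_compare are hypothetical names, and the key is assumed to occupy the first 8 columns of each fixed-width record.

```python
KEY_WIDTH = 8  # hypothetical: the unique key occupies the first 8 columns


def key_of(record):
    """Return the key portion of a fixed-width record."""
    return record[:KEY_WIDTH]


def sorted_compare(base, new):
    """Compare two key-sorted record lists in a single pass.

    Assumes both lists are sorted by key and that each key occurs at most
    once per list. Returns (tag, record) pairs, where tag is "A" (added),
    "C" (changed), or "D" (deleted). Unchanged records are skipped.
    """
    diff = []
    i = j = 0
    while i < len(base) and j < len(new):
        kb, kn = key_of(base[i]), key_of(new[j])
        if kb < kn:  # key only in the old base: deleted
            diff.append(("D", base[i]))
            i += 1
        elif kb > kn:  # key only in the new file: added
            diff.append(("A", new[j]))
            j += 1
        else:  # key in both: changed only if the record text differs
            if base[i] != new[j]:
                diff.append(("C", new[j]))
            i += 1
            j += 1
    diff.extend(("D", r) for r in base[i:])  # remaining old keys were deleted
    diff.extend(("A", r) for r in new[j:])   # remaining new keys were added
    return diff


old = ["00000002Bob", "00000001Alice", "00000003Carol"]
incoming = ["00000003Caroline", "00000001Alice", "00000004Dave"]
base = sorted(old, key=key_of)  # sort both inputs by key first
new = sorted(incoming, key=key_of)
diff = sorted_compare(base, new)
print(diff)  # [('D', '00000002Bob'), ('C', '00000003Caroline'), ('A', '00000004Dave')]
print(len(diff))  # compare this count with the number you expected
```

If the incoming list is passed in unsorted, the same function reports spurious results: Alice appears as both deleted and added. An unsorted input file can cause exactly this kind of unreasonable output, which the count check in step 4 is meant to catch.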
What to do next:
You have now eliminated duplicate and unchanged records. Send only the records that the pipeline needs to process, either to a UMF-generating utility or, if the records already meet UMF requirements, directly to the pipeline.



Last updated: 2009