If you have large data sets that are regularly updated, you can
use the Net change utility to identify records that were added,
changed, or deleted. After you identify these records, you can save processing
time by sending only the altered records to the pipeline.
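To illustrate the idea of a net change comparison, here is a minimal Python sketch. It is not the Net change utility's actual implementation, which this topic does not describe. It assumes that both inputs are already sorted by key and that the key occupies the first five characters of each fixed-width record; `KEY_LEN`, `net_change`, and the sample records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical layout: the key is the first KEY_LEN characters of each
# fixed-width record. Your real layout is defined by your own configuration.
KEY_LEN = 5


def net_change(base: Iterable[str], new: Iterable[str],
               key_len: int = KEY_LEN) -> Iterator[Tuple[str, str]]:
    """Yield (status, record) pairs for two inputs that are sorted by key."""
    base_it, new_it = iter(base), iter(new)
    b, n = next(base_it, None), next(new_it, None)
    while b is not None or n is not None:
        if n is None or (b is not None and b[:key_len] < n[:key_len]):
            yield ("deleted", b)      # key exists only in the base input
            b = next(base_it, None)
        elif b is None or n[:key_len] < b[:key_len]:
            yield ("added", n)        # key exists only in the new input
            n = next(new_it, None)
        else:
            if b != n:
                yield ("changed", n)  # same key, different contents
            # Unchanged records are skipped without being reported.
            b, n = next(base_it, None), next(new_it, None)


# Made-up sample data for illustration only.
base_records = ["00001Alice     ", "00002Bob       ", "00003Carol     "]
new_records = ["00001Alice     ", "00002Robert    ", "00004Dave      "]

changes = list(net_change(base_records, new_records))
for status, record in changes:
    print(status, record)
```

A single forward pass like this one only works when both inputs are in the same key order, which is one plausible reason that a comparison of this kind depends on sorted input.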
Before you begin:
Make sure that all source files are fixed-width files.
Procedure:
- Sort the incoming data file according to the
key that uniquely identifies each record. (This is critical to success.)
- Run the Net change utility:
java -server -jar jnce.jar --cfg-file=filename.ini
--base-file=filename.base --new-file=inputfile
--out-root=filename
Note: This command line is wrapped.
- If an error occurs after this step:
- Correct the cause of the error.
- Delete the .diff and .merge files.
- Run the Net change utility again.
- Verify that the results are reasonable:
- Visually check the data.
- Check the number of records. For example, if you processed
1 million records and expected 100,000 records to remain after running the
Net change utility, make sure that the actual number of records matches the
expected number.
- Archive the old base file and rename the new .merge file for use
as the new base file.
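The sorting and record-count checks in this procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the key position (`KEY_START`, `KEY_LEN`), the helper names, and the sample records are hypothetical, and plain string comparison is assumed as the sort order because this topic does not state which collation the Net change utility expects.

```python
from typing import List

# Hypothetical layout: the key occupies columns 0-4 of each fixed-width record.
KEY_START, KEY_LEN = 0, 5


def record_key(record: str) -> str:
    """Extract the key field from one fixed-width record."""
    return record[KEY_START:KEY_START + KEY_LEN]


def sort_records(records: List[str]) -> List[str]:
    """Return the records ordered by key, using plain string comparison."""
    return sorted(records, key=record_key)


def check_sorted_unique(records: List[str]) -> None:
    """Raise ValueError if any key is out of order or appears more than once."""
    for prev, cur in zip(records, records[1:]):
        if record_key(prev) >= record_key(cur):
            raise ValueError(
                f"key {record_key(cur)!r} is out of order or duplicated")


def counts_match(actual: int, expected: int, tolerance: float = 0.0) -> bool:
    """Compare record counts; a nonzero tolerance is an assumption, not a rule."""
    return abs(actual - expected) <= tolerance * expected


# Made-up sample data for illustration only.
incoming = ["00003Carol     ", "00001Alice     ", "00002Bob       "]
ordered = sort_records(incoming)
check_sorted_unique(ordered)  # raises if the key does not uniquely order the data
print(len(ordered), counts_match(len(ordered), 3))
```

For very large files an in-memory sort like this might not be practical; an external sort tool would serve the same purpose, as long as it orders the records by the same key.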
What to do next:
You have now eliminated duplicate and unchanged records. Send only the
records that need to be processed by the pipeline to a UMF-generating
utility or, if those records meet UMF requirements, directly to the pipeline.