The pipeline configuration file (usually pipeline.ini) contains
the initial values for variables and configuration information that pipelines
need to process incoming data. Settings in the pipeline configuration file
override the global system settings (such as in system parameters) for all
the pipelines that use the same pipeline configuration file.
You can manually add or change parameters or values in the pipeline
configuration file. However, by doing so, it is possible to corrupt this file,
create an invalid configuration, or cause pipeline processing disruption.
Before you make any changes to the pipeline configuration file, make a copy
of the file for safekeeping.
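The backup step can be scripted. The following is a minimal sketch, not a product utility: the function name backup_config and the timestamped .bak naming scheme are assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_path: str) -> Path:
    """Copy the configuration file next to itself with a timestamp suffix."""
    source = Path(config_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: pipeline.ini -> pipeline.ini.<timestamp>.bak
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


# Example (assumes pipeline.ini is in the current directory):
# backup_config("pipeline.ini")
```

Keeping the copy in the same directory makes it easy to restore the file if an edit produces an invalid configuration.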
[pipeline] section parameters
Contains configuration
data for the pipelines. Do not rename this section header; it must remain
[pipeline] or the pipelines using this configuration file will shut down with
an error.
- CMEAdminTransport
- Specifies the HTTP Uniform Resource Identifier (URI) of the transport
to the Configuration Console, which is the component that contains the application
monitor. You can use the application monitor to monitor the pipeline status
and statistics and to route results from pipelines.
- The default value is empty. This setting is usually commented out.
- InputTransport
- Specifies the URI of the transport where the pipeline receives incoming
data. If the specified transport does not exist, the system either does not
start the pipeline or shuts down the pipeline.
- You can specify multiple transports by using a blank space between URIs.
- The default value is empty. This setting is usually commented out, because
input transports are usually specified on the command line when starting pipelines.
- Concurrency
- Specifies the number of parallel pipeline processing threads
spawned when a single pipeline process is started. This parameter can be any
integer greater than or equal to 0. The higher the number, the more
pipeline processing threads start with each pipeline start command, and the
more incoming data records can be processed at the same time. (One record
is processed by each parallel processing thread.)
- This setting takes precedence over the DEFAULT_CONCURRENCY system parameter
setting specified in the System Parameters tab in the
Configuration Console.
- If the dynamic scoring engine system feature is enabled in the [DSE] section,
this maximum concurrency number should be set to 1. The dynamic scoring engine
feature cannot use the parallel pipeline processing feature.
- The default value is 1, which means that only 1 pipeline processing thread
is spawned with each pipeline start command. However, you can override this
default value by specifying the concurrency number in the transport parameter
of the pipeline start command.
- ErrorLimit
- Specifies the number of errors that the system can encounter during processing
within the period set by the ErrorResetInterval parameter (24 hours by default)
before it shuts down the pipeline that is encountering the errors. This error
limit includes database errors, pipeline errors, queue errors, and UMF parsing
errors. If this setting value is 0, the pipelines never
shut down, regardless of the number of errors encountered.
- UMF exception errors are not included in this error limit setting. UMF
exception error limits are set using the LogOnAllUMFExceptions and StopOnAllUMFExceptions
parameters.
- This setting interacts with the ErrorResetInterval parameter.
- The default value is set to 10.
- ErrorResetInterval
- Specifies the number of minutes the system counts errors that apply to
the ErrorLimit parameter before resetting the error count for the pipeline.
- If the specified number of minutes elapses without the system exceeding
the ErrorLimit count, the system resets the number of errors counted for that
pipeline.
- If the system exceeds the number of errors specified in the ErrorLimit
parameter before the number of minutes specified in the ErrorResetInterval
parameter elapses, the system shuts down the affected pipeline.
- The default value is set to 1440.
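The interaction between ErrorLimit and ErrorResetInterval can be pictured with a small model. This is only a sketch of the behavior described above, not the product's implementation: the class name ErrorTracker is hypothetical, and it assumes that the counting window starts at the first counted error, which the documentation does not state.

```python
class ErrorTracker:
    """Toy model of ErrorLimit / ErrorResetInterval (not the product's code)."""

    def __init__(self, error_limit: int = 10, reset_interval_minutes: int = 1440):
        self.error_limit = error_limit
        self.reset_seconds = reset_interval_minutes * 60
        self.window_start = None  # time of the first error in the current window
        self.count = 0

    def record_error(self, now: float) -> bool:
        """Record one error at time `now` (in seconds).

        Returns True if the pipeline should shut down.
        """
        # Assumption: the window starts at the first counted error and the
        # count is reset once ErrorResetInterval minutes have elapsed.
        if self.window_start is None or now - self.window_start >= self.reset_seconds:
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.error_limit == 0:
            return False  # an ErrorLimit of 0 means the pipeline never shuts down
        return self.count > self.error_limit
```

With the default values (ErrorLimit of 10, ErrorResetInterval of 1440), this model signals a shutdown on the eleventh error counted within 24 hours.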
- LogOnAllUMFExceptions
- Indicates whether UMF exceptions are logged to the pipeline log file, *.msg
where * is the name of the pipeline where the exception occurred. Valid
values are Y or N:
- If this parameter is set to Y, all incoming data that generates UMF exceptions
is placed in a *.msg log file, and the exception is logged in the UMF_EXCEPT
table.
- If this parameter is set to N, the system does not log UMF exceptions
to a *.msg log file. However, the exceptions are still logged to the UMF_EXCEPT
table.
- This setting interacts with the StopOnAllUMFExceptions setting.
- The default value is set to Y.
- StopOnAllUMFExceptions
- Indicates whether or not the system stops processing incoming data and
shuts down the pipeline when it encounters a UMF exception. Valid values are
Y or N:
- If this parameter is set to Y, the system automatically stops processing
all incoming data when the first UMF error is encountered and shuts down the
pipeline. This setting is typically only used during initial implementations
to gather additional information about incoming UMF exceptions.
- If this parameter is set to N, but the LogOnAllUMFExceptions parameter
is set to Y, the system logs the UMF exception to the UMF log file and continues
processing the incoming data. The data involved in the UMF exception is not
processed, which means that you must review the UMF exception log to find
the problem with the UMF, correct the data, and then reload the entire UMF
record into a pipeline for processing.
- If this parameter is set to N, and the LogOnAllUMFExceptions parameter
is set to N, the pipeline completes a partial data load, loading only the
data that is not included in the UMF exception. The data included in the UMF
exception is not processed, which means that you must review the UMF exception
log to find the problem with the UMF, correct the data, and then reload the
corrected data into a pipeline for processing.
- The default value is set to N, and the default value for the LogOnAllUMFExceptions
parameter is set to Y. This means that by default, the system logs the UMF
exception to the UMF log file and does not process the incoming data record
with the UMF error.
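The combinations of LogOnAllUMFExceptions and StopOnAllUMFExceptions described above can be summarized as a small decision function. This is an illustrative sketch only: the names UmfExceptionOutcome and umf_exception_outcome are hypothetical and are not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UmfExceptionOutcome:
    written_to_msg_file: bool          # record placed in the *.msg log file
    written_to_umf_except_table: bool  # exception logged in UMF_EXCEPT
    pipeline_shuts_down: bool


def umf_exception_outcome(log_on_all: str = "Y", stop_on_all: str = "N") -> UmfExceptionOutcome:
    """Return what happens when a UMF exception occurs, per the descriptions above."""
    for name, value in (
        ("LogOnAllUMFExceptions", log_on_all),
        ("StopOnAllUMFExceptions", stop_on_all),
    ):
        if value not in ("Y", "N"):
            raise ValueError(f"{name} must be Y or N, got {value!r}")
    return UmfExceptionOutcome(
        written_to_msg_file=(log_on_all == "Y"),
        written_to_umf_except_table=True,  # logged to the table for both Y and N
        pipeline_shuts_down=(stop_on_all == "Y"),
    )
```

Calling umf_exception_outcome() with no arguments models the default settings: the exception is logged and the pipeline keeps running. In every combination, the record that caused the exception is not processed and must be corrected and reloaded.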
[SQL] section parameters
Defines the configuration
for the database connection between the pipelines and the entity database.
- Connection
- Specifies the Uniform Resource Identifier (URI) for pipelines to connect
to the entity database. Each database type uses its own syntax,
but the base syntax is as follows:
- type://user:password@database/?timeout=N
- To specify a DB2® database connection
- db2://user:password@database/?timeout=N/?schema=schemaname
- where
- db2:// indicates the database type
- user:password@database specifies
the login (user name and password) to access the specified database
- /?timeout=N is the amount of
time (in seconds) that the pipeline waits for a response from the database
before timing out
- and /?schema=schemaname is the
name of a custom DB2 schema. (This setting is optional and typically used
only when you want to specify a non-standard or custom DB2 database schema.)
Note: The DB2 custom schema feature
is not compatible with the reports generator in the Visualizer and the Configuration
Console. If you specify a custom DB2 schema, the Visualizer and Configuration
Console reports will not work.
- To specify an Oracle database connection
- oci://user:password@SID
- where user:password specifies the
login (user name and password) to access the database
- SID matches the SID parameter set for this Oracle database
- and /?timeout=N is the amount
of time (in seconds) that the pipeline waits for a response from the database
before timing out
- To specify a Microsoft® SQL server database connection
- mssql://user:password@DSN
- where user:password specifies the
login (user name and password) to access the database
- DSN matches the DSN parameter set for this Microsoft
SQL server database
- and /?timeout=N is the amount
of time (in seconds) that the pipeline waits for a response from the database
before timing out
- The default value is empty.
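As an illustration of the syntax above, the following sketch assembles Connection values for the DB2 and Oracle forms. The helper build_connection is hypothetical. It does not escape special characters in the user name or password, and the behavior when a schema is given without a timeout is an assumption, because the documentation only shows the schema following the timeout.

```python
from typing import Optional


def build_connection(
    db_type: str,
    user: str,
    password: str,
    database: str,
    timeout: Optional[int] = None,
    schema: Optional[str] = None,
) -> str:
    """Assemble a Connection value of the form type://user:password@database."""
    uri = f"{db_type}://{user}:{password}@{database}"
    if timeout is not None:
        uri += f"/?timeout={timeout}"  # seconds to wait for a database response
    if schema is not None:
        if db_type != "db2":
            # The schema option is described only for DB2 connections.
            raise ValueError("schema is described only for DB2 connections")
        uri += f"/?schema={schema}"
    return uri


# DB2 with a 30-second timeout and a custom schema:
#   build_connection("db2", "user", "password", "database", timeout=30, schema="schemaname")
# Oracle, where the database part is the SID:
#   build_connection("oci", "user", "password", "SID")
```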
- LogTable
- Specifies the table to use when the system logs UMF messages. Use this
parameter if you have multiple pipelines sending data to the same entity database;
each pipeline needs to write log information to separate tables.
- If you specify a table other than the default value of UMF_LOG, you must
create the new table in the database, and that table must contain the same
fields as the UMF_LOG table.
- DeadLockRetries
- Specifies the number of retries the system attempts during the processing
of an incoming UMF message, after the pipeline times out or exceeds the deadlock
conditions. If this number is exceeded, the pipeline shuts down.
- The default value is set to 3, but it is usually commented out using the
number sign.
- DebugLevel
- Controls the level of detail of the messages sent to the SQL debug log, *.SqlDebug.log,
where * is the name of the pipeline node set to debug mode. Valid values
include:
- If this parameter is set to 0, no log is created. Use only for debugging.
- If this parameter is set to 1, the system logs performance statistics.
- If this parameter is set to 2, the system logs all SQL messages.
- If this parameter is set to 3, the system logs all performance statistics
and logs all SQL messages.
- The default value is set to 0, which means that by default, no messages
are sent to the SQL debug log.
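The four DebugLevel values behave like a combination of two switches, one for performance statistics (1) and one for SQL messages (2). The sketch below models them that way. Reading the values as bit flags is an observation about the listed values, not something the product documentation states, and the names SqlDebug and describe_debug_level are hypothetical.

```python
from enum import IntFlag


class SqlDebug(IntFlag):
    NONE = 0
    PERFORMANCE = 1   # DebugLevel=1
    SQL_MESSAGES = 2  # DebugLevel=2


def describe_debug_level(level: int) -> list:
    """Return what is written to the SQL debug log for a DebugLevel value."""
    if level not in (0, 1, 2, 3):
        raise ValueError(f"DebugLevel must be 0, 1, 2, or 3, got {level!r}")
    flags = SqlDebug(level)
    logged = []
    if flags & SqlDebug.PERFORMANCE:
        logged.append("performance statistics")
    if flags & SqlDebug.SQL_MESSAGES:
        logged.append("SQL messages")
    return logged  # an empty list means that no log is created
```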
[OAC] section parameters
Defines the configuration
parameters for address correction that is integrated into the pipeline processing.
Note: Each
feature of the software is provided by independent software vendors and is
licensed separately by these vendors.
- AddrConnection
- Specifies the URI for the address correction software. The value must
use a specific syntax:
- prodtype://host:portnumbers
- prodtype
- If you use IBM® WebSphere® QualityStage, use waves.
- If you use Group 1 Software Universal Coder, set this to g1unc.
- If you use Group 1 Software CODE-1 Plus, set this to g1cs.
Note: If you choose to switch from a Group 1 Software product to IBM WebSphere
QualityStage, contact your Professional Services contact for assistance in
making the transition, because differences between the products produce
different address hashes.
- host
- Specifies the name of the host machine that runs the address correction
software, or the IP address of the host server for the address correction
software.
- portnumbers
- Specifies the port numbers to use for the address correction software.
You can use the default port number(s) for the address correction software
your system uses, or if your system is configured to use other port numbers,
you specify those port numbers here.
- This list contains the default port numbers by address correction software:
- For IBM WebSphere QualityStage, the default port number is 6010.
- For Group 1 Software Universal Coder, the default port number is 8080.
- For Group 1 Software CODE-1 Plus, the default port numbers are:
- For United States CODE-1 Plus: use us_port=3008.
- For Canadian CODE-1 Plus: use can_port=3014.
- For International CODE-1 Plus: use int_port=3006.
- The default value for these parameters is empty.
- GeoConnection
- Use this parameter only if you are using Group 1 Software Geographic CODE-1
Plus. The parameter specifies the host name (or IP address) and port number
for the Group 1 Software Geographic Coding Plus product. This setting uses
a specific syntax:
- prodtype://host:portnumber
- prodtype
-
- host
- The name of the host machine that runs the address correction software,
or the IP address of the host server for the address correction software.
- portnumber
- Specify the port number for this connection or use the default value
of 3010.
- OverrideState
- Use this parameter only if you are using Group 1 Software, and you want the
system to replace the incoming United States state value with the associated
two-digit state code. Valid values are Y or N:
- If this parameter is set to Y, the incoming United States state value
is replaced with its associated two-digit state code.
- If this parameter is set to N, the incoming United States state value
is not replaced and is left as is.
- The default value is set to Y.
[MM] section parameters
Defines configuration parameters
for entity resolution.
- DOBConfThreshold
- Specifies the threshold for the date of birth (DOB) confirmation or denial.
The DOB scoring is a point scale from zero to 100, based on the date of birth
resolution algorithms. This parameter sets the point level where differences
in dates of birth become denials.
- The higher the threshold number, the less the difference can be between
the dates of birth to score high during the confirmation and denial stage
of entity resolution.
- The default value is set to 90.
- CircaDOBAttribute
- Specifies the ATTR_TYPE_ID value in the ATTR_TYPE table that indicates
a circa date of birth.
- The default value is set to 4. However, because circa dates of birth are
used infrequently, this default setting is usually commented out using the
number sign.
- CircaRangeThreshold
- Specifies the number of units that a date of birth (DOB) can differ from
a circa DOB and still be considered by the system as matching values. This
threshold is used with the CircaRangeType parameter.
- The default value is set to 1. The default CircaRangeType parameter is
set to Y. Together, these parameters indicate that the default number of units
that a date of birth can differ from a circa date of birth is 1 year.
- However, because circa dates of birth are used infrequently, this default
setting is usually commented out using the number sign.
- CircaRangeType
- Indicates the type of threshold unit for the circa date of birth (DOB).
This parameter is used with the CircaRangeThreshold parameter.
- Valid values are M or Y:
- If this parameter is set to M, the circa DOB threshold is in months.
- If this parameter is set to Y, the circa DOB threshold is in years.
- The default value is set to Y, which means that the system uses years
as the circa date of birth threshold. The default CircaRangeThreshold value
is set to 1. Together, these parameters indicate that the default number of
units that a date of birth can differ from a circa date of birth is 1 year.
- However, because circa dates of birth are used infrequently, this default
setting is usually commented out using the number sign.
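The way CircaRangeThreshold and CircaRangeType combine can be sketched as follows. This is only an illustration: the function within_circa_range is hypothetical, and it assumes that the difference between the two dates is measured in whole calendar months, ignoring the day of the month. The product's actual comparison rule is not documented here.

```python
from datetime import date


def within_circa_range(
    dob: date,
    circa_dob: date,
    threshold: int = 1,     # CircaRangeThreshold
    range_type: str = "Y",  # CircaRangeType: M (months) or Y (years)
) -> bool:
    """Return True if dob is close enough to circa_dob to count as a match."""
    if range_type not in ("M", "Y"):
        raise ValueError(f"CircaRangeType must be M or Y, got {range_type!r}")
    allowed_months = threshold * 12 if range_type == "Y" else threshold
    # Assumption: compare calendar months only, ignoring the day of the month.
    diff_months = abs((dob.year - circa_dob.year) * 12 + (dob.month - circa_dob.month))
    return diff_months <= allowed_months
```

Under the default values (a threshold of 1 with a unit of years), this sketch treats dates of birth up to 12 calendar months apart as matching.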
- DateRangeThreshold
- Specifies the number of units for the From and Through date thresholds.
This parameter is used with the DateRangeType parameter.
- If this parameter is set to -1, the system disregards all From and Through
date processing.
- If this parameter is set to 0, the system uses From and Through dates as given.
- If this parameter is set to a positive number (1 or greater), the number represents
the maximum gap size for non-overlapping date ranges.
- The default value is set to 0. So by default, the system processes From
and Through dates as specified in the incoming UMF message. This setting is
usually commented out using the number sign.
- DateRangeType
- Specifies the unit for the date range threshold. This parameter is used
with DateRangeThreshold.
- Valid values are D, M, or Y:
- If this parameter is set to D, the date range threshold is in days.
- If this parameter is set to M, the date range threshold is in months.
- If this parameter is set to Y, the date range threshold is in years.
- The default entry is set to M. So by default, the system processes incoming
From and Through date ranges in months. This setting is usually commented
out using the number sign.
- LogDenials
- Specifies whether to log denial information from entity resolution. This
setting is either commented out or must be manually entered.
- Valid values are Y or N:
- If this parameter is set to Y, the system logs denials.
- If this parameter is set to N, the system does not log denials. If this
setting is present in the configuration file, the default value is N.
[DSE] section parameters
Defines configuration parameters
for the dynamic scoring engine functionality. This functionality may not be
used by all pipelines.
- Enabled
- Indicates whether the dynamic scoring engine functionality is enabled.
Valid values are Y or N:
- If this parameter is set to Y, the dynamic scoring engine functionality
is enabled in the pipeline.
- If this parameter is set to N, the dynamic scoring engine functionality
is disabled in the pipeline.
- The default value is set to N.
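Because the dynamic scoring engine cannot use parallel pipeline processing, it can be useful to check a configuration file for the combination of Enabled=Y in the [DSE] section and a Concurrency value other than 1 in the [pipeline] section. The following sketch does this with the Python standard library. The key=value layout and the sample contents are assumptions made for illustration, and check_dse_concurrency is a hypothetical helper, not a product tool.

```python
import configparser

# Hypothetical sample; the key=value layout is assumed for illustration.
SAMPLE = """\
[pipeline]
# Lines that start with the number sign are treated as comments.
Concurrency=4

[DSE]
Enabled=Y
"""


def check_dse_concurrency(ini_text: str) -> list:
    """Return a list of problems found in the given configuration text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(ini_text)
    # Fall back to the documented defaults when a parameter is absent.
    concurrency = parser.getint("pipeline", "Concurrency", fallback=1)
    dse_enabled = parser.get("DSE", "Enabled", fallback="N")
    problems = []
    if dse_enabled == "Y" and concurrency != 1:
        problems.append(
            f"[DSE] Enabled=Y requires [pipeline] Concurrency=1, found {concurrency}"
        )
    return problems
```

For the sample text above, the function reports one problem. The check covers only the configuration file: a concurrency number given in the transport parameter of the pipeline start command overrides the file and is not visible to this check.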
IBM Degrees of Separation configuration parameters
After you install IBM Degrees of Separation for Relationship Resolution and
its associated information, you can configure additional parameters for that
component. If you have already installed the component and its information,
use the link in the Information Center navigation tree to review its
configuration parameters.
IBM Entity Analytic Solutions Name Manager configuration parameters
After you install IBM Entity Analytic Solutions Name Manager and its
associated information, you can configure additional parameters for that
component. If you have already installed the component and its information,
use the link in the Information Center navigation tree to review its
configuration parameters.