For the system to store and process large attribute data for scoring, the metadata must be converted to Universal Message Format (UMF) and stored in the appropriate columns.
Use the ATTR_VALUE and ATTR_LARGE_DATA columns to store large or unstructured attribute data for custom attribute and scoring applications.
Column and UMF tag name | Data type and size | Required | Explanation |
ATTR_VALUE | varchar(255) by default, resizable up to 8k | Yes | Data used as one of the attributes in an ETL process with the base scoring plugins. If the data is larger than 8k and in binary format, store the data in the ATTR_LARGE_DATA column and create a unique identifier for that data in the ATTR_VALUE column. That ATTR_VALUE identifier is used for comparison and scoring. For example, create an MD5 (Message-Digest algorithm 5) one-way hash that can be compared and displayed in the visualizer and reports. The maximum column size is database dependent. To store any binary data bigger than 255/3 in ATTR_VALUE, the column must be resized. If you resize the column, consider re-tuning the database cache, because far fewer rows are likely to fit in the cache. |
ATTR_LARGE_DATA | Character large object (CLOB); use for data larger than 8k | No | Store as character data, for example by Base64-encoding binary data. Use this column to store attribute data that is too large for the ATTR_VALUE column. ATTR_LARGE_DATA is a CLOB (character large object) column, which can handle data of unlimited size. This data is available to entity resolution. The structure of the data must be known to the author of the customized comparison plugin. The visualizer does not display this data because the format is non-standard and differs between types of systems. A CLOB does not perform as well as a varchar column because a CLOB cannot be cached and requires a disk read, which is why ATTR_VALUE is preferable. If increasing the size of ATTR_VALUE would cause very little attribute data to be cached, it may be better to use ATTR_LARGE_DATA even for data smaller than 8k, so that other, smaller attributes such as gender and DOB remain well cached. This decision is left to the architect's discretion. Consider consulting your database administrator. |
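The split between the two columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_large_attribute` is hypothetical, Python is not part of the product, and MD5 is used only because the table names it as an example identifier. The sketch derives a short identifier for ATTR_VALUE and a Base64 character representation for ATTR_LARGE_DATA from the same binary payload.

```python
import base64
import hashlib
from typing import Tuple


def encode_large_attribute(raw: bytes) -> Tuple[str, str]:
    """Return (attr_value, attr_large_data) for one binary attribute.

    Hypothetical helper for illustration; not part of any product API.
    """
    # ATTR_VALUE: a fixed-length identifier that can be compared and scored.
    # An MD5 hex digest is always 32 characters, so it fits in varchar(255).
    attr_value = hashlib.md5(raw).hexdigest()

    # ATTR_LARGE_DATA: the payload itself, converted to character data
    # so that it can be stored in a CLOB column.
    attr_large_data = base64.b64encode(raw).decode("ascii")

    return attr_value, attr_large_data


if __name__ == "__main__":
    value, large = encode_large_attribute(b"example binary payload")
    print(len(value))   # length of the identifier stored in ATTR_VALUE
    print(len(large))   # length of the character data stored in ATTR_LARGE_DATA
```

Because identical payloads always produce the same digest, comparing the ATTR_VALUE identifiers is enough to detect an exact match without reading the CLOB.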
The following UMF fragment shows both columns in use:

<ATTRIBUTE>
  <ATTR_TYPE>BIOMETRIC-1</ATTR_TYPE>
  <ATTR_VALUE>214b21fc3e040f844a07710b1bb451a0</ATTR_VALUE>
  <ATTR_LARGE_DATA><![CDATA[H4sICBRTqkgAA2Zvby50eHQAK0ktLuHlAgDkTqoPBgAAAA==]]></ATTR_LARGE_DATA>
</ATTRIBUTE>

Actual ATTR_LARGE_DATA values are likely to be much larger than this example.
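As a rough sketch of how such a fragment might be produced and read back, the following Python uses only the standard library. The function names `build_attribute` and `read_large_data` are hypothetical, and the sketch assumes the identifier is an MD5 hex digest of the raw payload, which is only one of the options described above. `xml.etree.ElementTree` writes plain text rather than a CDATA section; for Base64 content the two are equivalent, because the Base64 alphabet contains no characters that need XML escaping.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def build_attribute(attr_type: str, raw: bytes) -> str:
    """Build an <ATTRIBUTE> fragment for one binary payload (illustrative only)."""
    attribute = ET.Element("ATTRIBUTE")
    ET.SubElement(attribute, "ATTR_TYPE").text = attr_type
    # Assumption: the identifier is the MD5 hex digest of the raw bytes.
    ET.SubElement(attribute, "ATTR_VALUE").text = hashlib.md5(raw).hexdigest()
    # The payload is stored as Base64 character data.
    ET.SubElement(attribute, "ATTR_LARGE_DATA").text = (
        base64.b64encode(raw).decode("ascii")
    )
    return ET.tostring(attribute, encoding="unicode")


def read_large_data(fragment: str) -> bytes:
    """Recover the binary payload from an <ATTRIBUTE> fragment."""
    attribute = ET.fromstring(fragment)
    encoded = attribute.findtext("ATTR_LARGE_DATA", default="")
    return base64.b64decode(encoded)


if __name__ == "__main__":
    fragment = build_attribute("BIOMETRIC-1", b"example binary payload")
    print(fragment)
    # The round trip returns the original bytes.
    print(read_large_data(fragment) == b"example binary payload")
```

A customized comparison plugin would still need to know the structure of the decoded bytes; the decoding step only reverses the character encoding.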