Excluded word list

When the system populates indexes with content, it can skip words that are in an excluded word list for the local language. Words added to this list are considered "noise" words—words users do not typically consider as key words in a CBR query. For example, users would not use articles (such as "a" or "the") as key words in a CBR query. Adding words such as "the" to the excluded word list prevents CBR searches from finding all documents that contain "the." Adding noise words to the excluded word list also minimizes the number of words indexed and enables queries to complete faster.
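The effect of an excluded word list on indexing and querying can be illustrated with a toy inverted index. This is only a conceptual sketch and not how the Verity engine is implemented; the function `build_index`, the sample documents, and the sample word list are all hypothetical.

```python
from collections import defaultdict


def build_index(documents, excluded_words):
    """Map each indexed word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            word = token.strip('.,;:!?"()')
            if not word or word in excluded_words:
                continue  # noise word: it never enters the index
            index[word].add(doc_id)
    return dict(index)


# Hypothetical sample data, for illustration only.
documents = {
    "doc1": "the quarterly report",
    "doc2": "a report on the budget",
}
excluded_words = {"a", "the", "on"}

index = build_index(documents, excluded_words)

print(sorted(index))            # ['budget', 'quarterly', 'report']
print(index.get("the", set()))  # set(): a query on "the" finds nothing
```

The index holds three words instead of six, which is the source of the size and speed benefit, and a lookup for an excluded word comes back empty.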

The style.stp file contains the excluded word list that controls which words are indexed. There is one excluded word list for each supported language. When you create a set of style files, you can include the excluded word list for the language of those style files; the words in that list are then left out of every index created with those style files. Using an excluded word list improves performance by 30%; however, queries on the excluded words return no results, because those words are not full-text indexed.

To create a style.stp file, find the excluded word list in the Verity release files. The file is typically named vdk30.stp. Excluded word lists for languages other than English reside in the “Locales” package; the English excluded word list resides in the main installation package of the content search engine.
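Locating the shipped list and turning it into a style.stp file could be scripted roughly as follows. This is a minimal sketch that assumes creating style.stp amounts to copying the shipped vdk30.stp into the directory that holds your style files; the function names are hypothetical, and the installation root and style directory are placeholders you would supply for your own site.

```python
import shutil
from pathlib import Path


def find_stop_lists(install_root, name="vdk30.stp"):
    """Return every file called `name` found under install_root."""
    return sorted(Path(install_root).rglob(name))


def create_style_stp(source, style_dir):
    """Copy a shipped excluded word list to style_dir/style.stp."""
    target = Path(style_dir) / "style.stp"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)  # assumption: a plain copy is sufficient
    return target
```

If the search returns more than one file (for example, one per language), pick the one that matches the language of the style files before copying.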

You can add noise words to the style.stp file or remove them from it. The style.stp file is a flat ASCII file that you can edit with a text editor. Put only one word on each line, and left-justify every word. Words can appear in any order in the file. If your indexes are case-sensitive, you must add every case variation of a word to the noise word list. For example, to filter out the word "the," you must include entries for both "the" and "The."
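The format rules above can also be applied by a small script instead of a text editor. The following is a sketch under stated assumptions: the helper names are hypothetical, and "case variations" is interpreted here as the lowercase, capitalized, and uppercase forms of each word, which may be more or fewer than your indexes require.

```python
from pathlib import Path


def case_variants(word):
    """Return lowercase, capitalized, and uppercase forms without duplicates."""
    variants = []
    for form in (word.lower(), word.capitalize(), word.upper()):
        if form not in variants:
            variants.append(form)
    return variants


def add_noise_words(stp_path, words, case_sensitive=True):
    """Append words to an excluded word list, one left-justified word per line."""
    path = Path(stp_path)
    entries = []
    if path.exists():
        # Stripping each line also removes any accidental indentation.
        lines = path.read_text(encoding="ascii").splitlines()
        entries = [line.strip() for line in lines if line.strip()]
    seen = set(entries)
    for word in words:
        word = word.strip()
        if len(word.split()) != 1:
            raise ValueError(f"expected exactly one word, got {word!r}")
        word.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
        candidates = case_variants(word) if case_sensitive else [word]
        for candidate in candidates:
            if candidate not in seen:
                entries.append(candidate)
                seen.add(candidate)
    path.write_text("".join(entry + "\n" for entry in entries), encoding="ascii")
    return entries
```

Calling `add_noise_words("style.stp", ["the"])` on a new file writes the three lines `the`, `The`, and `THE`. Running it again with the same word leaves the file unchanged, because existing entries are skipped.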

NOTE: For more details on excluded word lists, see the Verity Collection Documentation.