Stop word list

When the system populates indexes with content, it can skip words that are in a stop word list for the local language. Words added to this list are considered "noise" words: words that users do not typically use as key words in a CBR query. For example, users would not use articles (such as "a" or "the") as key words. Adding "the" to the stop word list keeps it out of the index, so a CBR search cannot match on it and return every document that contains "the". Adding noise words to the stop word list also reduces the number of words indexed and enables queries to complete faster.
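The effect of a stop word list on indexing can be illustrated with a minimal sketch. This is not the search engine's actual indexing code; the function name build_index, the sample documents, and the whitespace tokenization are all assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(documents, stop_words):
    """Map each indexed word to the set of document ids that contain it.

    Hypothetical helper: words found in stop_words are skipped entirely,
    so they never appear in the resulting index.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Simplified tokenization (an assumption): lowercase, split on whitespace.
        for word in text.lower().split():
            if word in stop_words:
                continue  # noise word: not indexed
            index[word].add(doc_id)
    return dict(index)


# Illustrative sample data, not taken from any real collection.
documents = {
    "doc1": "the quarterly report",
    "doc2": "the annual report",
}
index = build_index(documents, stop_words={"a", "the"})
```

In this sketch, "report" maps to both documents, while "the" has no entry at all. A query on "the" therefore finds nothing, and the index holds fewer words than it would without the stop word list.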

The style.stp file contains the stop word list that controls which words are indexed. There is one stop word list for each supported language. When you create a set of style files, you can include the stop word list for the language used in those style files; the words in that list are then excluded from the indexes created with those style files. Using a stop word list improves performance by 30%; however, queries on the stop words do not return results, because those words are not full-text indexed.

To create a style.stp file, locate the stop word list in the Verity release files. This file is typically named vdk30.stp. For languages other than English, it resides in the "Locales" package; the English stop word list is in the main content search engine installation package.

You can add noise words to the style.stp file or remove them from it. The style.stp file is a flat ASCII file that you can edit with a text editor. Each line must contain exactly one word, and every word must be left-justified (no leading spaces). Words can appear in any order in the file. If your indexes are case-sensitive, you must add all case variations of a word to the noise word list. For example, to filter out the word "the", you must include separate entries for both "the" and "The".
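The format rules above (one word per line, left-justified, ASCII, with case variations for case-sensitive indexes) can be sketched as a small script. This is only an illustration, not a Verity tool: the function names are hypothetical, and the choice of three case forms (lowercase, capitalized, uppercase) is an assumption about which variations matter; other mixed-case forms would need to be added by hand.

```python
def case_variants(word):
    """Return lowercase, capitalized, and uppercase forms, without duplicates."""
    variants = []
    for form in (word.lower(), word.capitalize(), word.upper()):
        if form not in variants:
            variants.append(form)
    return variants


def stop_list_lines(words, case_sensitive=True):
    """Build the lines of a stop word file: one left-justified word per line."""
    lines = []
    for word in words:
        word = word.strip()  # left-justify: no leading or trailing spaces
        if not word:
            continue  # ignore empty entries
        if any(ch.isspace() for ch in word):
            raise ValueError(f"only one word per line is allowed: {word!r}")
        forms = case_variants(word) if case_sensitive else [word]
        for form in forms:
            if form not in lines:
                lines.append(form)
    return lines


def write_stop_list(path, words, case_sensitive=True):
    """Write the stop word list as a flat ASCII file.

    Raises UnicodeEncodeError if a word contains non-ASCII characters.
    """
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for line in stop_list_lines(words, case_sensitive):
            handle.write(line + "\n")
```

For example, write_stop_list("style.stp", ["the", "a"]) would write the entries "the", "The", "THE", "a", and "A", each on its own line. Passing case_sensitive=False writes each word once, as given.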

NOTE: See the Verity Collection Documentation for more details on stop word lists.