IBM FileNet P8, Version 5.2.1            

Stop word input file: Format reference

The command-line stop word tool reads your specified stop word input file to set the stop words for an IBM® Content Search Services server. The input file must be named in the following format: code-Stw.xml, where code is the appropriate language code. For example, the French stop word file is called fr-Stw.xml.
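The naming rule can be captured in a small helper. The following Python sketch is illustrative only: the function name `stop_word_file_name` is hypothetical and is not part of the stop word tool, and the check for empty codes and path separators is an assumption rather than a documented rule.

```python
def stop_word_file_name(language_code: str) -> str:
    """Return the input file name for a language code, e.g. 'fr' -> 'fr-Stw.xml'."""
    code = language_code.strip()
    # Assumption: reject empty codes and anything that looks like a path.
    if not code or "/" in code or "\\" in code:
        raise ValueError(f"invalid language code: {language_code!r}")
    return f"{code}-Stw.xml"


print(stop_word_file_name("fr"))  # fr-Stw.xml
```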

You can find input file templates for each supported language in the server-home\config\dictionaries directory. The template file for the following languages includes terms that occur frequently in the language:

For example, the English template file includes words such as a and the.

A stop word file has the following XML format:

	<?xml version="1.0" encoding="UTF-8"?>
	<stopWords>
		<stopWord>word1</stopWord>
		<stopWord>word2</stopWord>
		<stopWord>wordn</stopWord>
	</stopWords>
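A file in this format can also be produced programmatically. The following is a minimal Python sketch that uses the standard `xml.etree.ElementTree` module; the function name `build_stop_word_xml` is hypothetical, and the tab indentation simply mirrors the layout shown above (`ET.indent` requires Python 3.9 or later).

```python
import xml.etree.ElementTree as ET


def build_stop_word_xml(words):
    """Serialize words as a <stopWords> document encoded in UTF-8."""
    root = ET.Element("stopWords")
    for word in words:
        # One <stopWord> element per term.
        ET.SubElement(root, "stopWord").text = word
    ET.indent(root, space="\t")  # pretty-print; Python 3.9+
    # Returns bytes that start with an XML declaration naming UTF-8.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


data = build_stop_word_xml(["word1", "word2"])
print(data.decode("utf-8"))
```

The returned bytes can be written to a file that follows the code-Stw.xml naming rule, for example with `pathlib.Path("fr-Stw.xml").write_bytes(data)`.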

Example:

	<?xml version="1.0" encoding="UTF-8"?>
	<stopWords>
		<stopWord>the</stopWord>
		<stopWord>WebSphere</stopWord>
		<stopWord>OmniFind</stopWord>
	</stopWords>

Stop words have the following restrictions or features:

White space and punctuation characters prohibited
A stop word cannot include white space or punctuation characters, such as a comma (,) or vertical bar (|), because these characters might interfere with the query syntax.

Automatic normalization
The removal of accents or umlauts and other normalization is handled automatically. For example, if you want to include the term météo as a stop word, you do not need to include the term meteo too.

Compound terms identified (Germanic languages)
Compound terms in Germanic languages are correctly identified in queries. A compound term is the combination of two or more words that is used as a single word. Lexicalized compounds such as Reisebüro (travel agency) are not considered to be compounds. A compound term in a query is broken up into the individual terms that make up the compound. If any of the individual terms are stop words, the compound term is not removed from the query. For example, the query term Versicherungspolice (insurance policy) returns documents that contain the compound terms Lebensversicherungspolice (life insurance policy) and Haftpflichtversicherungspolice (third-party insurance policy). Even if the word Police is a stop word, the compound query term Versicherungspolice is not removed from the query.
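The first two items above can be checked before a stop word file is written. The following Python sketch is a hedged illustration, not the tool's own logic: the function names are hypothetical, and the exact set of characters that the tool rejects is not listed here, so treating ASCII punctuation plus Unicode punctuation categories as prohibited is an assumption (it also rejects hyphens and apostrophes). The `strip_accents` function only illustrates why one entry covers both météo and meteo; the server's actual normalization might differ.

```python
import string
import unicodedata


def is_valid_stop_word(word: str) -> bool:
    """Return True if word has no white space or punctuation characters."""
    if not word:
        return False
    for ch in word:
        if ch.isspace():
            return False
        # Assumption: ASCII punctuation (includes ',' and '|') plus any
        # character in a Unicode punctuation category (P*) is prohibited.
        if ch in string.punctuation or unicodedata.category(ch).startswith("P"):
            return False
    return True


def strip_accents(word: str) -> str:
    """Illustrative accent folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(is_valid_stop_word("WebSphere"))  # True
print(is_valid_stop_word("a|b"))        # False
print(strip_accents("météo"))           # meteo
```

Because normalization is handled automatically, a check like this would only need to run on the terms as you enter them; there is no need to add accent-free variants to the file.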


Last updated: March 2016

© Copyright IBM Corporation 2016.