Stop word input file: Format reference

The command-line stop word tool reads the stop word input file that you specify and uses it to set the stop words for an IBM® Content Search Services server. The input file name must use the format code-Stw.xml, where code is the appropriate language code. For example, the French stop word file is named fr-Stw.xml.

You can find input file templates for each supported language in the <server-home>\resource\uima directory. The template file for the following languages includes terms that occur frequently in the language:

For example, the English template file includes words such as a and the.

A stop word file has the following XML format:

<?xml version="1.0" encoding="UTF-8"?>
<stopWords>
    <stopWord>phrase…1</stopWord>
    <stopWord>phrase…2</stopWord>
    <stopWord>phrase…n</stopWord>
</stopWords>

Example:

<?xml version="1.0" encoding="UTF-8"?>
<stopWords>
    <stopWord>the</stopWord>
    <stopWord>WebSphere Application Server</stopWord>
    <stopWord>OmniFind Enterprise Edition</stopWord>
</stopWords>
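
If you maintain stop word lists outside of the XML file, a small script can generate the input file in the format shown above. The following Python sketch is illustrative only and is not part of the product: the function names build_stop_word_xml and write_stop_word_file are hypothetical, and the script assumes Python 3.9 or later. It uses only the element names and the code-Stw.xml naming format described in this topic. The generated XML declaration uses single quotation marks, which is equivalent XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_stop_word_xml(phrases):
    """Return a UTF-8 encoded stop word document for the given phrases.

    Hypothetical helper: it produces a <stopWords> root element with one
    <stopWord> child element per phrase.
    """
    root = ET.Element("stopWords")
    for phrase in phrases:
        ET.SubElement(root, "stopWord").text = phrase
    ET.indent(root, space="    ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def write_stop_word_file(directory, language_code, phrases):
    """Write <language_code>-Stw.xml into the directory and return its path.

    Hypothetical helper: the language code is not validated here.
    """
    path = Path(directory) / f"{language_code}-Stw.xml"
    path.write_bytes(build_stop_word_xml(phrases))
    return path
```

For example, write_stop_word_file(".", "fr", ["le", "la"]) creates fr-Stw.xml in the current directory. The script does not check the language code or the stop word restrictions, so review the generated file before you pass it to the stop word tool.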

Stop words have the following restrictions or features:

Related reference
For information about the valid language codes, see the Indexing Language property in Object store properties (CBR IBM Search tab).