Languages and Character Sets


The following table lists the supported languages and their character sets. You specify a language with the -locale option and a character set with the -charmap option.

Language (locale)         Character set   Java character   Common character
                          (-charmap)      set name         set name
------------------------  --------------  ---------------  ----------------
All languages             utf8            UTF8             UTF-8

bokmalx, danishx,         437             Cp437            MSDOS 437
dutchx, englishx,         850             Cp850            IBM 850
finnishx, frenchx,        1252            Cp1252           Windows 1252
germanx, italianx,        8859            ISO8859_1        ISO-8859-1
nynorskx, portugx,
spanishx, swedishx

simpcb                    gb              GBK              gbk

tradcb                    big5            Big5             Big5

japanb                    sjis            SJIS             Shift-JIS
                          eucjp           EUC_JP           EUC-JP
                          iso2022_jp      ISO2022JP        ISO-2022-JP

koreab                    ksc             EUC_KR           KSC 5601

russian                   1251            Cp1251           Windows 1251
                          8859-5          ISO8859_5        ISO-8859-5
                          koi8-r          KOI8_R           KOI8-R

russian2                  1251            Cp1251           Windows 1251
                          8859-5          ISO8859_5        ISO-8859-5
                          koi8-r          KOI8_R           KOI8-R

greek                     1253            Cp1253           Windows 1253
                          8859-7          ISO8859_7        ISO-8859-7

turkish                   1254            Cp1254           Windows 1254
                          8859-3          ISO8859_3        ISO-8859-3
                          8859-9          ISO8859_9        ISO-8859-9

arabic                    1256            Cp1256           Windows 1256
                          8859-6          ISO8859_6        ISO-8859-6

hebrew                    1255            Cp1255           Windows 1255
                          8859-8          ISO8859_8        ISO-8859-8

polish, hungarian, czech  1250            Cp1250           Windows 1250
                          8859-2          ISO8859_2        ISO-8859-2

thai                      874             Cp874            Windows 874

bulgaria                  1251            Cp1251           Windows 1251
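
The "Java character set name" column lists names that Java's character set support recognizes. As a minimal sketch, assuming a current JDK in which java.nio.charset.Charset accepts these legacy names as aliases, the following maps each -charmap value from the table to its Java name and resolves it to a Charset. The class CharmapTable and the method resolve are hypothetical illustrations, not part of any Verity API.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical helper, not a Verity API.
final class CharmapTable {
    // -charmap value -> Java character set name, transcribed from the table.
    // Values shared by several locales (for example 1251) appear only once.
    static final Map<String, String> JAVA_NAMES = Map.ofEntries(
            Map.entry("utf8", "UTF8"),
            Map.entry("437", "Cp437"),
            Map.entry("850", "Cp850"),
            Map.entry("1252", "Cp1252"),
            Map.entry("8859", "ISO8859_1"),
            Map.entry("gb", "GBK"),
            Map.entry("big5", "Big5"),
            Map.entry("sjis", "SJIS"),
            Map.entry("eucjp", "EUC_JP"),
            Map.entry("iso2022_jp", "ISO2022JP"),
            Map.entry("ksc", "EUC_KR"),
            Map.entry("1251", "Cp1251"),
            Map.entry("8859-5", "ISO8859_5"),
            Map.entry("koi8-r", "KOI8_R"),
            Map.entry("1253", "Cp1253"),
            Map.entry("8859-7", "ISO8859_7"),
            Map.entry("1254", "Cp1254"),
            Map.entry("8859-3", "ISO8859_3"),
            Map.entry("8859-9", "ISO8859_9"),
            Map.entry("1256", "Cp1256"),
            Map.entry("8859-6", "ISO8859_6"),
            Map.entry("1255", "Cp1255"),
            Map.entry("8859-8", "ISO8859_8"),
            Map.entry("1250", "Cp1250"),
            Map.entry("8859-2", "ISO8859_2"),
            Map.entry("874", "Cp874"));

    // Resolves a -charmap value to a java.nio Charset.
    static Charset resolve(String charmap) {
        String javaName = JAVA_NAMES.get(charmap);
        if (javaName == null) {
            throw new IllegalArgumentException("unknown -charmap value: " + charmap);
        }
        // Charset.forName looks the name up among canonical names and aliases;
        // it throws UnsupportedCharsetException if this runtime lacks the charset.
        return Charset.forName(javaName);
    }

    public static void main(String[] args) {
        for (String charmap : new String[] {"1252", "8859-8", "utf8"}) {
            System.out.println(charmap + " -> " + resolve(charmap).name());
        }
    }
}
```

Running main prints each value with its canonical java.nio name, for example 1252 -> windows-1252. Extended character sets such as SJIS or EUC_KR may be missing from minimal Java runtimes, in which case resolve throws UnsupportedCharsetException.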

Specifying Languages and Character Sets

To specify a language, pass one of the terms in the Language (locale) column to the command-line tool's locale option, which is usually -locale. Each language has a default character set that the tool uses automatically; to use a different one, also specify a character set with the -charmap option. For example, the default for frenchx and spanishx is 1252.
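
The rule above reduces to: always pass the locale, and pass -charmap only when you want a character set other than the default. A minimal sketch in Java, assuming the tool uses -locale (usual, but not guaranteed for every tool); SpiderArgs and languageArgs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SpiderArgs {
    // Hypothetical helper: builds the language-related arguments for a
    // Verity command-line tool that accepts -locale and -charmap.
    // Pass null for charmap to omit -charmap and accept the locale's default.
    static List<String> languageArgs(String locale, String charmap) {
        List<String> args = new ArrayList<>();
        args.add("-locale");
        args.add(locale);
        if (charmap != null) {
            args.add("-charmap");
            args.add(charmap);
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", languageArgs("frenchx", null)));    // -locale frenchx
        System.out.println(String.join(" ", languageArgs("hebrew", "8859-8"))); // -locale hebrew -charmap 8859-8
    }
}
```

To launch a tool, prepend its name (for example vspider) and add its remaining options, then hand the full list to java.lang.ProcessBuilder.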

NOTE: The Common character set name column is for reference only; do not pass the common name to -charmap. For example, to select simplified Chinese, specify -charmap gb, not its common name, gbk.
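
Because the tools expect the -charmap value rather than the common name, a script that starts from a common name has to translate it first. A minimal sketch covering only a few rows of the table; CommonNameLookup and charmapFor are hypothetical names.

```java
import java.util.Map;

final class CommonNameLookup {
    // Common character set name -> value to pass to -charmap.
    // Only a subset of the table is transcribed here.
    static final Map<String, String> CHARMAP_FOR_COMMON_NAME = Map.of(
            "gbk", "gb",
            "Big5", "big5",
            "Shift-JIS", "sjis",
            "KSC 5601", "ksc",
            "Windows 1252", "1252",
            "ISO-8859-8", "8859-8",
            "UTF-8", "utf8");

    // Returns the -charmap value for a common name, or throws if it is not listed.
    static String charmapFor(String commonName) {
        String value = CHARMAP_FOR_COMMON_NAME.get(commonName);
        if (value == null) {
            throw new IllegalArgumentException("no -charmap value for: " + commonName);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println("-charmap " + charmapFor("gbk")); // prints "-charmap gb"
    }
}
```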

K2 Spider

The following example with k2spider_srv specifies a locale of frenchx without specifying a character set, thus accepting the default of 1252.

c:\>k2spider_srv -indexer -locale frenchx options

Verity Spider

The following example with vspider specifies a locale of hebrew with the character set 8859-8.

c:\>vspider -locale hebrew -charmap 8859-8 options

mkvdk

The following example with mkvdk specifies a locale of spanishx without specifying a character set, thus accepting the default of 1252.

c:\>mkvdk -locale spanishx options





Copyright © 2002, Verity, Inc. All rights reserved.