Characters

The most basic and indivisible unit of the COBOL language is the character. The basic character set includes the letters of the Latin alphabet, digits, and special characters.

In the COBOL language, individual characters are joined to form character-strings and separators. Character-strings and separators, then, are used to form the words, literals, phrases, clauses, statements, and sentences that form the language.
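As a rough illustration of how characters combine into character-strings and separators, the following minimal sketch annotates a tiny fixed-format program. The program name CHARDEMO and the data-name TOTAL-COUNT are hypothetical, chosen only for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHARDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * TOTAL-COUNT is a user-defined word (hypothetical name).
      * 9(4) is a PICTURE character-string; 0 is a numeric literal.
      * The period that ends the entry is a separator.
       01  TOTAL-COUNT       PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
      * ADD, TO, and DISPLAY are reserved words.
      * "Count: " is an alphanumeric literal; its quotation marks
      * are separators, as are the spaces between the words.
           ADD 1 TO TOTAL-COUNT
           DISPLAY "Count: " TOTAL-COUNT
           GOBACK.
```

The two statements followed by the closing period form one sentence. Every element in it is built only from the basic characters listed in Table 1.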

The basic characters used in forming character-strings and separators in source code are shown in Table 1.

For certain language elements, the basic character set is extended with the EBCDIC Double-Byte Character Set (DBCS).

DBCS characters can be used in forming user-defined words.

The content of alphanumeric literals, comment lines, and comment entries can include any of the characters in the computer's compile-time character set, and can include both single-byte and DBCS characters.

Runtime data can include any characters from the runtime character set of the computer. The runtime character set of the computer can include alphanumeric characters, DBCS characters, and national characters. National characters are represented in UTF-16, a 16-bit encoding form of Unicode.

When the NSYMBOL(NATIONAL) compiler option is in effect, literals identified by the opening delimiter N" or N' are national literals. They can contain any single-byte or double-byte characters, or both, that are valid for the compile-time code page in effect (either the default code page or the code page specified for the CODEPAGE compiler option). Characters contained in national literals are represented as national characters at run time.
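The following sketch shows one way a national literal might be used. It assumes that NSYMBOL(NATIONAL) is in effect and that the compiler supports USAGE NATIONAL. The names NATDEMO, GREETING-NAT, and GREETING-ALPHA are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NATDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * A national item holds UTF-16 data, 2 bytes per position.
       01  GREETING-NAT      PIC N(5) USAGE NATIONAL.
      * An alphanumeric item holds 1 byte per position.
       01  GREETING-ALPHA    PIC X(5) VALUE "Hello".
       PROCEDURE DIVISION.
      * N"..." is a national literal under NSYMBOL(NATIONAL).
           MOVE N"Hello" TO GREETING-NAT
      * LENGTH OF reports bytes: 10 is expected for the national
      * item and 5 for the alphanumeric item.
           DISPLAY LENGTH OF GREETING-NAT
           DISPLAY LENGTH OF GREETING-ALPHA
           GOBACK.
```

Both items hold the same five characters. The difference in byte length reflects the 16-bit encoding of national characters at run time.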

For details, see User-defined words with DBCS characters, DBCS literals, and National literals.

Table 1. Basic COBOL character set

Character   Meaning
            Space
+           Plus sign
-           Minus sign or hyphen
*           Asterisk
/           Forward slash or solidus
=           Equal sign
$           Currency sign (see note 1)
,           Comma
;           Semicolon
.           Decimal point or period
"           Quotation mark (see note 2)
'           Apostrophe
(           Left parenthesis
)           Right parenthesis
>           Greater than
<           Less than
:           Colon
_           Underscore
A - Z       Alphabet (uppercase)
a - z       Alphabet (lowercase)
0 - 9       Numeric characters

Notes:
  1. The currency sign is the character with the value X'5B', regardless of the code page in effect. The assigned graphic character can be the dollar sign or a local currency sign.
  2. The quotation mark is the character with the value X'7F'.
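The hexadecimal values given in the notes to Table 1 can be examined with hexadecimal literals, as in the sketch below. It assumes an EBCDIC environment and that the QUOTE compiler option is in effect, so that the figurative constant QUOTE stands for the quotation mark. The names HEXDEMO, WS-CURRENCY, and WS-QUOTATION are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HEXDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hexadecimal literals set each item to one known byte value.
       01  WS-CURRENCY       PIC X VALUE X'5B'.
       01  WS-QUOTATION      PIC X VALUE X'7F'.
       PROCEDURE DIVISION.
      * The graphic shown for X'5B' depends on the code page.
           DISPLAY "Currency sign:  " WS-CURRENCY
      * QUOTE stands for the quotation mark when the QUOTE
      * compiler option (assumed here) is in effect.
           IF WS-QUOTATION = QUOTE
               DISPLAY "X'7F' is the quotation mark"
           ELSE
               DISPLAY "X'7F' is not the quotation mark here"
           END-IF
           GOBACK.
```

On a platform that does not use EBCDIC, the comparison would be expected to take the ELSE branch, because X'7F' is not the quotation mark there.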