PDF indexing is supported through a dynamically loadable PDF filter (flt_pdf.so or flt_pdf.sl on UNIX, flt_pdf.dll on Windows). By default, the Verity engine invokes the universal filter with the PDF filter as a helper filter.

/usr/verity, the PDF collection can be created as follows:
REPORT.PDF into the collection, use the following command:
style.lex file. There is no way to specify alternative lexing rules.
flt_pdf can be invoked in two ways: together with the universal filter (the default) or as a single filter. To invoke the PDF filter with the universal filter, specify it in the style.uni file with the type keyword and the /format-filter modifier, as shown in the sample style.uni syntax below:
- type: "application/pdf"
- /format-filter = "flt_pdf"
- /charset = 1252 #1252 is the default
To invoke the PDF filter as a single filter, specify it in the style.dft file using the field keyword and the /filter modifier, as shown in the sample style.dft file syntax below:
- field: DOC
- /filter = "flt_pdf -charmapto 850"
- /charmap = 850
The PDF filter specification in the style.uni file can include a field override option, -fieldoverride, which specifies that the field values generated by the PDF filter override those generated by a Verity gateway. To use the -fieldoverride option, include it as part of the /format-filter specification as follows:
- type: "application/pdf"
- /format-filter = "flt_pdf -fieldoverride"
- /charset = 1252 #1252 is the default
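As an illustration only, the two style.uni variants shown above (with and without -fieldoverride) can be produced by a small helper. This is a sketch, not part of the Verity tooling: the function name `style_uni_pdf_entry` is hypothetical, and the output layout (no leading list markers, modifiers indented under the type line) is an assumption about how the entry is written in the file.

```python
def style_uni_pdf_entry(field_override: bool = False, charset: str = "1252") -> str:
    """Build a style.uni entry that maps application/pdf to flt_pdf.

    Hypothetical helper: only the keywords, modifiers, and option name
    come from the documentation; the layout is an assumption.
    """
    # -fieldoverride goes inside the quoted /format-filter value.
    filter_spec = "flt_pdf -fieldoverride" if field_override else "flt_pdf"
    lines = [
        'type: "application/pdf"',
        f'  /format-filter = "{filter_spec}"',
        f"  /charset = {charset}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(style_uni_pdf_entry(field_override=True))
```

Keeping the option inside the quoted filter string mirrors the sample above, where -fieldoverride is part of the /format-filter value rather than a separate modifier.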
The PDF filter specification in the style.dft file can include a character mapping option, -charmapto, to control the character set output by the filter. This option is specified in the style.dft file and is used only when the PDF filter is invoked as a single filter. Valid values for the -charmapto option are:
| -charmapto Value | Description |
|---|---|
| 1252 | For code page 1252 |
| 850 | For IBM code page 850 |
| 8859 | For ISO-8859 |
| mac1 | For Macintosh systems |
If the -charmapto option is not specified, the PDF filter uses the platform's default character encoding. On UNIX and Windows systems, the default character encoding is 8859; on Macintosh systems it is mac1.
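The value rules above can be summarized in a short sketch. The helper names (`effective_charmapto`, `pdf_filter_spec`) and the `is_macintosh` flag are hypothetical; only the four valid values and the platform defaults are taken from the text.

```python
from typing import Optional

# Valid -charmapto values, as listed in the table above.
VALID_CHARMAPTO = ("1252", "850", "8859", "mac1")


def effective_charmapto(value: Optional[str], is_macintosh: bool = False) -> str:
    """Return the character set the filter would output.

    With no value given, fall back to the platform default described
    above: mac1 on Macintosh, 8859 on UNIX and Windows.
    """
    if value is None:
        return "mac1" if is_macintosh else "8859"
    if value not in VALID_CHARMAPTO:
        raise ValueError(f"unsupported -charmapto value: {value!r}")
    return value


def pdf_filter_spec(value: Optional[str] = None) -> str:
    """Build the quoted /filter value for style.dft (single-filter use)."""
    if value is None:
        return "flt_pdf"
    return f"flt_pdf -charmapto {effective_charmapto(value)}"


if __name__ == "__main__":
    print(pdf_filter_spec("850"))
    print(effective_charmapto(None))
```

Rejecting unknown values up front is a design choice of this sketch; the source does not say how the filter itself reacts to an unsupported value.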
When PDF fields are defined in the style.sfl file, they are populated in the collection's internal documents table. PDF fields can be populated by the PDF filter only if they exist in the information dictionary of the PDF document.
style.sfl file. These fields are populated unless changes are made to the style.sfl file. For the predefined fields, the Adobe PDF field names are mapped to Verity collection names as described below.
style.sfl file, but are commented out and therefore are not populated by the PDF filter. For information on defining these fields, see "Defining Optional PDF Fields" following this table.
Add the -fieldoverride option to the PDF filter specification in the style.uni file
style.sfl file
style.uni file and add the -fieldoverride option to the PDF filter specification as follows:
type: "application/pdf"
/format-filter = "flt_pdf -fieldoverride"
style.sfl file and do the following:
- #My new field to define FTS_CreationDate
- fixwidth: PdfCreatedDate 4 date
- /alias = FTS_CreationDate
- /alias = Created