The XML filter supports indexing and viewing well-formed XML documents. Metadata extraction is also supported.
Requirements for Indexing XML Documents
To prepare for indexing XML documents:
Verify that the XML filter (flt_xml.dll, flt_xml.sl, flt_xml.so) resides in the bin directory for the installed platform.
Verify that style.uni contains the directive for invoking the XML filter.
style.ufl file.
style.xml file.
XML documents must have the .xml extension if the universal filter is used; the universal filter is specified in the style.dft file. XML documents without the .xml extension can be indexed into a collection that contains only XML documents if the style.dft file specifies the XML filter instead of the universal filter. For more information, see the "style.dft File" section under "Style File Configuration" below.
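The two requirements above (the filter library in bin, and the .xml extension rule) can be checked before indexing. The following is a minimal Python sketch, not part of Verity: the helper names are hypothetical, the extension-to-platform mapping (.dll on Windows, .sl on HP-UX, .so on other UNIX systems) is an assumption, and so is the case-insensitive extension comparison.

```python
import sys
from pathlib import Path

def filter_library_name(platform=sys.platform):
    """Return the flt_xml file name expected on a platform.

    Assumed mapping: .dll on Windows, .sl on HP-UX, .so on other UNIX systems.
    """
    if platform.startswith("win"):
        return "flt_xml.dll"
    if platform.startswith("hp-ux"):
        return "flt_xml.sl"
    return "flt_xml.so"

def filter_installed(bin_dir, platform=sys.platform):
    """True if the XML filter library exists in the given bin directory."""
    return (Path(bin_dir) / filter_library_name(platform)).is_file()

def files_routed_to_xml_filter(paths, dft_uses_xml_filter):
    """Return the files the XML filter would receive under the extension rule.

    With the universal filter, only names ending in .xml qualify (compared
    case-insensitively here, which is an assumption). If style.dft names the
    XML filter directly, every file is treated as XML, so the list should
    contain XML documents only.
    """
    if dft_uses_xml_filter:
        return list(paths)
    return [p for p in paths if Path(p).suffix.lower() == ".xml"]
```

For example, with the universal filter, `files_routed_to_xml_filter(["a.xml", "b.txt"], False)` keeps only `a.xml`; with the XML filter named in style.dft, both files would be passed to it.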
Implementation Summary
Verity support for XML documents is implemented by a new XML filter and controlled using a number of style files.
XML Filter
The new XML filter (flt_xml.dll, flt_xml.sl, flt_xml.so) resides in the bin directory for the installed platform.
Style Files
The following style files are required to enable indexing of XML files. Default style files reside in the /common/vdkstyle directory.
The style.uni file must include the following lines:
- type: "text/xml"
- /format-filter = "flt_xml"
- /charset = guess
- /def-charset = 8859
If an existing style.uni specifies that text/xml content be handled by flt-zone, replace that specification with the construct above.
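To confirm that a style.uni carries the required text/xml settings, a small audit script can help. This is a hypothetical Python sketch that assumes each `/key = value` setting sits on its own line after its `type: "..."` line and that the next `type:` line starts a new entry; it is a plain text check, not a full style.uni parser.

```python
import re

TYPE_RE = re.compile(r'type:\s*"([^"]+)"')
SETTING_RE = re.compile(r'/([\w-]+)\s*=\s*(.+)')

# The four lines required above, as key/value pairs.
REQUIRED = {"format-filter": "flt_xml", "charset": "guess", "def-charset": "8859"}

def type_settings(style_uni_text, mime_type="text/xml"):
    """Collect the /key = value settings that follow a type: "<mime_type>" line."""
    settings, in_entry = {}, False
    for raw in style_uni_text.splitlines():
        line = raw.strip()
        m = TYPE_RE.match(line)
        if m:
            # A new type: line starts a new entry; track whether it is ours.
            in_entry = m.group(1) == mime_type
            continue
        kv = SETTING_RE.match(line)
        if in_entry and kv:
            settings[kv.group(1)] = kv.group(2).strip().strip('"')
    return settings

def missing_xml_settings(style_uni_text):
    """Return the required text/xml settings that are absent or different."""
    found = type_settings(style_uni_text)
    return {k: v for k, v in REQUIRED.items() if found.get(k) != v}
```

An empty result from `missing_xml_settings` means the text/xml entry matches the lines listed above; anything else shows which settings still need to be added or corrected.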
The style.xml file enables administrators to change the default behavior of the indexer for XML documents. Administrators can specify field and zone indexing for regions of the document delimited by XML tags, and can skip regions of the document delimited by XML tags. The sample style.xml file contains code examples that are commented out.
To ignore xmltag but index the content between its start and end tags:
To index xmltag as a zone if there is also an ignore xmltag="*" command:
To skip xmltag, so that the tag, its attributes, and its content are not indexed:
To index xmltag as a field with the same name as xmltag:
To index xmltag as a field with the name specified in the fieldname attribute:
To index xmltag as a field, overriding any existing value of the field:
The fieldname and index attributes can be used in a field command.
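The effect of these commands on a document can be illustrated with a toy model. The Python sketch below is a simplified, hypothetical model of the ignore, zone, skip, and field behaviors described above, not Verity's implementation, and it does not use the real style.xml syntax: rules are a Python dictionary. Three points are assumptions: tags with no rule (and no "*" rule) are treated as zones, field content is not added to the body text, and a later field value always replaces an earlier one (the index attribute is not modeled).

```python
import xml.etree.ElementTree as ET

def apply_rules(xml_text, rules):
    """Toy model of the style.xml commands (not Verity's code).

    rules maps a tag name, or "*" for all other tags, to one of:
      ("ignore",)      drop the tags but keep their content as body text
      ("zone",)        keep the content and also record it as a zone
      ("skip",)        drop the tag, its attributes, and its content
      ("field",)       store the content in a field named after the tag
      ("field", name)  store the content in the field `name`
    Returns (body_text, zones, fields).
    """
    zones, fields = {}, {}

    def walk(elem):
        # Explicit rule first, then "*", then the assumed default (zone).
        kind = rules.get(elem.tag, rules.get("*", ("zone",)))
        if kind[0] == "skip":
            return ""
        # Own text, then each child's indexed text followed by its tail.
        parts = [elem.text or ""]
        for child in elem:
            parts.append(walk(child))
            parts.append(child.tail or "")
        text = " ".join(" ".join(parts).split())  # normalize whitespace
        if kind[0] == "field":
            fields[kind[1] if len(kind) > 1 else elem.tag] = text
            return ""  # modeled as field-only, not body text
        if kind[0] == "zone":
            zones.setdefault(elem.tag, []).append(text)
        return text

    return walk(ET.fromstring(xml_text)), zones, fields
```

For instance, with `{"*": ("ignore",), "title": ("zone",), "secret": ("skip",), "author": ("field", "writer")}`, the content of `<secret>` disappears entirely, `<title>` text is searchable both as body text and as a zone, and `<author>` text ends up only in the `writer` field.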
If fields are defined in the style.xml file, they must also be defined in the style.ufl or style.sfl file, using standard syntax.
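A mismatch between the two files is easy to miss, so a cross-check may be worthwhile. This hypothetical Python sketch assumes that style.ufl (or style.sfl) declares each field on a line beginning with `varwidth:` or `fixwidth:` followed by the field name; adjust the pattern to the declarations your files actually use. The field names from style.xml are passed in as a list rather than parsed.

```python
import re

# Assumed declaration keywords; extend this alternation to match your style.ufl.
DECL_RE = re.compile(r'^\s*(?:varwidth|fixwidth)\s*:\s*([A-Za-z_][\w-]*)', re.MULTILINE)

def declared_fields(style_ufl_text):
    """Field names declared in style.ufl/style.sfl text (under the assumption above)."""
    return set(DECL_RE.findall(style_ufl_text))

def undeclared_xml_fields(xml_field_names, style_ufl_text):
    """Field names used by style.xml field commands that are not declared."""
    return set(xml_field_names) - declared_fields(style_ufl_text)
```

Any name this returns would need a definition added to style.ufl or style.sfl before indexing.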
Alternatively, edit the style.dft file to invoke the XML filter directly. In this case, the XML documents do not need an .xml extension. The style.dft file must include the following lines:
- $control: 1
- dft:
- {
- field: DOC
- /filter="flt_xml"
- }
- mkvdk -create -style styledir -collection collname
- mkvdk -collection collname file1.xml file2.xml filen.xml
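The style.dft change and the two mkvdk commands above can be scripted together. The following Python sketch is one possible approach, not a Verity tool: the helper names are hypothetical, it assumes mkvdk is on the PATH, and it writes the style.dft lines exactly as listed above into a copy of the default style directory (for example, the /common/vdkstyle directory of your installation) so the defaults stay untouched.

```python
import shutil
import subprocess
from pathlib import Path

# The style.dft lines listed above, one per line.
STYLE_DFT_LINES = ['$control: 1', 'dft:', '{', 'field: DOC', '/filter="flt_xml"', '}']

def make_xml_style_dir(default_style_dir, new_style_dir):
    """Copy the default style files, then overwrite style.dft so that
    every document is passed to flt_xml directly."""
    dst = Path(new_style_dir)
    shutil.copytree(str(default_style_dir), str(dst))  # fails if dst already exists
    (dst / "style.dft").write_text("\n".join(STYLE_DFT_LINES) + "\n")
    return dst

def mkvdk_commands(style_dir, collection, files):
    """The two mkvdk invocations shown above, as argument lists."""
    create = ["mkvdk", "-create", "-style", str(style_dir), "-collection", collection]
    insert = ["mkvdk", "-collection", collection, *[str(f) for f in files]]
    return [create, insert]

def build_collection(style_dir, collection, files):
    """Create the collection, then index the files (assumes mkvdk is on PATH)."""
    for cmd in mkvdk_commands(style_dir, collection, files):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a nonzero exit
```

Passing argument lists to `subprocess.run` avoids shell quoting problems with file names that contain spaces; `check=True` stops before indexing if the collection could not be created.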
flist.txt):
style.uni and style.xml files to enable XML document indexing support.