Documents checked into the Content Engine require a class. A document can be classified manually, with a user selecting the document's class, or automatically when the document is checked in. The Content Engine provides an extensible framework that automatically assigns an incoming document of a specified MIME type to a target document class and sets selected properties of that class based on values found in the incoming document. A classification component, or classifier, does the work of assigning a document class. One such classifier packaged with the Content Engine is the XML Classifier. For details on the XML Classifier, see Classification Flowchart and Understanding the XML Classifier.
You can also plug custom classifiers implemented in Java™ into the document classification framework, which is the focus of this topic. The topic outlines custom classifier requirements and describes the document checkin process, of which classifier execution is a part. Before developing a custom classifier, read Understanding Automatic Document Classification. It discusses key classification framework concepts, including the asynchronous execution of a classifier and the security implications of automatically classifying a document.
For document classification code examples, see Working with Document Classification-related Objects.
To plug a custom classifier into the document classification framework, you must do the following:
Implement the DocumentClassifier interface so that your classifier acts on the Document object that is passed to it. Typically, this involves extracting metadata from the content of the Document object and mapping it to the class properties inherited by the object. You can implement your classifier to support one or more MIME types.
You can package the class in a JAR file, and you can check in your class or JAR file as a CodeModule object in a Content Engine object store. Alternatively, you can specify the document classifier in the classpath of the application server where the Content Engine is running. A document classifier runs asynchronously on the Content Engine.
Create a DocumentClassificationAction object with the createInstance method, then set the following required properties:
MimeType: The MimeType property of a DocumentClassificationAction object must be different from the MimeType property of other DocumentClassificationAction objects.
ProgId: The class name of the classifier.
CodeModule: If the classifier is checked in as a CodeModule object, set the DocumentClassificationAction's CodeModule property.
For best practices for implementing a DocumentClassifier interface, see Implementation Concepts. For code examples on implementing a document classifier and on creating a DocumentClassificationAction object, see Working with Document Classification-related Objects. For deployment options, see Deploying Action Handlers.
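The classifier requirement described above (extract metadata from the document content and map it to a class and its properties) can be sketched as follows. This is a minimal, self-contained illustration, not Content Engine code: SketchDocument and SketchClassifier are hypothetical stand-ins for the Document object and the DocumentClassifier interface, and the "Key: Value" header format, the "Type" header, and both MIME types are assumptions made for the example. Consult the Content Engine Java API reference for the actual interface.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Document object passed to a classifier.
class SketchDocument {
    final String mimeType;
    final String content;
    String className; // the document's current class
    final Map<String, String> properties = new LinkedHashMap<>();

    SketchDocument(String mimeType, String content, String initialClass) {
        this.mimeType = mimeType;
        this.content = content;
        this.className = initialClass;
    }
}

// Hypothetical stand-in for the classifier contract.
interface SketchClassifier {
    void classify(SketchDocument doc);
}

// Reads "Key: Value" header lines from the content, takes the target class
// from the "Type" header, and copies the other headers into properties.
class HeaderLineClassifier implements SketchClassifier {
    // One classifier can serve several MIME types (both values are made up).
    static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("text/plain", "text/x-record"));

    @Override
    public void classify(SketchDocument doc) {
        if (!SUPPORTED.contains(doc.mimeType)) {
            throw new IllegalArgumentException("Unsupported MIME type: " + doc.mimeType);
        }
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : doc.content.split("\\R")) {
            if (line.trim().isEmpty()) {
                break; // headers end at the first blank line
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        String targetClass = headers.remove("Type");
        if (targetClass == null) {
            // Fail before touching the document so its initial class is kept.
            throw new IllegalStateException("Content has no Type header");
        }
        doc.className = targetClass;
        doc.properties.putAll(headers);
    }
}

class ClassifierSketch {
    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument(
                "text/plain", "Type: Invoice\nInvoiceNumber: 42\n\nBody text", "Document");
        new HeaderLineClassifier().classify(doc);
        System.out.println(doc.className + " " + doc.properties);
    }
}
```

The sketch validates the content before it changes the document, so a failed classification leaves the initial class and properties untouched.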
You can automatically classify documents with a content type that matches the MimeType property of an existing DocumentClassificationAction object. To automatically classify a new document, you create a Document object and do the following:
Set the Document's MimeType property to match the value of the DocumentClassificationAction's MimeType property. Although the Document's MimeType property is not required, we recommend that you set it so that the intended document classifier is invoked. If you don't set the MimeType property, the Content Engine maps the content element's file extension to a MIME type, and the result may differ from what you anticipate. See About MIME Types.
Call the checkin method of the Document object, specifying the AUTO_CLASSIFY constant.
The Document object is checked into the object store with an initial class, and the object's ClassificationStatus property is set to CLASSIFICATION_PENDING. Document classification is an asynchronous action; therefore, the auto-classification request is queued, represented by a DocumentClassificationQueueItem object.
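The checkin behavior described above can be modeled with a small, self-contained sketch. It does not call the Content Engine API: CheckinSketch, Doc, the Status enum, the extension table, and the "application/x-invoice" MIME type are all hypothetical names chosen for illustration. The sketch shows why setting the MIME type explicitly matters: when it is left unset, the type is derived from the file extension and may not match the type registered for the intended classifier.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

class CheckinSketch {
    enum Status { NOT_REQUESTED, PENDING }

    // Hypothetical stand-in for a document being checked in.
    static class Doc {
        final String fileName;
        String mimeType;               // null means the caller did not set it
        String className = "Document"; // initial class assigned at checkin
        Status status = Status.NOT_REQUESTED;

        Doc(String fileName) {
            this.fileName = fileName;
        }
    }

    // Assumed extension-to-MIME-type table, for illustration only.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("xml", "text/xml");
        BY_EXTENSION.put("txt", "text/plain");
    }

    // Pending auto-classification requests, processed later and asynchronously.
    final Queue<Doc> classificationQueue = new ArrayDeque<>();

    void checkin(Doc doc, boolean autoClassify) {
        if (doc.mimeType == null) {
            // Fall back to the file extension when no MIME type was set.
            int dot = doc.fileName.lastIndexOf('.');
            String ext = dot >= 0
                    ? doc.fileName.substring(dot + 1).toLowerCase(Locale.ROOT)
                    : "";
            doc.mimeType = BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
        }
        if (autoClassify) {
            doc.status = Status.PENDING;  // classification has not run yet
            classificationQueue.add(doc); // checkin returns without waiting
        }
    }

    public static void main(String[] args) {
        CheckinSketch store = new CheckinSketch();

        Doc explicit = new Doc("invoice.xml");
        explicit.mimeType = "application/x-invoice"; // matches the intended classifier
        store.checkin(explicit, true);

        Doc implicit = new Doc("invoice.xml");       // MIME type left unset
        store.checkin(implicit, true);

        System.out.println(explicit.mimeType + " / " + implicit.mimeType);
        System.out.println("Queued requests: " + store.classificationQueue.size());
    }
}
```

In this model the second document ends up with the generic XML type rather than the type the custom classifier was registered for, which is the mismatch the recommendation above is meant to prevent.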
The Classification Manager is responsible for dequeuing a classification request and processing it. The Classification Manager obtains the MIME type from the target document and searches for the DocumentClassificationAction object registered for that MIME type. Getting the class name of the classifier from the DocumentClassificationAction's ProgId property, the Classification Manager invokes the classifier. A classifier operates with the same access permissions as the user who initiated the document checkin.
When document classification is complete, the Document's ClassificationStatus property is updated to indicate success or failure.
If classification fails, the initial class assigned to the document remains. If classification succeeds, a new class is assigned to the document and a ClassifyCompleteEvent object is triggered.
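The processing flow described above (dequeue a request, find the action registered for the document's MIME type, load the classifier by class name, then record success or failure) can be sketched as follows. This is a simplified model under stated assumptions, not the Classification Manager's implementation: ManagerSketch, Doc, Classifier, MemoClassifier, the Status values, the event string, and the "text/x-memo" MIME type are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class ManagerSketch {
    enum Status { PENDING, SUCCEEDED, FAILED }

    // Hypothetical stand-in for a queued document.
    static class Doc {
        final String mimeType;
        final String content;
        String className = "Document"; // initial class from checkin
        Status status = Status.PENDING;

        Doc(String mimeType, String content) {
            this.mimeType = mimeType;
            this.content = content;
        }
    }

    // Hypothetical stand-in for the classifier contract.
    interface Classifier {
        void classify(Doc doc) throws Exception;
    }

    // Example classifier that the manager instantiates by class name.
    static class MemoClassifier implements Classifier {
        public MemoClassifier() { }

        @Override
        public void classify(Doc doc) {
            if (!doc.content.startsWith("MEMO")) {
                throw new IllegalArgumentException("Content is not a memo");
            }
            doc.className = "Memo";
        }
    }

    final Map<String, String> progIdByMimeType = new HashMap<>();
    final Queue<Doc> queue = new ArrayDeque<>();
    final List<String> events = new ArrayList<>();

    // Each MIME type may be registered only once.
    void register(String mimeType, String progId) {
        if (progIdByMimeType.putIfAbsent(mimeType, progId) != null) {
            throw new IllegalStateException("MIME type already registered: " + mimeType);
        }
    }

    // Processes one queued request, if any.
    void processNext() {
        Doc doc = queue.poll();
        if (doc == null) {
            return;
        }
        String initialClass = doc.className;
        try {
            String progId = progIdByMimeType.get(doc.mimeType);
            if (progId == null) {
                throw new IllegalStateException("No action for " + doc.mimeType);
            }
            Classifier classifier = (Classifier) Class.forName(progId)
                    .getDeclaredConstructor().newInstance();
            classifier.classify(doc);
            doc.status = Status.SUCCEEDED;
            events.add("ClassifyComplete:" + doc.className); // raised on success only
        } catch (Exception e) {
            doc.className = initialClass; // the initial class remains on failure
            doc.status = Status.FAILED;
        }
    }

    public static void main(String[] args) {
        ManagerSketch manager = new ManagerSketch();
        manager.register("text/x-memo", MemoClassifier.class.getName());
        Doc good = new Doc("text/x-memo", "MEMO: budget review");
        Doc bad = new Doc("text/x-memo", "not a memo");
        manager.queue.add(good);
        manager.queue.add(bad);
        manager.processNext();
        manager.processNext();
        System.out.println(good.className + " " + good.status);
        System.out.println(bad.className + " " + bad.status);
        System.out.println(manager.events);
    }
}
```

Keying the registry by MIME type is what makes the uniqueness rule necessary: with two actions registered for the same type, the lookup could not decide which classifier to invoke.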