Understanding automatic document classification

FileNet P8 Platform's object model describes how objects receive their properties and security from the class on which they are based. The automatic classification framework is a means whereby the values assigned to some of those properties are derived from the content of the document itself.

Document Classification Manager

The Document Classification Manager determines when a document requires auto-classification based on a flag on the CheckIn method. The Enterprise Manager's Create New Document Wizard includes a check box for setting this flag. The Classification Manager then determines which Classifier to invoke to carry out the operation. Classifiers are selected on the basis of the Content Type (MIME type) of the source document. Once the appropriate Classifier is located and loaded into memory, the Classification Manager invokes the Classify method to perform the actual classification. When control returns from the Classifier, the Classification Manager sets the ClassificationStatus on the target document to DocClassificationComplete or DocClassificationFailed, depending on the error status returned by the Classifier.
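The flow described above can be sketched in a few lines of Java. This is a minimal illustration, not the Content Engine implementation: the Manager, Doc, Classifier, and Status types are hypothetical stand-ins rather than FileNet P8 API classes, and treating a missing classifier as a failure is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are part of the FileNet P8 API.
class ClassificationManagerSketch {

    // The two outcome values named in the text above.
    enum Status { DOC_CLASSIFICATION_COMPLETE, DOC_CLASSIFICATION_FAILED }

    // Stand-in for a document version being checked in.
    static class Doc {
        final String mimeType;
        Status classificationStatus; // stays null when auto-classification is not requested
        Doc(String mimeType) { this.mimeType = mimeType; }
    }

    // Stand-in for a Classifier; throwing an exception reports an error.
    interface Classifier {
        void classify(Doc doc) throws Exception;
    }

    static class Manager {
        private final Map<String, Classifier> classifiersByMimeType = new HashMap<>();

        void register(String mimeType, Classifier classifier) {
            classifiersByMimeType.put(mimeType, classifier);
        }

        // Check the flag, select a classifier by MIME type, invoke it, record the status.
        void checkIn(Doc doc, boolean autoClassify) {
            if (!autoClassify) {
                return;
            }
            Classifier classifier = classifiersByMimeType.get(doc.mimeType);
            try {
                if (classifier == null) {
                    // Assumption: a missing classifier is treated as a failure.
                    throw new IllegalStateException("no classifier for " + doc.mimeType);
                }
                classifier.classify(doc);
                doc.classificationStatus = Status.DOC_CLASSIFICATION_COMPLETE;
            } catch (Exception e) {
                doc.classificationStatus = Status.DOC_CLASSIFICATION_FAILED;
            }
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        manager.register("text/xml", doc -> { /* pretend to derive property values */ });
        manager.register("application/pdf", doc -> { throw new Exception("unreadable content"); });

        Doc xml = new Doc("text/xml");
        Doc pdf = new Doc("application/pdf");
        Doc skipped = new Doc("text/xml");
        manager.checkIn(xml, true);
        manager.checkIn(pdf, true);
        manager.checkIn(skipped, false);

        System.out.println(xml.classificationStatus);     // DOC_CLASSIFICATION_COMPLETE
        System.out.println(pdf.classificationStatus);     // DOC_CLASSIFICATION_FAILED
        System.out.println(skipped.classificationStatus); // null
    }
}
```

The status is written in exactly one place, after control returns from the classifier, which mirrors the division of responsibility in the text: classifiers report errors, and the manager records the outcome.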

Content Engine (CE) lets you determine whether a particular version of a document was automatically classified and, if so, whether the operation succeeded, failed, or, in the case of an asynchronous request, is still pending. Error and warning messages that occur during asynchronous classification are logged in the Application Event log. Custom classifiers return error information that is logged by the Classification Manager. Errors encountered while classifying a document cause the Classification Manager to set the ClassificationStatus property on the target document to a value of 'DocClassificationFailed'.

Classifiers return error information to the Document Classification Manager, which writes it to the error log when execution is asynchronous. The Classification flowchart has more information about the Document Classification Manager.

Asynchronous classification

CE supports asynchronous processing for auto-classification. Asynchronous processing means that classification occurs within a separate transaction and that the server does not wait until the classification operation has completed before returning control to the client. As a result, when you request that a document be automatically classified during checkin, the checkin action is usually complete before the automatic classification transaction is completed.

CE guarantees execution of asynchronous auto-classification requests. This guarantee means that asynchronous auto-classification requests cannot get "lost"; that is, each request results in either the successful classification of the target document or the recording of an appropriate error.
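The two properties described in this section (the caller does not wait, and every request ends in either success or a recorded error) can be illustrated with a small in-memory sketch. All names here (RequestQueue, Request, Status, errorLog) are hypothetical, and the sketch assumes a single worker thread. It does not reproduce how CE guarantees execution, which this section does not describe; an in-memory queue like this one would not survive a server restart.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: an illustration of the behavior, not the CE mechanism.
class AsyncClassificationSketch {

    enum Status { PENDING, COMPLETE, FAILED }

    interface Classifier {
        void classify(String documentId) throws Exception;
    }

    static class Request {
        final String documentId;
        final AtomicReference<Status> status = new AtomicReference<>(Status.PENDING);
        Request(String documentId) { this.documentId = documentId; }
    }

    static class RequestQueue {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        final List<String> errorLog = new CopyOnWriteArrayList<>();

        // Returns immediately; the classification runs later on the worker thread.
        Request submit(String documentId, Classifier classifier) {
            Request request = new Request(documentId);
            worker.execute(() -> {
                try {
                    classifier.classify(documentId);
                    request.status.set(Status.COMPLETE);
                } catch (Exception e) {
                    // Every failure is recorded, so no request ends without an outcome.
                    errorLog.add(documentId + ": " + e.getMessage());
                    request.status.set(Status.FAILED);
                }
            });
            return request;
        }

        // Stops accepting requests and waits for the queued ones to finish.
        void drain() {
            worker.shutdown();
            try {
                worker.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        RequestQueue queue = new RequestQueue();
        CountDownLatch release = new CountDownLatch(1);

        Request slow = queue.submit("doc-1", id -> release.await());
        Request bad = queue.submit("doc-2", id -> {
            throw new IllegalStateException("unsupported content");
        });

        // Control is back with the caller while doc-1 is still unclassified.
        System.out.println(slow.status.get()); // PENDING

        release.countDown();
        queue.drain();
        System.out.println(slow.status.get()); // COMPLETE
        System.out.println(bad.status.get());  // FAILED
        System.out.println(queue.errorLog);    // [doc-2: unsupported content]
    }
}
```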

Security

A document's security and system-generated properties (date and time, initial storage location, etc.) are determined by the class to which the document is initially assigned, that is, the document's class or one of its subclasses, before the document is automatically classified. These settings do not change when automatic classification assigns the document to the target document class.

Asynchronous automatic classification executes with the access permissions of the user who initiates the classification request. For example, if you request that a document be asynchronously auto-classified, the Classifier that carries out the operation executes as if you had invoked it directly. It can access any object that you have access rights to, and it is denied access to any object for which you have insufficient access rights.

Classification status

The document object includes a read-only property called ClassificationStatus whose value is automatically assigned by the system. This property indicates the auto-classification status of a document, thereby providing client applications a means of discovering whether or not a particular document was subject to auto-classification and whether or not the operation was successful.

The ClassificationStatus property has the following possible values:

Document Classification Action Class

The Document Classification Action class enables you to create a persistent mapping between a specific MIME type and the object designed to classify documents of that type. Implementation objects are represented as Java class names stored in the ProgID property. You can map a MIME type to no more than one implementation object; however, a single implementation object can support multiple MIME types.
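The cardinality rule above (at most one implementation per MIME type, but possibly many MIME types per implementation) is what a map keyed by MIME type gives you. The following sketch is illustrative only: the registry class and the com.example class names are hypothetical, and rejecting a conflicting mapping with an exception is an assumption about how a violation might be handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: not the Document Classification Action class itself.
class ClassificationActionRegistrySketch {

    // Key: MIME type. Value: implementation class name (the role ProgID plays in the text).
    private final Map<String, String> classNameByMimeType = new HashMap<>();

    // Maps a MIME type to an implementation; a second, different mapping is rejected.
    void map(String mimeType, String implementationClassName) {
        String existing = classNameByMimeType.putIfAbsent(mimeType, implementationClassName);
        if (existing != null && !existing.equals(implementationClassName)) {
            throw new IllegalStateException(mimeType + " is already mapped to " + existing);
        }
    }

    Optional<String> lookup(String mimeType) {
        return Optional.ofNullable(classNameByMimeType.get(mimeType));
    }

    public static void main(String[] args) {
        ClassificationActionRegistrySketch registry = new ClassificationActionRegistrySketch();

        // One implementation may serve several MIME types.
        registry.map("text/xml", "com.example.XmlClassifier");
        registry.map("application/xml", "com.example.XmlClassifier");

        System.out.println(registry.lookup("application/xml").orElse("none")); // com.example.XmlClassifier
        System.out.println(registry.lookup("image/png").orElse("none"));       // none

        try {
            // A MIME type cannot point at a second implementation.
            registry.map("text/xml", "com.example.OtherClassifier");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // text/xml is already mapped to com.example.XmlClassifier
        }
    }
}
```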

Instances of the Document Classification Action class are associated with Object Store objects. The cardinality is zero or more Document Classification Actions for each object store. The important classification-related properties of this class are the following: