To enable documents of a specified MIME type to be automatically classified, you need the following:
A document classifier that processes documents of that MIME type.
A DocumentClassificationAction object that associates the specified MIME type with the document classifier. See Creating a DocumentClassificationAction Object.
You can also retrieve DocumentClassificationAction objects.
To submit a document for classification, check it in with auto-classification enabled. You can then monitor its classification status. See Autoclassifying a Document.
For an overview of automatic document classification, see Document Classification Concepts.
To create a document classifier, you must implement the classify method of the Java™ DocumentClassifier interface. This method is passed the newly checked-in Document object. Its purpose is to determine the Content Engine class to which the Document object belongs and then to apply that class to the object. Typically, this involves parsing the content of the Document object and mapping metadata from the content to properties of the Content Engine class.
The following sample implementation classifies documents of MIME type "text/pdf". The classify method retrieves the ContentTransfer data from the new Document object and uses a third-party API to parse the PDF content. It then tests the subject field of the PDF document. If the subject indicates that the PDF document is a loan application, the method uses the changeClass method to apply the "PdfLoanApplication" class to the Document object, maps metadata from the PDF content to properties of the "PdfLoanApplication" class, and sets the document's owner based on the loan type. If the PDF document is not a loan application, the Document object keeps its default class.
For best practices in implementing and packaging a DocumentClassifier implementation, see Implementation Concepts. A sample DocumentClassifier implementation, DCAHandler.java, is packaged with the Content Engine in this directory: <drive>:/Program Files/Filenet/Content Engine/samples.
Java Example
package sample.actionhandler;

import com.filenet.api.core.*;
import com.filenet.api.engine.DocumentClassifier;
import com.filenet.api.exception.*;
import java.io.*;
import com.ticdoc.pdfextract.*; // 3rd-party API for parsing PDF documents

public class DocClassifyHandler implements DocumentClassifier {
    public void classify(Document doc) {
        try {
            // Get PDF content from new Document checked into object store
            ContentTransfer ct = (ContentTransfer) doc.get_ContentElements().get(0);

            // try-with-resources closes the content stream even if parsing fails
            try (InputStream contentStream = ct.accessContentStream()) {
                // Use 3rd-party API to get PDF document metadata
                PDFDocument pdfDoc = PDFDocument.load(contentStream);
                try {
                    PDFDocumentInformation pdfProperties = pdfDoc.getDocumentInformation();

                    // Get subject of PDF document (null if the PDF has no subject)
                    String pdfSubject = pdfProperties.getSubject();

                    // Classify based on PDF subject; comparing from the literal avoids a NullPointerException
                    if ("loan application".equalsIgnoreCase(pdfSubject)) {
                        // Apply new class
                        doc.changeClass("PdfLoanApplication");

                        // Get PDF properties to be mapped to Document in object store
                        String pdfLoanType = pdfProperties.getLoanType();
                        String pdfApplicantName = pdfProperties.getApplicant();
                        String pdfDateSubmitted = pdfProperties.getModificationDate().getTime().toString();

                        // Set properties for Document stored in object store
                        doc.getProperties().putValue("LoanType", pdfLoanType);
                        doc.getProperties().putValue("ApplicantName", pdfApplicantName);
                        doc.getProperties().putValue("ApplicationDate", pdfDateSubmitted);
                        doc.getProperties().putValue("DocumentTitle", "PDF Loan Application");

                        // Set security owner based on loan type
                        if ("home loan application".equalsIgnoreCase(pdfLoanType))
                            doc.set_Owner("GEvans");
                        else if ("auto loan application".equalsIgnoreCase(pdfLoanType))
                            doc.set_Owner("EMesker");
                    }
                } finally {
                    // Release parser resources after all metadata has been read
                    pdfDoc.close();
                }
            }
        } catch (Exception e) {
            // Wrap any failure in an EngineRuntimeException with the CLASSIFY_HANDLER_THREW code
            ErrorRecord er[] = { new ErrorRecord(e) };
            throw new EngineRuntimeException(e, ExceptionCode.CLASSIFY_HANDLER_THREW, er);
        }
    }
}
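The subject test, property mapping, and owner selection in the classify method can also be kept apart from the Content Engine API so that they can be unit-tested without a server. The following is a minimal sketch under that assumption, not part of the product sample: the hypothetical LoanClassificationRule class and its Decision type take the PDF metadata as plain strings (as the third-party parser would supply them) and return the target class name, the property values to map, and the owner, or an empty result when the default class should be kept. In a real classifier, classify would still apply the result with changeClass, getProperties().putValue, and set_Owner.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, server-independent version of the sample's classification rule. */
public class LoanClassificationRule {

    /** Result of a positive classification: target class, property values, and owner. */
    public static final class Decision {
        final String className;
        final Map<String, String> properties;
        final String owner; // null means "leave the owner unchanged"

        Decision(String className, Map<String, String> properties, String owner) {
            this.className = className;
            this.properties = properties;
            this.owner = owner;
        }
    }

    /** Returns a decision for loan applications, or empty to keep the default class. */
    public static Optional<Decision> decide(String subject, String loanType,
                                            String applicant, String dateSubmitted) {
        // Null-safe comparison: a PDF without a subject is simply not a loan application
        if (!"loan application".equalsIgnoreCase(subject)) {
            return Optional.empty();
        }

        // Same property names as the sample classifier; insertion order is preserved
        Map<String, String> props = new LinkedHashMap<>();
        props.put("LoanType", loanType);
        props.put("ApplicantName", applicant);
        props.put("ApplicationDate", dateSubmitted);
        props.put("DocumentTitle", "PDF Loan Application");

        // Owner depends on the loan type, as in the sample; any other type leaves it unchanged
        String owner = null;
        if ("home loan application".equalsIgnoreCase(loanType)) {
            owner = "GEvans";
        } else if ("auto loan application".equalsIgnoreCase(loanType)) {
            owner = "EMesker";
        }
        return Optional.of(new Decision("PdfLoanApplication", props, owner));
    }

    public static void main(String[] args) {
        decide("Loan Application", "Home Loan Application", "Applicant", "2024")
            .ifPresent(d -> System.out.println(d.className + " owner=" + d.owner));
        System.out.println(decide(null, null, null, null).isPresent());
    }
}
```

Running main prints the class name and owner for the loan application, then false for a PDF with no subject. Keeping the rule free of Content Engine types is a design choice, not a requirement of the DocumentClassifier interface.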
A DocumentClassificationAction object identifies the document classifier to be launched when a document is checked in with auto-classification enabled. The following Java and C# code snippets show how to create a DocumentClassificationAction object and set its properties. The MimeType property associates the DocumentClassificationAction object with documents of the same MIME type; here it is set to "text/pdf". When documents of this MIME type are checked in with auto-classification enabled, the document classifier associated with this DocumentClassificationAction object is launched.
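Conceptually, the MimeType property works as a lookup key from a document's MIME type to the ProgId of the classifier to launch. The following is a minimal model of that matching rule for illustration only; it is not how the Content Engine is implemented. The hypothetical ClassifierRegistry class stores one ProgId per MIME type and returns nothing when no action matches, in which case no classifier would run. Both "at most one action per MIME type" and exact string matching are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of how a MIME type selects a classifier ProgId. */
public class ClassifierRegistry {
    // Assumption: at most one action per MIME type, matched by exact string comparison
    private final Map<String, String> progIdByMimeType = new HashMap<>();

    /** Mirrors setting MimeType and ProgId on one DocumentClassificationAction. */
    public void register(String mimeType, String progId) {
        progIdByMimeType.put(mimeType, progId);
    }

    /** Returns the ProgId to launch for a document's MIME type, or empty if none matches. */
    public Optional<String> classifierFor(String documentMimeType) {
        return Optional.ofNullable(progIdByMimeType.get(documentMimeType));
    }

    public static void main(String[] args) {
        ClassifierRegistry registry = new ClassifierRegistry();
        registry.register("text/pdf", "sample.actionhandler.DocClassifyHandler");
        System.out.println(registry.classifierFor("text/pdf").orElse("none"));
        System.out.println(registry.classifierFor("application/msword").orElse("none"));
    }
}
```

The model makes the practical consequence visible: a document whose MimeType value differs from every action's MimeType value, even slightly, is not matched.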
A document classifier is associated with a DocumentClassificationAction object through the ProgId property and, conditionally, the CodeModule property.
You must set the ProgId property to the fully qualified name of the document classifier.
If the document classifier is contained within a CodeModule object stored in an object store, as shown in the examples, you must also get the CodeModule object and assign it to the CodeModule property of the DocumentClassificationAction object.
Note that you cannot set the CodeModule property to a reservation (in-progress) version of a CodeModule object. For more information, see Creating a CodeModule Object.
NOTE Do not set the CodeModule property if you set the application server's class path to the location of the document classifier.
When saved, a DocumentClassificationAction object is stored in the Document Classification Actions folder of a Content Engine object store.
Java Example
...
// Create document classification action
DocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.createInstance(os, ClassNames.DOCUMENT_CLASSIFICATION_ACTION);

// Set MIME type that associates action to documents of same MIME type
docClassAction.set_MimeType("text/pdf");

// Set ProgId property with fully qualified name of classifier
docClassAction.set_ProgId("sample.actionhandler.DocClassifyHandler");

// Get CodeModule object
CodeModule cm = Factory.CodeModule.getInstance(os, ClassNames.CODE_MODULE,
    new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}"));

// Set CodeModule property
docClassAction.set_CodeModule(cm);

docClassAction.set_DisplayName("DocumentClassificationAction");
docClassAction.save(RefreshMode.REFRESH);
C# Example
...
// Create document classification action
IDocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.CreateInstance(os, ClassNames.DOCUMENT_CLASSIFICATION_ACTION);

// Set MIME type that associates action to documents of same MIME type
docClassAction.MimeType = "text/pdf";

// Set ProgId property with fully qualified name of classifier
docClassAction.ProgId = "sample.actionhandler.DocClassifyHandler";

// Get CodeModule object
ICodeModule cm = Factory.CodeModule.GetInstance(os, ClassNames.CODE_MODULE,
    new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}"));

// Set CodeModule property
docClassAction.CodeModule = cm;

docClassAction.DisplayName = "DocumentClassificationAction";
docClassAction.Save(RefreshMode.REFRESH);
You can get a single DocumentClassificationAction object with a Factory.DocumentClassificationAction method, such as fetchInstance. You can also get a collection of DocumentClassificationAction objects (a DocumentClassificationActionSet) by retrieving the DocumentClassificationActions property of an ObjectStore object.
The following Java and C# examples show how to retrieve a DocumentClassificationActionSet collection from an object store. The examples iterate through the set and, for each DocumentClassificationAction object in the collection, retrieve the object's DisplayName, MimeType, ProgId, and CodeModule properties. Note that the document classifier referenced by a DocumentClassificationAction object might not be contained within a CodeModule object stored in an object store, in which case the CodeModule property is not set. This is the case, for example, with XML Classifier, the document classification action packaged with the Content Engine. (For more information on XML Classifier, see Understanding the XML Classifier.)
Java Example
...
DocumentClassificationActionSet actionSet = os.get_DocumentClassificationActions();
DocumentClassificationAction actionObject;
Iterator iter = actionSet.iterator();
while (iter.hasNext()) {
    actionObject = (DocumentClassificationAction) iter.next();
    System.out.println("DocumentClassificationAction: " + actionObject.get_DisplayName()
        + "\n MimeType is " + actionObject.get_MimeType()
        + "\n ProgId is " + actionObject.get_ProgId());
    String cmName = actionObject.get_CodeModule() != null
        ? actionObject.get_CodeModule().getProperties().getStringValue("Name")
        : "not assigned to this action";
    System.out.println(" CodeModule is " + cmName);
}
C# Example
...
IDocumentClassificationActionSet actionSet = os.DocumentClassificationActions;
IDocumentClassificationAction actionObject;
System.Collections.IEnumerator iter = actionSet.GetEnumerator();
while (iter.MoveNext()) {
    actionObject = (IDocumentClassificationAction) iter.Current;
    System.Console.WriteLine("IDocumentClassificationAction: " + actionObject.DisplayName
        + "\n MimeType is " + actionObject.MimeType
        + "\n ProgId is " + actionObject.ProgId);
    String cmName = actionObject.CodeModule != null
        ? actionObject.CodeModule.Properties.GetStringValue("Name")
        : "not assigned to this action";
    System.Console.WriteLine(" CodeModule is " + cmName);
}
You can automatically classify only documents whose MIME types already have a classification infrastructure set up. That is, for a particular MIME type, a corresponding document classifier and DocumentClassificationAction object must exist.
The following Java and C# examples show how to submit a document of MIME type "text/pdf" for automatic classification. The previous sections in this topic include code examples of a document classifier and a DocumentClassificationAction object that support this MIME type.
In the examples, a Document object is created for a PDF document, and the object's properties are set, most notably the ContentElements and MimeType properties. The ContentElements property is set to the PDF content of the document, which the document classifier later parses. The value of the Document object's MimeType property must match the value of the DocumentClassificationAction object's MimeType property. The Document object is then checked in, with the checkin method specifying the AUTO_CLASSIFY constant.
The examples also include code to monitor the classification process by reading the checked-in document's ClassificationStatus property, which is set to a DocClassificationStatus constant. When an auto-classification request is made, the initial ClassificationStatus value is CLASSIFICATION_PENDING. The code repeatedly checks the status until the property's value changes.
Note that because a document classifier runs as an asynchronous action, an auto-classification request is initially queued and represented by a DocumentClassificationQueueItem object. This queued state corresponds to the CLASSIFICATION_PENDING status.
Java Example
...
Document doc = Factory.Document.createInstance(os, "Document");
FileInputStream fileIS = new FileInputStream("C:\\EclipseWorkspace\\Documents\\loanapplication.pdf");

// Create content transfer list
ContentTransferList contentList = Factory.ContentTransfer.createList();
ContentTransfer ctNew = Factory.ContentTransfer.createInstance();
ctNew.setCaptureSource(fileIS);
contentList.add(ctNew);

// Set content on Document object
doc.set_ContentElements(contentList);

// Set Document properties
doc.getProperties().putValue("DocumentTitle", "PDF Document");
doc.set_MimeType("text/pdf");

// Check in document and commit to server; the content is uploaded during save
doc.checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
doc.save(RefreshMode.REFRESH);
fileIS.close();

// Check classification status during auto classify, pausing between server round trips
while (doc.get_ClassificationStatus() == DocClassificationStatus.CLASSIFICATION_PENDING) {
    System.out.println("Classification status is " + doc.get_ClassificationStatus());
    Thread.sleep(1000); // wait one second before asking the server again
    doc.refresh();
}
System.out.println("Classification status is " + doc.get_ClassificationStatus());
C# Example
...
IDocument doc = Factory.Document.CreateInstance(os, "Document");
// Verbatim string: single backslashes are path separators
Stream fileStream = File.OpenRead(@"C:\EclipseWorkspace\Documents\loanapplication.pdf");

// Create content transfer list
IContentTransferList contentList = Factory.ContentTransfer.CreateList();
IContentTransfer ctNew = Factory.ContentTransfer.CreateInstance();
ctNew.SetCaptureSource(fileStream);
contentList.Add(ctNew);

// Set content on Document object
doc.ContentElements = contentList;

// Set Document properties
doc.Properties["DocumentTitle"] = "PDF Document";
doc.MimeType = "text/pdf";

// Check in document and commit to server; the content is uploaded during save
doc.Checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
doc.Save(RefreshMode.REFRESH);
fileStream.Close();

// Check classification status during auto classify, pausing between server round trips
while (doc.ClassificationStatus == DocClassificationStatus.CLASSIFICATION_PENDING) {
    System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
    System.Threading.Thread.Sleep(1000); // wait one second before asking the server again
    doc.Refresh();
}
System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
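The polling loops in the examples above have no upper bound: if a request stays queued, they wait indefinitely. A bounded wait is one way to handle this. The following generic helper is a minimal sketch and is not part of the Content Engine API: the hypothetical StatusPoller.pollUntilDone method reads a status through a supplier at a fixed interval and gives up after a timeout. With the Content Engine Java API, the supplier might call doc.refresh() and return doc.get_ClassificationStatus(), and the pending test might compare against DocClassificationStatus.CLASSIFICATION_PENDING; the interval and timeout values are assumptions to tune for your environment.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: poll a status until it leaves the pending state or a timeout expires. */
public class StatusPoller {

    /**
     * Returns the first status for which isPending is false, or null if the
     * timeout elapses (or the thread is interrupted) first.
     */
    public static <T> T pollUntilDone(Supplier<T> fetchStatus, Predicate<T> isPending,
                                      long intervalMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            T status = fetchStatus.get();          // e.g. refresh, then read ClassificationStatus
            if (!isPending.test(status)) {
                return status;                     // no longer pending: finished or failed
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;                       // still pending when time ran out
            }
            try {
                Thread.sleep(intervalMillis);      // avoid hammering the server
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return null;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated status source: pending for the first two reads, then done
        int[] calls = {0};
        String result = pollUntilDone(
                () -> ++calls[0] < 3 ? "PENDING" : "DONE",
                s -> s.equals("PENDING"),
                10, 1000);
        System.out.println(result + " after " + calls[0] + " reads"); // prints "DONE after 3 reads"
    }
}
```

Returning null on timeout keeps the sketch short; a caller could instead throw an exception or report that the document is still queued.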