Working with Document Classification-related Objects

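To enable documents of a specified MIME type to be automatically classified, you need two things: a document classifier, which determines the Content Engine class of each document and applies it (see Creating a Document Classifier), and a DocumentClassificationAction object, which associates the document classifier with the MIME type (see Creating a DocumentClassificationAction Object).

You can also retrieve DocumentClassificationAction objects. See Retrieving DocumentClassificationAction Objects.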

To submit a document for classification, check it in with auto-classification enabled; you can then monitor its classification status. See Autoclassifying a Document.

For an overview of automatic document classification, see Document Classification Concepts.

Creating a Document Classifier

To create a document classifier, you must implement the classify method of the Java™ DocumentClassifier interface. The Content Engine passes this method the newly checked-in Document object. The method determines the Content Engine class to which the Document object belongs and then applies that class to the object. Typically, this involves parsing the content of the Document object and mapping metadata from the content to properties of the Content Engine class.

The following sample implementation classifies documents of MIME type "text/pdf". The classify method retrieves the content of the new Document object as a ContentTransfer object and uses a third-party API to parse the PDF content. It then tests the subject field of the PDF metadata. If the subject indicates that the PDF document is a loan application, the method calls changeClass to apply the "PdfLoanApplication" class to the Document object, maps metadata from the PDF content to properties of that class, and sets the document's owner based on the loan type. If the PDF document is not a loan application, the Document object keeps its default class.
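One way to keep this decision logic testable outside the Content Engine is to separate it from the Document API. The following is a minimal sketch under stated assumptions: the class name LoanClassification, the method decide, and the metadata keys "Subject", "LoanType", "Applicant", and "ModificationDate" are all hypothetical, and the PDF metadata is assumed to have already been extracted into a map of strings. It reproduces the rules of the sample classifier: a subject of "loan application" (in any case) selects the "PdfLoanApplication" class, the mapped property values, and an owner based on the loan type; any other subject keeps the default class.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, API-independent form of the decision made in classify().
// The metadata keys used here are assumptions, not part of any real PDF API.
final class LoanClassification {
    final String className;                // class to apply, or null to keep the default class
    final Map<String, String> properties;  // Document property name -> value to set
    final String owner;                    // security owner, or null to leave it unchanged

    private LoanClassification(String className, Map<String, String> properties, String owner) {
        this.className = className;
        this.properties = Collections.unmodifiableMap(properties);
        this.owner = owner;
    }

    static LoanClassification decide(Map<String, String> pdfMetadata) {
        // Comparing against the literal avoids a NullPointerException when there is no subject
        if (!"loan application".equalsIgnoreCase(pdfMetadata.get("Subject"))) {
            return new LoanClassification(null, new LinkedHashMap<String, String>(), null);
        }

        String loanType = pdfMetadata.get("LoanType");
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("LoanType", loanType);
        props.put("ApplicantName", pdfMetadata.get("Applicant"));
        props.put("ApplicationDate", pdfMetadata.get("ModificationDate"));
        props.put("DocumentTitle", "PDF Loan Application");

        // Owner depends on the loan type, as in the sample classifier
        String owner = null;
        if ("home loan application".equalsIgnoreCase(loanType)) {
            owner = "GEvans";
        } else if ("auto loan application".equalsIgnoreCase(loanType)) {
            owner = "EMesker";
        }
        return new LoanClassification("PdfLoanApplication", props, owner);
    }
}
```

A classify implementation could then call changeClass, putValue, and set_Owner with the values in the returned object, so that only the Content Engine calls remain untested outside the server.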

For best practices in implementing and packaging a DocumentClassifier implementation, see Implementation Concepts. A sample DocumentClassifier implementation, DCAHandler.java, is packaged with the Content Engine in this directory: <drive>:/Program Files/Filenet/Content Engine/samples.

Java Example

package sample.actionhandler;
import com.filenet.api.core.*;
import com.filenet.api.engine.DocumentClassifier;
import com.filenet.api.exception.*;
import java.io.*;
import com.ticdoc.pdfextract.*; // 3rd-party API for parsing PDF documents

public class DocClassifyHandler implements DocumentClassifier
{
    public void classify(Document doc)
    {
        InputStream contentStream = null;
        try
        {
            // Get PDF content from new Document checked into object store
            ContentTransfer ct = (ContentTransfer) doc.get_ContentElements().get(0);
            contentStream = ct.accessContentStream();

            // Use 3rd-party API to get PDF document metadata
            PDFDocument pdfDoc = PDFDocument.load(contentStream);
            PDFDocumentInformation pdfProperties = pdfDoc.getDocumentInformation();
            pdfDoc.close();

            // Get subject of PDF document
            String pdfSubject = pdfProperties.getSubject();

            // Classify based on PDF subject; comparing against the literal
            // avoids a NullPointerException when the PDF has no subject
            if ( "loan application".equalsIgnoreCase(pdfSubject) )
            {
                // Apply new class
                doc.changeClass("PdfLoanApplication");

                // Get PDF properties to be mapped to Document in object store
                String pdfLoanType = pdfProperties.getLoanType();
                String pdfApplicantName = pdfProperties.getApplicant();
                String pdfDateSubmitted = pdfProperties.getModificationDate().getTime().toString();

                // Set properties for Document stored in object store
                doc.getProperties().putValue("LoanType", pdfLoanType);
                doc.getProperties().putValue("ApplicantName", pdfApplicantName);
                doc.getProperties().putValue("ApplicationDate", pdfDateSubmitted);
                doc.getProperties().putValue("DocumentTitle", "PDF Loan Application");

                // Set security owner based on loan type
                if ( "home loan application".equalsIgnoreCase(pdfLoanType) )
                    doc.set_Owner("GEvans");
                else if ( "auto loan application".equalsIgnoreCase(pdfLoanType) )
                    doc.set_Owner("EMesker");
            }
        }
        catch (Exception e)
        {
            // Report any failure to the Content Engine as a classifier error
            ErrorRecord[] er = { new ErrorRecord(e) };
            throw new EngineRuntimeException(e, ExceptionCode.CLASSIFY_HANDLER_THREW, er);
        }
        finally
        {
            // Release the content stream whether or not classification succeeded
            if ( contentStream != null )
            {
                try { contentStream.close(); }
                catch (IOException ignored) { }
            }
        }
    }
}

Creating a DocumentClassificationAction Object

A DocumentClassificationAction object identifies the document classifier to be launched when a document is checked in with auto-classification enabled. The following Java and C# code snippets show how to create a DocumentClassificationAction object and set its properties. The MimeType property associates the DocumentClassificationAction object with documents of the same MIME type. In the examples, this property is set to "text/pdf", so the associated document classifier is launched whenever a document of that MIME type is checked in with auto-classification enabled.
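To make the association concrete, here is a conceptual model in plain Java; it is not the Content Engine's implementation. A hypothetical ClassificationRegistry maps each MIME type to the ProgId of a document classifier, and a lookup at check-in returns the classifier to launch only when auto-classification is requested. The sketch assumes MIME types are compared as exact strings.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model only: an in-memory stand-in for the DocumentClassificationAction
// objects stored in an object store. Class and method names are hypothetical.
final class ClassificationRegistry {
    // MIME type -> ProgId (fully qualified name of the document classifier)
    private final Map<String, String> progIdByMimeType = new HashMap<String, String>();

    void register(String mimeType, String progId) {
        progIdByMimeType.put(mimeType, progId);
    }

    // Returns the ProgId of the classifier to launch, or null if none applies.
    // Assumption: MIME types match only as exact strings.
    String classifierFor(String documentMimeType, boolean autoClassify) {
        if (!autoClassify || documentMimeType == null) {
            return null; // no auto-classification requested, or no MIME type to match
        }
        return progIdByMimeType.get(documentMimeType);
    }
}
```

For example, after register("text/pdf", "sample.actionhandler.DocClassifyHandler"), a "text/pdf" check-in with auto-classification returns that ProgId, while a check-in without it, or of another MIME type, returns null.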

A document classifier is associated with a DocumentClassificationAction through the ProgId property and, conditionally, the CodeModule property. You must set the ProgId property to the fully qualified name of the document classifier. If, as shown in the examples, the document classifier is contained in a CodeModule stored in an object store, you must also get the CodeModule object and assign it to the CodeModule property of the DocumentClassificationAction object. Note that you cannot set the CodeModule property to a reservation (in-progress) version of a CodeModule. For more information, see Creating a CodeModule Object.

NOTE  Do not set the CodeModule property if you set the application server's class path to the location of the document classifier.
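The two rules above (a fully qualified ProgId, and a CodeModule reference only when the classifier is packaged in a CodeModule) can be checked on the client before the action is saved. This is a hypothetical helper, ActionConfigCheck, sketched in plain Java; it checks only the syntax of the name, not whether the class exists or implements DocumentClassifier, and it does not reject Java reserved words.

```java
// Hypothetical client-side checks; not part of the Content Engine API.
final class ActionConfigCheck {
    private ActionConfigCheck() { }

    // True if name is a dot-separated sequence of Java identifiers,
    // e.g. "sample.actionhandler.DocClassifyHandler".
    static boolean isFullyQualifiedName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // A limit of -1 keeps trailing empty segments so "a." is rejected
        for (String part : name.split("\\.", -1)) {
            if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) {
                return false;
            }
            for (int i = 1; i < part.length(); i++) {
                if (!Character.isJavaIdentifierPart(part.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    // A CodeModule reference belongs on the action exactly when the classifier
    // is packaged in a CodeModule (not when it is on the server's class path).
    static boolean codeModuleSettingIsConsistent(boolean packagedInCodeModule, boolean codeModuleSet) {
        return packagedInCodeModule == codeModuleSet;
    }
}
```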

When saved, a DocumentClassificationAction object is stored in the Document Classification Actions folder of a Content Engine object store.

Java Example

...
   // Create document classification action
   DocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.createInstance(os, 
              ClassNames.DOCUMENT_CLASSIFICATION_ACTION);
			  
   // Set MIME type that associates action to documents of same MIME type
   docClassAction.set_MimeType("text/pdf");
   
   // Set ProgId property with fully qualified name of classifier
   docClassAction.set_ProgId("sample.actionhandler.DocClassifyHandler");
   
   // Get CodeModule object
   CodeModule cm = Factory.CodeModule.getInstance( os, 
              ClassNames.CODE_MODULE, new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}") ); 
   // Set CodeModule property
   docClassAction.set_CodeModule(cm);
   
   docClassAction.set_DisplayName("DocumentClassificationAction");
   docClassAction.save(RefreshMode.REFRESH);
}

C# Example

...
   // Create document classification action
   IDocumentClassificationAction docClassAction = Factory.DocumentClassificationAction.CreateInstance(os, 
              ClassNames.DOCUMENT_CLASSIFICATION_ACTION);
			  
   // Set MIME type that associates action to documents of same MIME type
   docClassAction.MimeType = "text/pdf";

   // Set ProgId property with fully qualified name of classifier
   docClassAction.ProgId = "sample.actionhandler.DocClassifyHandler";

   // Get CodeModule object
   ICodeModule cm = Factory.CodeModule.GetInstance( os,
               ClassNames.CODE_MODULE, new Id("{C45954D4-5DBB-460B-B890-78D6F4CFA40B}")); 
   // Set CodeModule property
   docClassAction.CodeModule = cm;

   docClassAction.DisplayName = "DocumentClassificationAction";
   docClassAction.Save(RefreshMode.REFRESH);
}

Retrieving DocumentClassificationAction Objects

You can get a single DocumentClassificationAction object with a Factory.DocumentClassificationAction method. You can also get a collection of DocumentClassificationAction objects (a DocumentClassificationActionSet) by retrieving the DocumentClassificationActions property of an ObjectStore object.

The following Java and C# examples show how to retrieve a DocumentClassificationActionSet collection from an object store. The examples iterate the set and, for each DocumentClassificationAction object in the collection, retrieve the object's DisplayName, MimeType, ProgId, and CodeModule properties. Note that the document classifier referenced by a DocumentClassificationAction object is not necessarily contained in a CodeModule stored in an object store, so the CodeModule property can be null. This is the case, for example, with XML Classifier, the document classification action packaged with the Content Engine. (For more information on XML Classifier, see Understanding the XML Classifier.)

Java Example

...
   DocumentClassificationActionSet actionSet = os.get_DocumentClassificationActions();
   DocumentClassificationAction actionObject;
   Iterator iter = actionSet.iterator();
   while ( iter.hasNext() ) 
   {
      actionObject = (DocumentClassificationAction)iter.next();
      System.out.println("DocumentClassificationAction: " + 
            actionObject.get_DisplayName() +
            "\n  MimeType is " + actionObject.get_MimeType() +
            "\n  ProgId is " + actionObject.get_ProgId() );
      String cmName = actionObject.get_CodeModule() != null ?
            actionObject.get_CodeModule().getProperties().getStringValue("Name") :
            "not assigned to this action";
      System.out.println("  CodeModule is " + cmName);
   }
}

C# Example

...
   IDocumentClassificationActionSet actionSet = os.DocumentClassificationActions;
   IDocumentClassificationAction actionObject;
   System.Collections.IEnumerator iter = actionSet.GetEnumerator();
   while (iter.MoveNext())
   {
      actionObject = (IDocumentClassificationAction)iter.Current;
      System.Console.WriteLine("IDocumentClassificationAction: " + 
            actionObject.DisplayName +
            "\n  MimeType is " + actionObject.MimeType +
            "\n  ProgId is " + actionObject.ProgId );
      String cmName = actionObject.CodeModule != null ?
            actionObject.CodeModule.Properties.GetStringValue("Name") :
            "not assigned to this action";
      System.Console.WriteLine("  CodeModule is " + cmName);
   }
}

Autoclassifying a Document

You can automatically classify documents with MIME types for which a classification infrastructure has been previously set up. That is, for a particular MIME type, a corresponding document classifier and a DocumentClassificationAction object must exist.

The following Java and C# examples show how to submit a document of MIME type "text/pdf" for automatic classification. The previous sections in this topic include code examples of a document classifier and a DocumentClassificationAction object that support this MIME type.

In the examples, a Document object is created for a PDF document, and the object's properties are set, most notably the ContentElements property and the MimeType property. The ContentElements property is set to the PDF content of the document, and this content will later be parsed by the document classifier. Note that the value of the Document object's MimeType property must match the value of the DocumentClassificationAction object's MimeType property. The Document object is then checked in, with the checkin method specifying the AUTO_CLASSIFY constant.

The examples also include code to monitor the classification process by reading the checked-in document's ClassificationStatus property, which is set to a DocClassificationStatus constant. When an auto-classification request is made, the initial ClassificationStatus value is CLASSIFICATION_PENDING. The code repeatedly checks the status until the property's value changes.

Because a document classifier runs as an asynchronous action, an auto-classification request is initially queued and is represented by a DocumentClassificationQueueItem object. This queued state corresponds to the CLASSIFICATION_PENDING status.
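The Java and C# examples that follow wait for the status to leave CLASSIFICATION_PENDING with no upper bound, so a request that stays queued keeps the client waiting indefinitely. Below is a minimal sketch of a bounded wait using a hypothetical helper named StatusPoller: it sleeps between refreshes and gives up after a timeout. In the Java example, refresh would wrap doc.refresh(), status would wrap doc.get_ClassificationStatus(), and pendingValue would be DocClassificationStatus.CLASSIFICATION_PENDING; the helper itself uses only the Java standard library.

```java
import java.util.Objects;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical polling helper, not part of the Content Engine API.
final class StatusPoller {
    private StatusPoller() { }

    // Re-reads a status until it differs from pendingValue or timeoutMillis elapses.
    // Returns the last status read; it still equals pendingValue on timeout
    // or if the waiting thread is interrupted.
    static <T> T awaitChange(Runnable refresh, Supplier<T> status, T pendingValue,
                             long intervalMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        T current = status.get();
        while (Objects.equals(current, pendingValue) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(intervalMillis); // pause instead of polling in a tight loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                break;
            }
            refresh.run();           // e.g. re-fetch the document's properties
            current = status.get();  // e.g. read ClassificationStatus again
        }
        return current;
    }
}
```

The caller compares the returned value with the pending value to tell a timeout apart from a completed classification.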

Java Example

...
   Document doc = Factory.Document.createInstance(os, "Document");
   FileInputStream fileIS = new FileInputStream("C:\\EclipseWorkspace\\Documents\\loanapplication.pdf");
   try
   {
      // Create content transfer list
      ContentTransferList contentList = Factory.ContentTransfer.createList();
      ContentTransfer ctNew = Factory.ContentTransfer.createInstance();
      ctNew.setCaptureSource(fileIS);
      contentList.add(ctNew);

      // Set content on Document object
      doc.set_ContentElements(contentList);

      // Set Document properties; MimeType must match the action's MimeType
      doc.getProperties().putValue("DocumentTitle", "PDF Document");
      doc.set_MimeType("text/pdf");

      // Check in document and commit to server; the content stream is read
      // when the document is saved, so close it only afterward
      doc.checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
      doc.save(RefreshMode.REFRESH);
   }
   finally
   {
      fileIS.close();
   }

   // Check classification status during auto classify, pausing between refreshes
   // (the enclosing method must handle or declare InterruptedException)
   while (doc.get_ClassificationStatus() == DocClassificationStatus.CLASSIFICATION_PENDING)
   {
      System.out.println("Classification status is " + doc.get_ClassificationStatus());
      Thread.sleep(1000);
      doc.refresh();
   }
   System.out.println("Classification status is " + doc.get_ClassificationStatus());
}

C# Example

...
   IDocument doc = Factory.Document.CreateInstance(os, "Document");
   // Verbatim string: single backslashes are path separators
   using (Stream fileStream = File.OpenRead(@"C:\EclipseWorkspace\Documents\loanapplication.pdf"))
   {
      // Create content transfer list
      IContentTransferList contentList = Factory.ContentTransfer.CreateList();
      IContentTransfer ctNew = Factory.ContentTransfer.CreateInstance();
      ctNew.SetCaptureSource(fileStream);
      contentList.Add(ctNew);

      // Set content on Document object
      doc.ContentElements = contentList;

      // Set Document properties; MimeType must match the action's MimeType
      doc.Properties["DocumentTitle"] = "PDF Document";
      doc.MimeType = "text/pdf";

      // Check in document and commit to server before the stream is disposed
      doc.Checkin(AutoClassify.AUTO_CLASSIFY, CheckinType.MAJOR_VERSION);
      doc.Save(RefreshMode.REFRESH);
   }

   // Check classification status during auto classify, pausing between refreshes
   while (doc.ClassificationStatus == DocClassificationStatus.CLASSIFICATION_PENDING)
   {
      System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
      System.Threading.Thread.Sleep(1000);
      doc.Refresh();
   }
   System.Console.WriteLine("Classification status is " + doc.ClassificationStatus);
}