The essential parts of a Content Engine search are a SQL statement, contained in a SearchSQL
instance, and the object store or object stores searched, contained in a SearchScope
object. Content searches are specified through the CONTAINS and FREETEXT operators in the SQL statement.
There are helper methods on the SearchSQL
class to assist you in constructing a SQL statement. Alternatively, you can construct a SQL statement independently and pass it to a SearchSQL
instance as a string. SQL statements need to follow the FileNet standard, which generally conforms to SQL-92, with extensions for FileNet-specific constructs. See SQL Syntax Reference for a complete description.
The SearchSQL helper methods are supplied for assistance in building SQL statements, and cannot provide the level of specification you can achieve with an independently-constructed statement. However, in a development environment, you can use the helper methods for initial construction of the SQL statement, then use the SearchSQL.toString
method to get the SQL statement string and manually refine the SQL statement.
The SearchScope
methods execute the SQL statement on one or more object stores to find objects (IndependentObject
instances), database rows (RepositoryRow
instances), or metadata (ClassDescription
instances).
You can use the SearchScope
class to search one or more object stores using a single query. To issue a query on multiple object stores, call the constructor for SearchScope
with an array of object stores, similar to the following:
    // Search os1 and os2 together, keeping only classes and properties defined in both.
    ObjectStore[] osArray = new ObjectStore[] { os1, os2 };
    SearchScope objStores = new SearchScope(osArray, MergeMode.INTERSECTION);
Then use the SearchScope
instance (objStores) to execute a query. The query will merge the results from the object stores, and return them in a single, ordered list.
For example, if the search statement "SELECT DocumentTitle FROM Document WHERE DocumentTitle LIKE 'C%' ORDER BY DocumentTitle" is executed on a list of two object stores, the results (in a single collection) might be:
    Cars
    City
    Concrete
    Cows
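The merged, ordered list above can be modeled in plain Java. The following is a minimal sketch, not Content Engine code: it assumes each object store returns its own titles already sorted, and that "Cars" and "Concrete" come from one store while "City" and "Cows" come from the other. The class and method names (MergedResultsSketch, mergeSorted) are hypothetical, and the server's actual merge and collation rules may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model only; not part of the Content Engine API. */
final class MergedResultsSketch {

    /** Merges two lists that are each already sorted in natural String order. */
    static List<String> mergeSorted(List<String> first, List<String> second) {
        List<String> merged = new ArrayList<>(first.size() + second.size());
        int i = 0;
        int j = 0;
        while (i < first.size() && j < second.size()) {
            // Take the smaller head element; ties go to the first store.
            if (first.get(i).compareTo(second.get(j)) <= 0) {
                merged.add(first.get(i++));
            } else {
                merged.add(second.get(j++));
            }
        }
        // Append whatever remains in either list.
        while (i < first.size()) {
            merged.add(first.get(i++));
        }
        while (j < second.size()) {
            merged.add(second.get(j++));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> store1 = Arrays.asList("Cars", "Concrete"); // assumed results from the first store
        List<String> store2 = Arrays.asList("City", "Cows");     // assumed results from the second store
        System.out.println(mergeSorted(store1, store2)); // prints [Cars, City, Concrete, Cows]
    }
}
```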
"Cars" and "Concrete" might come from the first object store, and "City" and "Cows" might come from the second object store. Note that the results from the different object stores are intermingled in the list, ordered by the ORDER BY clause of the search statement.
Classes and properties are defined in each object store. A class or property in one object store is considered to be the same class or property existing in another object store only if the compared classes or properties have matching GUIDs. Having the same name does not indicate that the compared classes or properties are the same.
GUID values are stored in properties on both the ClassDefinition
and PropertyDefinition
classes.
A ClassDefinition
object has both an Id property that is a GUID, and an AliasIds property that is a list of GUIDs. The Id property contains the GUID that is usually used to identify ClassDefinition
objects. The AliasId properties can alternatively be used to identify these objects. Two ClassDefinition
objects from two different object stores are considered to be the same if the value of either the Id property or AliasId property of one ClassDefinition
object matches the value of the corresponding property on the other ClassDefinition
object.
For example, the query "SELECT * from DocSubClass" executed on a list of two object stores might return objects named "DocSubClass" from both object stores, but if these objects do not have the same Id or AliasId property value, they will not be recognized as the same object. Attempting to query both object stores using the name "DocSubClass" will not return any rows from the second object store. (However, the object named "DocSubClass" in the second object store can be referenced using the string format of the ClassDefinition.Id
property, rather than the name.)
PropertyDefinition
objects have Id, PrimaryId, and AliasId
properties. For PropertyDefinition
objects, the
PrimaryId property is used to identify the object, rather than the Id property. (Note that the PrimaryId property is the same as the Id property of the PropertyTemplate
object to which the property refers.) Two PropertyDefinition
objects from two different object stores are then considered to be the same if either the PrimaryId or AliasId property value of one PropertyDefinition
object matches the value of the corresponding property on the other PropertyDefinition
object, when both PropertyDefinition
objects are on ClassDefinition
objects that also match.
The AliasId properties for both ClassDefinition
and PropertyDefinition
objects are cumulative. For instance, suppose four objects are to be merged from object stores A, B, C, D, with the Id and AliasId values shown below (using single digit integers for brevity):
Object Store | Class Id | Alias Id | Ids of Class |
---|---|---|---|
OS-A | 1 | 2 | 1,2 |
OS-B | 2 | 3 | 1,2,3 |
OS-C | 3 | 4 | 1,2,3,4 |
OS-D | 4 | (none) | 1,2,3,4 |
The "Ids of Class" column lists the cumulative GUIDs for each object; if any of these GUIDs is matched by the Id or AliasId of another object, the two objects are merged for the purposes of the query. So, all of the objects in the table are aliased together as the same object. Note that this example illustrates how IDs are matched; a class alias scheme this complex is unlikely in a real deployment.
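One way to picture cumulative aliasing is as a grouping problem: every Id and alias GUID that is linked, directly or through a chain, ends up in the same group. The sketch below models this with a small union-find structure, using the single-digit integers from the table in place of GUIDs. It is an illustration under those assumptions only; the class AliasGroupingSketch and its methods are hypothetical, and nothing here is meant to describe how the server actually computes matches.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only; integers stand in for GUIDs. */
final class AliasGroupingSketch {

    // Maps each known ID to its parent; a root maps to itself.
    private final Map<Integer, Integer> parent = new HashMap<>();

    /** Returns the representative ID of the group that contains the given ID. */
    int find(int id) {
        parent.putIfAbsent(id, id);
        int root = id;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        // Path compression: point every visited node directly at the root.
        int current = id;
        while (parent.get(current) != root) {
            int next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Puts the two IDs into the same group. */
    void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA != rootB) {
            parent.put(rootB, rootA);
        }
    }

    /** Registers one class: its Id plus any alias IDs assigned to it. */
    void register(int classId, int... aliasIds) {
        find(classId);
        for (int alias : aliasIds) {
            union(classId, alias);
        }
    }

    /** True if the two class IDs are linked through any chain of aliases. */
    boolean sameClass(int idA, int idB) {
        return find(idA) == find(idB);
    }

    public static void main(String[] args) {
        AliasGroupingSketch groups = new AliasGroupingSketch();
        groups.register(1, 2); // OS-A: Id 1, alias 2
        groups.register(2, 3); // OS-B: Id 2, alias 3
        groups.register(3, 4); // OS-C: Id 3, alias 4
        groups.register(4);    // OS-D: Id 4, no alias
        System.out.println(groups.sameClass(1, 4)); // prints true
    }
}
```

In this model the class in OS-A and the class in OS-D are treated as the same class even though neither lists the other's Id directly, which is the cumulative behavior the table describes.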
The typical aliasing scheme is:
Object Store | Class Id | Alias Id | Ids of Class |
---|---|---|---|
OS-A | 1 | (none) | |
OS-B | 2 | 1 | |
OS-C | 3 | 1 | |
OS-D | 3 | 1 | |
Duplicate matches are not allowed for alias IDs, which means that a single object cannot match more than one other object, and a single property cannot match more than one other property. If alias IDs are set up so that duplicate matches occur, an exception will be thrown and the multiple object store query will not be allowed for any objects across that combination of object stores (not just the objects that contain the duplicate alias IDs).
The system administrator should normally create the classes and properties on one object store, and then export those definitions from that object store and import them to any other object store that needs to support queries across object stores. This export/import operation will ensure that the IDs of the classes and properties are the same in each object store. The imported names will also be the same, which is a good practice to follow.
If the object stores that must support queries across object stores contain pre-existing objects with different IDs, then the alias IDs must be used as the alternate identifier. In this case, the system administrator must assign alias IDs to the intended matching objects and properties on each object store. When assigning alias IDs, the ClassDefinition.Id
property of an object in one object store is assigned to the AliasId list of that object in another object store. Additionally, the PropertyDefinition.PrimaryId
property of a property in one object store is assigned to the AliasId list of the property in another object store.
Note: If two object stores need matching objects, the alias IDs for the corresponding objects need to be assigned on only one of the two object stores.
When determining names of classes or properties, it is the first object store in which the class is encountered that determines the name. For example, suppose there is an object named "apple" in the first object store, and "orange" in a second object store, and that both objects have the same GUID value for their Id property. For the purpose of an object store query that executes across both object stores, any reference to the object having the name "apple" would match both the "apple" and "orange" objects. Any name reference to the object having the name "orange" would throw an undefined class exception.
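The "first object store wins" rule for names can be sketched in a few lines of plain Java. This is a simplified model, not Content Engine code: each object store is represented as a map from a GUID string to a class name, "guid-1" is a made-up GUID, and MergedNameSketch, mergedNames, and resolve are hypothetical names. The undefined class exception is approximated with IllegalArgumentException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; each map is one object store (GUID -> class name). */
final class MergedNameSketch {

    /** Builds GUID -> name, taking each name from the first store that defines the GUID. */
    static Map<String, String> mergedNames(List<Map<String, String>> storesInOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> store : storesInOrder) {
            for (Map.Entry<String, String> entry : store.entrySet()) {
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    /** Resolves a class name to its GUID, or fails if the merged metadata has no such name. */
    static String resolve(Map<String, String> merged, String name) {
        for (Map.Entry<String, String> entry : merged.entrySet()) {
            if (entry.getValue().equals(name)) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("Undefined class: " + name);
    }

    public static void main(String[] args) {
        Map<String, String> first = Collections.singletonMap("guid-1", "apple");
        Map<String, String> second = Collections.singletonMap("guid-1", "orange");
        Map<String, String> merged = mergedNames(Arrays.asList(first, second));

        System.out.println(resolve(merged, "apple")); // prints guid-1
        try {
            resolve(merged, "orange");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Undefined class: orange
        }
    }
}
```

Reversing the list order in this model makes "orange" the resolvable name and "apple" the undefined one, which shows why the order of the object stores matters for name-based queries.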
Because the order of the object stores can affect name-based queries, use the same object store order whenever you perform queries across object stores. Doing so is also more efficient: merging object stores A and B does not produce the same results as merging B and A, so the server must cache order-dependent merged metadata separately for each ordering (A & B as well as B & A). Changing the order from one query to the next can cause excessive amounts of metadata to be cached, resulting in either too much memory being consumed or thrashing, as metadata is flushed from the cache to restrict its size and then reconstituted later.
The merge mode specified for a query across object stores affects how classes and properties are merged. There are two merge modes: intersection and union (MergeMode.INTERSECTION and MergeMode.UNION).
For an intersection merge, only objects and properties defined in all object stores are present in the merged metadata, and only these objects and properties may be referenced in a search. Any class or property that exists in one object store, but does not have a matching class or property in every object store, is excluded from the merged metadata, and cannot be used in a search.
For a union merge, all classes and properties from all object stores are present in the merged metadata, and all classes and properties can be returned.
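The two merge modes behave like set intersection and set union over the property (or class) identifiers of each object store. The sketch below shows that idea with plain Java sets. It is a model under stated assumptions: the identifiers "p1" through "p4" are invented for illustration, identifiers are assumed to have already been matched by GUID or alias, and MergeModeSketch with its nested Mode enum is a hypothetical stand-in, not the API's MergeMode class.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative model only; strings stand in for matched property GUIDs. */
final class MergeModeSketch {

    /** Hypothetical stand-in for the two merge modes described in the text. */
    enum Mode { INTERSECTION, UNION }

    /** Combines the property identifiers of all stores according to the mode. */
    static Set<String> merge(List<? extends Set<String>> stores, Mode mode) {
        Set<String> result = new TreeSet<>(); // sorted for stable output
        if (stores.isEmpty()) {
            return result;
        }
        result.addAll(stores.get(0));
        for (Set<String> store : stores.subList(1, stores.size())) {
            if (mode == Mode.UNION) {
                result.addAll(store);    // keep anything defined in any store
            } else {
                result.retainAll(store); // keep only what every store defines
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> os1 = new HashSet<>(Arrays.asList("p1", "p2", "p3"));
        Set<String> os2 = new HashSet<>(Arrays.asList("p2", "p3", "p4"));
        Set<String> os3 = new HashSet<>(Arrays.asList("p2", "p3"));
        List<Set<String>> stores = Arrays.asList(os1, os2, os3);

        System.out.println(merge(stores, Mode.UNION));        // prints [p1, p2, p3, p4]
        System.out.println(merge(stores, Mode.INTERSECTION)); // prints [p2, p3]
    }
}
```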
As an example, assume the following:
(Note that OS1 is the first object store in the collection.) The following custom properties then exist for "Alpha" in each object store:
If you specify MergeMode.UNION, the properties returned are:
If you specify MergeMode.INTERSECTION, the properties returned are:
Attempts to select either PropertyA or PropertyD will result in an undefined property exception.
If the classes had the same GUIDs for the same names, but the properties had different GUIDs and were not aliased, the MergeMode.UNION for the above example would have the following properties:
If you executed the select statement "SELECT * FROM Alpha", the result would be a row with ten columns for each object store that contained a row. Each column in the rows returned would be non-null only if the row came from the preceding object store in the list.
If the select statement was "SELECT PropertyA, PropertyB, PropertyC, PropertyD FROM Alpha", PropertyA would come only from OS1 and would be null for rows from any other object store. Similarly, PropertyB would come only from OS1, PropertyC from OS1, and PropertyD from OS2. You could not select just PropertyB from OS3 based on the property name, so this configuration is not very useful, illustrating why you need to put alias IDs on properties (or export/import across object stores to make the IDs match); otherwise, the query results might not be meaningful.
For queries across object stores, when a property having the same GUID does not have the same name in each object store, the type of objects returned affects the property name:
- If RepositoryRow objects are returned, the property gets the name from the first object store in which it is defined, and the name is the same for rows from any subsequent object store in the list.
- If IndependentObject objects are returned, the property is named according to each object store in which it is defined.
RepositoryRow objects differ from IndependentObject objects in some notable ways:
- A RepositoryRow object cannot be used for updates.
- A RepositoryRow object can have data from multiple IndependentObject objects if joins are used in the query.
- A RepositoryRow object can have duplicate properties.

As an example, suppose you execute the statement "SELECT apple FROM someclass" against a list of two object stores where, in the first object store, the property "apple" matches (by GUID) a property named "orange" in the second object store. A query that returns RepositoryRow objects will always return properties named "apple", regardless of which object store they came from, but a query that returns IndependentObject objects will return a property name of "apple" for data from the first object store and a property name of "orange" for data from the second object store. If this were not the case, attempts to do updates using the IndependentObject objects returned from the second object store would generate the error "Property apple not defined".
When RepositoryRow objects are returned, properties can be renamed. For instance, you could call SearchScope.fetchRows to execute "SELECT Owner AS Bob FROM Document". In the results, each RepositoryRow object would then have a property named "Bob".
However, you cannot use the AS clause when returning IndependentObject objects. IndependentObject objects can be used in a subsequent update, so, in the preceding RepositoryRow example, there probably would be no property named "Bob" for the update, nor would it be useful to try to update a (possible) property named "Bob" using "Owner".
Content (full-text) searches look for words or phrases, specified in the query, that might be stored in the content of objects or in string properties of these objects. For the content of an object or its string properties to be searched, you must enable content-based retrieval (CBR) for the object and for each string property that is to be included in a content search. This is controlled by the (boolean) value of the IsCBREnabled
property on the following classes:
ClassDefinition
The IsCBREnabled
property enables full-text searches of content (if any exists) for the class, and allows string properties to be enabled for full-text searches.
PropertyDefinitionString
The IsCBREnabled
property enables the string property to be included in content searches.
The IsCBREnabled property can be enabled only for Document, Annotation, CustomObject, and Folder objects.
A content search is initiated by either a CONTAINS or FREETEXT operator in the SQL statement contained in SearchSQL. The CONTAINS and FREETEXT operators have somewhat different operand formats and provide somewhat different search characteristics.
While the CONTAINS operator can search content in all properties, or in a single property, the FREETEXT operator can search content only in all properties. Note that attempting to specify a property name for FREETEXT will generate an exception.
See Full-Text Queries for more information about the CONTAINS and FREETEXT operators, and Content-Based Retrieval Concepts for information about administrative interfaces for full-text information.
Note: Full-text queries can take a considerable amount of time to execute. Some queries can finish in a few seconds, while others could potentially run for hours. Your applications should be written to allow the user to set a timeout; a single default value is probably not sufficient. The user settings should ensure either that the query does not run longer than desired, or that the timeout value is high enough to enable the query to finish execution. Note that the timeout value is the time allowed to fetch a page for a continuable query, not the time to fetch all pages for the query.
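As a rough illustration of a user-configurable, per-page timeout, the sketch below wraps a single page fetch in a client-side time limit using java.util.concurrent. This is a different technique from any timeout setting the Content Engine itself provides, and it is only a sketch: PageTimeoutSketch and fetchWithTimeout are hypothetical names, and the Callable stands in for whatever call your application makes to fetch one page of a continuable query. Cancelling the client-side task does not necessarily stop work already in progress on the server.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative client-side guard only; not a Content Engine API. */
final class PageTimeoutSketch {

    /**
     * Runs one page fetch and waits at most timeoutMillis for it.
     * Returns the (non-null) page if it arrives in time, or an empty Optional on timeout.
     */
    static <T> Optional<T> fetchWithTimeout(Callable<T> pageFetch, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(pageFetch);
            try {
                return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                future.cancel(true); // stop waiting; interrupts the worker thread
                return Optional.empty();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for a page", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Page fetch failed", e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long userTimeoutMillis = 200; // in a real application this would come from a user setting

        Optional<String> quick = fetchWithTimeout(() -> "page-1", userTimeoutMillis);
        System.out.println(quick.isPresent()); // prints true

        Optional<String> slow = fetchWithTimeout(() -> {
            Thread.sleep(2000); // simulates a long-running full-text query
            return "late page";
        }, userTimeoutMillis);
        System.out.println(slow.isPresent()); // prints false
    }
}
```

The limit applies to each page separately, which matches the note above that the timeout covers fetching one page rather than the whole result set.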