Collection Concepts

Content Engine collections are groups of related elements and can be one of two types: list or set. The name of a collection identifies its type. For example, a DocumentSet is a collection of Document objects and an IdList is a collection of Id objects.

All Content Engine API collection objects are strongly typed. When returned by the API, these type-aware collection objects contain only elements whose object type is directly related to the object type of the collection. For example, a DocumentSet collection object returned by the Java™ Folder.get_ContainedDocuments() method or returned from the .NET IFolder.ContainedDocuments property contains only those Document objects (and any subclassed objects of those documents) contained within that folder.

Note: It is currently possible, in many cases, for applications to intentionally circumvent the type safety (by putting objects of inappropriate types into a collection or by casting one collection type to another collection type). Applications should not rely on such behavior, and it is likely that type safety will be strengthened in a future release to prevent it.

The EngineCollection interface is the base interface for all collection types in the collection class hierarchy and provides functionality common to all collection objects. Its subinterface, EngineSet, provides functionality common only to sets. In other words, a list-type collection has all of the functionality provided by EngineCollection but none of that provided by EngineSet, whereas a set-type collection has all of the functionality provided by both EngineCollection and EngineSet.

A list-type collection is a group of dependent objects or a list of primitive data items. A list-type collection has a parent object to which it is scoped. A list-type collection object is instantiated with the createList() method on the type-specific Factory class. For example, to create an instance of IdList, call Factory.IdList.createList(). The elements of a list are ordered and need not be unique. List-type collections are iterated one element at a time. You can directly update a list-type collection using type-safe methods, for example, to add or remove elements.

Note: Adding or removing elements from a list while an iterator on that list is open has undefined effects on further progress through that iterator. Doing so may result in errors or in skipped or repeated items, or it may work as you expect. Because of this unpredictable behavior, we recommend that you close the open iterator before adding or removing elements.
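The recommendation above can be sketched as a two-pass pattern: finish iterating first, then update. This is a minimal illustration only. It uses java.util.ArrayList as a stand-in for a list-type collection, and the class and method names (SafeListUpdate, removeAfterIterating) are hypothetical and not part of the Content Engine API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; ArrayList stands in for a list-type collection.
final class SafeListUpdate {

    // Removes every occurrence of 'unwanted' without modifying the list
    // while an iterator over it is still in use.
    static List<String> removeAfterIterating(List<String> values, String unwanted) {
        List<String> toRemove = new ArrayList<>();

        // Pass 1: read only. Nothing is added or removed here.
        Iterator<String> it = values.iterator();
        while (it.hasNext()) {
            String value = it.next();
            if (value.equals(unwanted)) {
                toRemove.add(value);
            }
        }

        // Pass 2: the iterator is exhausted and no longer used, so update now.
        values.removeAll(toRemove);
        return values;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(Arrays.asList("A", "B", "A", "C"));
        System.out.println(removeAfterIterating(ids, "A")); // prints [B, C]
    }
}
```

The same shape applies when adding elements: gather the additions during iteration and apply them once the iterator is finished.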

A set-type collection is a group of independent objects. With some exceptions, the elements of a set are unordered and unique. The exceptions to this general rule are the following:

You cannot directly update a set-type collection. Set-type collections can be paged, that is, they can be iterated a page at a time. (See Collection Paging Support below, and the Java PageIterator and PageMark interfaces or .NET IPageEnumerator and IPageMark, which provide paging functionality.) A row set is a collection of rows returned from a query and has the characteristics of a set-type collection. (See the RepositoryRowSet interface.)

Note: When iterating a set, there is no shortcut for your application to discover the total number of items in the collection. If your application needs the total, it must count the individual items in the set (or in each page, if employing collection paging) and calculate the total itself.
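As a rough sketch of the counting described in this note, the helpers below total the items either one element at a time or one page at a time. They assume only a standard java.util.Iterator. The class and method names (SetCounter, countElements, countByPages) are hypothetical, and in-memory collections stand in for a set returned by the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper; not part of the Content Engine API.
final class SetCounter {

    // Counts items by walking the iterator to its end.
    static int countElements(Iterator<?> it) {
        int total = 0;
        while (it.hasNext()) {
            it.next();
            total++;
        }
        return total;
    }

    // Counts items by summing the size of each page.
    static int countByPages(Iterator<Object[]> pages) {
        int total = 0;
        while (pages.hasNext()) {
            total += pages.next().length;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countElements(Arrays.asList("a", "b", "c").iterator())); // prints 3

        List<Object[]> pages = new ArrayList<>();
        pages.add(new Object[10]);
        pages.add(new Object[10]);
        pages.add(new Object[2]);
        System.out.println(countByPages(pages.iterator())); // prints 22
    }
}
```

Either approach consumes the iterator, so an application that also needs the items themselves would typically count while processing them rather than in a separate pass.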

Collection Paging Support

In addition to traditional item-at-a-time iteration, the Content Engine APIs support paging of set-type collections. Paging is automatically employed for enumeration results sent from the server to the client. You can also use paging for your custom applications. As an example, you could write your application to specify a page size that is approximately the number of items displayed in a user interface. Each page retrieved is rendered for UI presentation while the next page of results is being retrieved.

How Paging Works

Sets of independent objects and repository rows are divided into pages when they are being physically retrieved from the server; each page is a number of collection elements (objects or rows) that represent a subset of the collection elements. You can iterate a page at a time instead of one object or row at a time. As an example, if a page is defined as 10 elements, and the collection has a total of 22 elements, the first paging operation returns a page containing 10 elements, the second page returns the next 10 elements, and the third page returns the last 2 elements. This page iteration is exposed and is especially useful for interactive applications that display a page of information at a time.
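The arithmetic in the 22-element example can be sketched as follows. This is an idealized, in-memory illustration in which every page except the last is exactly the requested size. The class and method names (PageSplit, paginate) are hypothetical, and real paging is performed by the server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how a collection divides into pages.
final class PageSplit {

    // Splits 'elements' into consecutive pages of at most 'pageSize' items.
    static <T> List<List<T>> paginate(List<T> elements, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int start = 0; start < elements.size(); start += pageSize) {
            int end = Math.min(start + pageSize, elements.size());
            pages.add(new ArrayList<>(elements.subList(start, end)));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Integer> elements = new ArrayList<>();
        for (int i = 1; i <= 22; i++) {
            elements.add(i);
        }
        for (List<Integer> page : paginate(elements, 10)) {
            System.out.println(page.size()); // prints 10, then 10, then 2
        }
    }
}
```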

Each page iterator is initially positioned before the first page of the set. The first call to the PageIterator.nextPage method moves the iterator to the first page. The second call to nextPage moves the iterator to the second page, and so on. The nextPage method returns true until the end of the set is reached. When the iterator reaches the end of the set, it is positioned after the last page and nextPage returns false.

The getCurrentPage and getElementCount methods throw an exception if the iterator is positioned before the first page or after the last page, or between pages after a reset(mark) operation. For proper positioning, you must call nextPage on a new iterator and after a reset operation. You may call the getPageMark method at any time. Of these methods, only nextPage moves the position of the iterator.
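The positioning rules above can be illustrated with a small in-memory simulation. This is a sketch of the described contract, not the actual PageIterator implementation. The class SimulatedPageIterator is hypothetical, and its use of IllegalStateException is an assumption, since the text above does not name the exception type that the real API throws.

```java
import java.util.Arrays;

// Hypothetical in-memory stand-in that mimics the positioning rules only.
final class SimulatedPageIterator {
    private final Object[] elements;
    private final int pageSize;
    private int pageIndex = -1; // -1 means "before the first page"

    SimulatedPageIterator(Object[] elements, int pageSize) {
        this.elements = elements.clone();
        this.pageSize = pageSize;
    }

    private int pageCount() {
        return (elements.length + pageSize - 1) / pageSize;
    }

    // Moves to the next page; returns false once positioned after the last page.
    boolean nextPage() {
        if (pageIndex < pageCount()) {
            pageIndex++;
        }
        return pageIndex < pageCount();
    }

    // Returns a copy of the current page; fails if not positioned on a page.
    Object[] getCurrentPage() {
        requirePositioned();
        int start = pageIndex * pageSize;
        int end = Math.min(start + pageSize, elements.length);
        return Arrays.copyOfRange(elements, start, end);
    }

    // Returns the size of the current page without copying it.
    int getElementCount() {
        requirePositioned();
        return Math.min(pageSize, elements.length - pageIndex * pageSize);
    }

    private void requirePositioned() {
        if (pageIndex < 0 || pageIndex >= pageCount()) {
            throw new IllegalStateException("iterator is not positioned on a page");
        }
    }

    public static void main(String[] args) {
        SimulatedPageIterator pages = new SimulatedPageIterator(new Object[22], 10);
        while (pages.nextPage()) {
            System.out.println(pages.getElementCount()); // prints 10, then 10, then 2
        }
    }
}
```

Calling getCurrentPage or getElementCount on this simulation before the first nextPage call, or after nextPage has returned false, raises the exception, which mirrors the rule stated above.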

The returned value of getElementCount is always equal to getCurrentPage().length. Use getElementCount to avoid copying the potentially large internal array just to get its length.

You can also get the current page continuation state (that is, the page on which the iterator will continue with the next call) and reset the iterator back to a previous page of results. The saved position of the iterator is called a "page mark", represented by a PageMark object. The PageIterator.getPageMark method retrieves the current mark, and the reset(mark) method resets the state of the iterator to a previously saved mark. The reset method positions the iterator before the marked page; the nextPage method must be called to position the iterator to the marked page. It is also possible to mark and reset to the position before the first page and the position after the last page.

The reset method (with no parameters) positions the iterator before the first page of the collection. This is essentially the same as getting a new iterator from the collection. You must then call the nextPage method to position the iterator to the first page.
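The mark and reset behavior can be sketched with another in-memory simulation. The classes MarkablePageIterator and Mark are hypothetical. The sketch assumes that a mark captures the continuation state, that is, the page that the next call to nextPage will move to; consult the PageMark reference for the authoritative semantics.

```java
import java.util.Arrays;

// Hypothetical simulation of page marks; not the Content Engine implementation.
final class MarkablePageIterator {

    // Assumption: a mark records which page nextPage() will move to.
    static final class Mark {
        final int nextPage;
        Mark(int nextPage) { this.nextPage = nextPage; }
    }

    private final int[] elements;
    private final int pageSize;
    private int next = 0;     // index of the page nextPage() will move to
    private int current = -1; // index of the current page, or -1 if between pages

    MarkablePageIterator(int[] elements, int pageSize) {
        this.elements = elements.clone();
        this.pageSize = pageSize;
    }

    boolean nextPage() {
        int pageCount = (elements.length + pageSize - 1) / pageSize;
        if (next >= pageCount) {
            current = -1; // positioned after the last page
            return false;
        }
        current = next++;
        return true;
    }

    int[] getCurrentPage() {
        if (current < 0) {
            throw new IllegalStateException("call nextPage() first");
        }
        int start = current * pageSize;
        return Arrays.copyOfRange(elements, start, Math.min(start + pageSize, elements.length));
    }

    Mark getPageMark() { return new Mark(next); }

    // Positions the iterator before the marked page.
    void reset(Mark mark) { next = mark.nextPage; current = -1; }

    // Positions the iterator before the first page.
    void reset() { next = 0; current = -1; }

    public static void main(String[] args) {
        MarkablePageIterator pages = new MarkablePageIterator(new int[] {1, 2, 3, 4, 5}, 2);
        pages.nextPage();                                           // on [1, 2]
        Mark mark = pages.getPageMark();                            // continuation: [3, 4]
        pages.nextPage();
        pages.nextPage();                                           // on [5]
        pages.reset(mark);
        pages.nextPage();
        System.out.println(Arrays.toString(pages.getCurrentPage())); // prints [3, 4]
        pages.reset();
        pages.nextPage();
        System.out.println(Arrays.toString(pages.getCurrentPage())); // prints [1, 2]
    }
}
```

In both reset cases the simulation leaves the iterator between pages, so getCurrentPage fails until nextPage is called again.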

The getPageSize and setPageSize methods allow you to query and adjust the internal paging size of the iterator. A new size takes effect on the next fetch of a page from the server, which is typically the next call to nextPage. The actual size of each returned page may be smaller (including zero) or larger than the requested page size. If you do not specify a page size, the configured defaults are used. The ServerCacheConfiguration.QueryPageDefaultSize property specifies the default page size for query results and paged property values; if this property is not explicitly set to some other value, its default is 500. If you specify a page size, it must be less than the configured maximum page size set in the QueryPageMaxSize property (which has a system default value of 1000).

The first page of a set may be pre-fetched from the server and cached in the client. All iterators of a set with a pre-fetched first page could return the same first page. All iterators fetch subsequent pages, if any, directly from the server.

For information on how to page through a collection, see Working with Collections.