This package builds a deterministic finite automaton (DFA) for XML schema models. The process used is described in Compilers: Principles, Techniques, and Tools, 1988, Aho et. al.

A DFA is a state transition model that diagramatically looks like the following:

'po:purchaseOrder'
(Start)--"billTo"-->(1)--"shipTo"-->(2)--+--"items"-->(Finish1)
                                         '--"po:comment"-->(4)--"items"-->(Finish2)
'billTo'
(Start)--"first_name"-->(1)--"last_name"-->(2)--"city"-->(3)--"state"-->(4)--"zip"-->(Finish)

'shipTo'
(Start)--"first_name"-->(1)--"last_name"-->(2)--"city"-->(3)--"state"-->(4)--"zip"-->(Finish)

'items'
(Start)--"item"-->(Finish)--"item"--.
                      ^             |
                      '-----<<------'
Each element (whether global or local) is uniquely identified. All the element tagnames allowed within the parent defines an 'alphabet' of valid content. Grand-children are irrelevant to the state transitions; the data structure can be built independently for each level in the schema model.

Two properties of a DFA are that every state has exactly one non-empty entry, and there are no state transitions on the empty input. Hence "deterministic" in the name, for any given state and input, there are either 0 or 1 state transitions possible.

A state transition matrix is created by treating the schema as a regular expression according to the following mapping, where e is an element in a complex type and a is an attribute:

The data structure will identify the schema content model component that it is using, so the model walkers can determine the nested choices, sequences, or alls that contain the element. Performance (once the data structure is obtained) should be extremely good, because you'll essentially be banging deterministically (only one valid path) through an int[] using int offsets.

Additional uses:

Package Specification