Creating a parser in C

Before you start

Ensure that you have read and understood the following topics:

A loadable implementation library, or LIL, is the implementation module for a C parser (or node). A LIL is a UNIX shared object or a Windows dynamic link library (DLL) that has the file extension .lil instead of .dll.

The implementation functions that you must write are listed in Parser implementation functions. The utility functions that WebSphere Business Integration Message Broker provides to help you are listed in Parser utility functions.

WebSphere Business Integration Message Broker provides the source for a sample user-defined parser called BipSampPluginParser.c. This is a simple pseudo-XML parser that you can use as it is or modify.

The task of writing a parser varies considerably according to the complexity of the bitstream to be parsed. Only the basic steps are described here, in the following sections:
  1. Defining the parser during broker initialization
  2. Creating an instance of the parser
  3. Implementing the parser functionality
  4. Deleting an instance of the user-defined parser

Implementing input functions

The input functions (for example, cpiParseBuffer) are invoked by the broker when a parser is required to parse an input message. The parser must tell the broker how much of the input bitstream buffer it claims to own. A parser for a fixed-size header claims the size of the header; a parser that handles the whole message claims the remainder of the buffer.

For example:
  1. Implement the cpiParseBufferEncoded function:
    int cpiParseBufferEncoded(
      CciParser*  parser,
      CciContext* context,
      int         encoding,
      int         ccsid
    ){
      PARSER_CONTEXT_ST* pc = (PARSER_CONTEXT_ST *)context ;
      int                rc;
    
  2. Use the cpiBufferPointer function to get a pointer to the message buffer, and set the offset to zero:
      pc->iBuffer = (void *)cpiBufferPointer(&rc, parser);
      pc->iIndex = 0;
  3. Save the encoding and coded character set ID (CCSID) of the buffer:
      pc->iEncoding = encoding;
      pc->iCcsid = ccsid;
  4. Save the size of the buffer using the cpiBufferSize function:
      pc->iSize = cpiBufferSize(&rc, parser);
  5. Prime the first byte in the stream using the cpiBufferByte function:
      pc->iCurrentCharacter = cpiBufferByte(&rc, parser, pc->iIndex);
  6. Set the current element to the root element using the cpiRootElement function:
      pc->iCurrentElement = cpiRootElement(&rc, parser);
  7. Reset the in-tag flag so that parsing starts from a known state:
      pc->iInTag = 0;
    
  8. Claim ownership of the remainder of the buffer:
      return(pc->iSize);
    }
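The steps above record the buffer, reset the offset, prime a one-byte lookahead, and claim the rest of the buffer. The following self-contained sketch models that bookkeeping without the broker headers. SketchContext, sketchParseBuffer, and sketchAdvance are hypothetical names; the real context structure and the real buffer access go through your own PARSER_CONTEXT_ST and the cpiBuffer* utility functions.

```c
#include <stddef.h>

/* Hypothetical miniature of PARSER_CONTEXT_ST. The field names follow the
   example above; the real structure is defined by your own parser. */
typedef struct {
  const unsigned char *iBuffer;   /* start of the input bytes */
  size_t iSize;                   /* total size of the input buffer */
  size_t iIndex;                  /* offset of the current character */
  int    iCurrentCharacter;       /* one-byte lookahead, or -1 at the end */
  int    iInTag;                  /* parser state flag */
} SketchContext;

/* Load the byte at iIndex into the lookahead, or -1 past the end. */
static void primeCurrent(SketchContext *pc) {
  pc->iCurrentCharacter =
      (pc->iIndex < pc->iSize) ? pc->iBuffer[pc->iIndex] : -1;
}

/* Mirrors steps 2 to 8: record the buffer, reset the offset and state,
   prime the first byte, and return the number of bytes claimed. */
size_t sketchParseBuffer(SketchContext *pc, const unsigned char *buf,
                         size_t size) {
  pc->iBuffer = buf;
  pc->iSize = size;
  pc->iIndex = 0;
  pc->iInTag = 0;
  primeCurrent(pc);
  return pc->iSize;   /* whole-message parser: claim the remainder */
}

/* Move the lookahead forward by one byte, stopping at the end. */
void sketchAdvance(SketchContext *pc) {
  if (pc->iIndex < pc->iSize) pc->iIndex++;
  primeCurrent(pc);
}
```

A parser for a fixed-size header would return the length of its header from the input function instead of the full buffer size.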

Implementing parse functions

General parse functions (for example, cpiParseFirstChild) are invoked by the broker when the syntax element tree must be created to evaluate an ESQL expression. For example, a Filter node uses an ESQL field reference in an ESQL expression, and this field reference must be resolved before the expression can be evaluated. Your parser's general parse function is called, possibly repeatedly, until the requested element has either been created or is known by the parser not to exist.

For example:
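void cpiParseFirstChild(
  CciParser*  parser,
  CciContext* context,
  CciElement* element
){
  PARSER_CONTEXT_ST* pc = (PARSER_CONTEXT_ST *)context ;
  int                rc;

  /* Act only on name elements that cpiElementCompleteNext does not
     yet report as complete. */
  if ((!cpiElementCompleteNext(&rc, element)) &&
      (cpiElementType(&rc, element) == CCI_ELEMENT_TYPE_NAME)) {

    /* Parse one more item at a time until the element has a first child,
       is reported complete, or there is no current element to continue
       from. parseNextItem is a helper written by the parser itself (here,
       the sample parser); it is not a broker utility function. */
    while ((!cpiElementCompleteNext(&rc, element)) &&
           (!cpiFirstChild(&rc, element))          &&
           (pc->iCurrentElement))
    {
      pc->iCurrentElement = parseNextItem(parser, context, pc->iCurrentElement);
    }
  }
  return;
}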
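The loop above parses lazily: it consumes input only until the requested first child exists or is known not to exist. The following self-contained sketch models that behavior with a hypothetical token stream in place of a bitstream. Token, Element, LazyCtx, and this version of parseNextItem are illustrative stand-ins; in a real parser the broker owns the elements and you query them with cpiFirstChild and cpiElementCompleteNext.

```c
#include <stdlib.h>

/* Hypothetical token stream standing in for the input bitstream. */
typedef enum { TOK_OPEN, TOK_CLOSE } TokKind;
typedef struct { TokKind kind; const char *name; } Token;

/* Hypothetical element; the broker keeps the equivalent state itself. */
typedef struct Element {
  const char *name;
  struct Element *parent, *firstChild, *lastChild, *next;
  int complete;                 /* 1 once all children are parsed */
} Element;

typedef struct {
  const Token *tokens;
  size_t count, index;          /* index = tokens consumed so far */
  Element *current;             /* element being filled in, NULL at end */
} LazyCtx;

/* Append a new last child to parent. */
static Element *newChild(Element *parent, const char *name) {
  Element *e = calloc(1, sizeof *e);
  if (!e) return NULL;
  e->name = name;
  e->parent = parent;
  if (parent->lastChild) parent->lastChild->next = e;
  else parent->firstChild = e;
  parent->lastChild = e;
  return e;
}

/* Consume exactly one token and return the new current element. */
static Element *parseNextItem(LazyCtx *ctx) {
  if (ctx->index >= ctx->count) return NULL;     /* input exhausted */
  const Token *t = &ctx->tokens[ctx->index++];
  if (t->kind == TOK_OPEN) return newChild(ctx->current, t->name);
  ctx->current->complete = 1;                    /* TOK_CLOSE */
  return ctx->current->parent;
}

/* Same loop shape as cpiParseFirstChild above. */
void parseFirstChild(LazyCtx *ctx, Element *element) {
  while (!element->complete && !element->firstChild && ctx->current)
    ctx->current = parseNextItem(ctx);
}

/* Free every element below e; e itself belongs to the caller. */
void freeChildren(Element *e) {
  Element *c = e->firstChild;
  while (c) {
    Element *next = c->next;
    freeChildren(c);
    free(c);
    c = next;
  }
  e->firstChild = e->lastChild = NULL;
}
```

Asking the root of <root><a></a><b></b></root> for its first child consumes a single token; asking a for its first child then consumes only the closing token that proves a has no children.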

Implementing output functions

The output functions (for example, cpiWriteBuffer) are invoked by the broker when a parser is required to serialize a syntax element tree to an output bitstream. For example, a Compute node might have created a tree in the domain of your user-defined parser. When this tree is output by, for example, an MQOutput node, the parser is responsible for appending data that represents the tree to the output bitstream buffer.

For example:
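int cpiWriteBufferEncoded(
  CciParser*  parser,
  CciContext* context,
  int         encoding,
  int         ccsid
){
  PARSER_CONTEXT_ST* pc = (PARSER_CONTEXT_ST *)context ;
  int                initialSize = 0;
  int                rc = 0;
  const void*        a;
  CciByte            b;

  /* Record the current size so that only the bytes appended by this
     parser are reported back to the broker. */
  initialSize = cpiBufferSize(&rc, parser);

  /* Illustrative only: shows how to inspect the existing buffer contents.
     a and b are not used further. */
  a = cpiBufferPointer(&rc, parser);
  b = cpiBufferByte(&rc, parser, 0);

  /* Append the serialized data (14 bytes, no terminating null). */
  cpiAppendToBuffer(&rc, parser, (char *)"Some test data", 14);

  /* Return the number of bytes that this parser appended. */
  return cpiBufferSize(&rc, parser) - initialSize;
}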
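The example above appends a fixed string. A real output function walks the tree and appends bytes that represent it, then returns only the number of bytes it added. The following self-contained sketch shows that pattern for a pseudo-XML format. OutBuf, appendBytes, Node, and writeTree are hypothetical stand-ins for the broker's output buffer, cpiAppendToBuffer, and the broker's syntax elements.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the output bitstream. */
typedef struct {
  char  *data;
  size_t size;   /* bytes currently in the buffer */
  size_t cap;    /* bytes allocated */
} OutBuf;

/* Append n bytes, growing the allocation as needed. 0 on success. */
static int appendBytes(OutBuf *b, const char *bytes, size_t n) {
  if (n == 0) return 0;
  if (b->size + n > b->cap) {
    size_t cap = b->cap ? b->cap : 64;
    while (cap < b->size + n) cap *= 2;
    char *p = realloc(b->data, cap);
    if (!p) return -1;
    b->data = p;
    b->cap = cap;
  }
  memcpy(b->data + b->size, bytes, n);
  b->size += n;
  return 0;
}

static int appendStr(OutBuf *b, const char *s) {
  return appendBytes(b, s, strlen(s));
}

/* Hypothetical tree node: a name with either a value or children. */
typedef struct Node {
  const char *name;
  const char *value;            /* NULL if the node has no value */
  struct Node *firstChild, *next;
} Node;

/* Serialize one node and its subtree as <name>...</name>. */
static int writeNode(OutBuf *b, const Node *n) {
  if (appendStr(b, "<") || appendStr(b, n->name) || appendStr(b, ">"))
    return -1;
  if (n->value && appendStr(b, n->value))
    return -1;
  for (const Node *c = n->firstChild; c; c = c->next)
    if (writeNode(b, c)) return -1;
  if (appendStr(b, "</") || appendStr(b, n->name) || appendStr(b, ">"))
    return -1;
  return 0;
}

/* Same contract as cpiWriteBufferEncoded above: append after any existing
   content and return only the number of bytes added (-1 on failure). */
long writeTree(OutBuf *b, const Node *root) {
  size_t initialSize = b->size;
  if (writeNode(b, root)) return -1;
  return (long)(b->size - initialSize);
}
```

Measuring the size before and after writing, rather than returning the final size, keeps the result correct when earlier parsers in the chain have already written headers into the same buffer.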

Messages with multiple message formats

Normally, the incoming message data is of a single message format, so one parser is responsible for parsing the entire contents of the message. The class name of the parser that is needed is defined in the Format field in the MQMD or the MQRFH2 header of the input message.

However, the message might consist of multiple formats, for example where there is a header in one format followed by data in another format. In this case, the first parser has to identify the class name of the parser that is responsible for the next format in the chain, and so on. In a user-defined parser, the implementation function cpiNextParserClassName is invoked by the broker when it needs to navigate down a chain of parser classes for a message comprising multiple message formats.

If your user-defined parser handles a message format that can be part of a message with multiple formats, the parser must implement the cpiNextParserClassName function.

For example:
  1. Implement the cpiNextParserClassName function:
    void cpiNextParserClassName(
      CciParser*  parser,
      CciContext* context,
      CciChar*    buffer,
      int         size
    ){
      PARSER_CONTEXT_ST* pc = (PARSER_CONTEXT_ST *)context ;
      int                rc = 0;
    
  2. Copy the name of the next parser class into the buffer that the broker supplies:
      CciCharNCpy(buffer, pc->iNextParserClassName, size);
    
      return;
    }
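The following self-contained sketch loosely models how a chain of parsers is navigated: each parser reports the class name of the parser for the next format, and the walk stops when a parser reports none. ParserLink, copyNextClassName, walkChain, and the class names are hypothetical; the real names are held as CciChar strings and copied with CciCharNCpy, whereas this sketch uses plain char and always null-terminates the copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one parser in a chain: its class name and
   the class name it reports for the next format ("" = last in chain). */
typedef struct {
  const char *className;
  const char *nextClassName;
} ParserLink;

/* Bounded copy into a caller-supplied buffer of size chars that always
   null-terminates, truncating if necessary. */
void copyNextClassName(char *buffer, size_t size, const char *next) {
  if (size == 0) return;
  strncpy(buffer, next, size - 1);
  buffer[size - 1] = '\0';
}

static const ParserLink *findParser(const ParserLink *links, size_t n,
                                    const char *name) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(links[i].className, name) == 0) return &links[i];
  return NULL;
}

/* Follow the chain from the first class name, asking each parser for the
   next one. Returns the number of parsers visited; the n limit stops a
   chain that loops back on itself. */
size_t walkChain(const ParserLink *links, size_t n, const char *first) {
  char name[32];
  size_t visited = 0;
  copyNextClassName(name, sizeof name, first);
  while (name[0] != '\0' && visited < n) {
    const ParserLink *p = findParser(links, n, name);
    if (!p) break;                      /* no parser for this class */
    visited++;
    copyNextClassName(name, sizeof name, p->nextClassName);
  }
  return visited;
}
```

For a message made of HeaderA, then HeaderB, then a Body format, starting the walk at HeaderA visits all three parsers.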

Defining the parser during broker initialization

The user-defined parser initialization function is invoked automatically during broker initialization. The user-defined parser is responsible for:
  • Creating and naming the message parser factory that is implemented by the user-defined parser. The parser factory is a container for related parser implementations. Parser factory names must be unique within a broker.
  • Defining the supported message parser class names, and supplying a pointer to a virtual function table that contains pointers to the user-defined parser implementation functions. Parser class names must be unique within a broker.

Each LIL that implements a user-defined parser must export a function called bipGetParserFactory as its initialization function. The initialization function defines the name of the factory that the user-defined parser supports and the classes of objects that the factory supports.

The initialization function must also create the factory object and define the names of all parsers supported by the LIL. A factory can support any number of object classes (parsers). When a parser is defined, a list of pointers to the implementation functions for that parser is passed to the broker. If a parser of the same name already exists, the request is rejected.

For example, to define the parser:
  1. Export the bipGetParserFactory initialization function:
    void LilFactoryExportPrefix * LilFactoryExportSuffix bipGetParserFactory()
    {
  2. Declare the variables:
      CciFactory*     factoryObject;
      int             rc;
      static CPI_VFT  vftable = {CPI_VFT_DEFAULT};
  3. Initialize all the static constants:
      initParserConstants();
  4. Set up the function table with pointers to the parser implementation functions:
      vftable.iFpCreateContext            = cpiCreateContext;
      vftable.iFpParseBufferEncoded       = cpiParseBufferEncoded;
      vftable.iFpParseFirstChild          = cpiParseFirstChild;
      vftable.iFpParseLastChild           = cpiParseLastChild;
      vftable.iFpParsePreviousSibling     = cpiParsePreviousSibling;
      vftable.iFpParseNextSibling         = cpiParseNextSibling;
      vftable.iFpWriteBufferEncoded       = cpiWriteBufferEncoded;
      vftable.iFpDeleteContext            = cpiDeleteContext;
      vftable.iFpSetElementValue          = cpiSetElementValue;
      vftable.iFpElementValue             = cpiElementValue;
      vftable.iFpNextParserClassName      = cpiNextParserClassName;
      vftable.iFpSetNextParserClassName   = cpiSetNextParserClassName;
      vftable.iFpNextParserEncoding       = cpiNextParserEncoding;
      vftable.iFpNextParserCodedCharSetId = cpiNextParserCodedCharSetId;

The initialization function must then create a parser factory by invoking cpiCreateParserFactory. The parser classes supported by the factory are defined by calling cpiDefineParserClass. The address of the factory object (returned by cpiCreateParserFactory) must be returned to the broker as the return value from the initialization function.

For example:
  1. Create the parser factory using the cpiCreateParserFactory function:
      factoryObject = cpiCreateParserFactory(&rc, constParserFactory);
      
  2. Define the classes of message supported by the factory using the cpiDefineParserClass function:
      if (factoryObject) {
        cpiDefineParserClass(&rc, factoryObject, constPXML, &vftable);
      }
      else {
        /* Error: Unable to create parser factory */
      }
  3. Return the address of this factory object to the broker:
      return(factoryObject);
    }
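The initialization steps above register a parser class together with a table of function pointers, and a duplicate class name is rejected. The following self-contained sketch models that arrangement. SketchVft, SketchFactory, defineParserClass, dispatchParse, and claimAll are hypothetical; the real CPI_VFT has many more members, and the broker enforces unique class names across the whole broker, whereas this sketch checks only within one factory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature function table with a single entry point. */
typedef struct {
  size_t (*parseBuffer)(const unsigned char *data, size_t size);
} SketchVft;

typedef struct { const char *className; const SketchVft *vft; } ClassEntry;

/* Hypothetical factory: a named container of parser classes. */
#define MAX_CLASSES 8
typedef struct {
  const char *name;
  ClassEntry  classes[MAX_CLASSES];
  size_t      count;
} SketchFactory;

/* Define a class; 0 on success, -1 if the name already exists,
   -2 if the factory is full. */
int defineParserClass(SketchFactory *f, const char *className,
                      const SketchVft *vft) {
  for (size_t i = 0; i < f->count; i++)
    if (strcmp(f->classes[i].className, className) == 0) return -1;
  if (f->count == MAX_CLASSES) return -2;
  f->classes[f->count].className = className;
  f->classes[f->count].vft = vft;
  f->count++;
  return 0;
}

/* Look up a class by name and call through its table; -1 if unknown. */
long dispatchParse(const SketchFactory *f, const char *className,
                   const unsigned char *data, size_t size) {
  for (size_t i = 0; i < f->count; i++)
    if (strcmp(f->classes[i].className, className) == 0)
      return (long)f->classes[i].vft->parseBuffer(data, size);
  return -1;
}

/* Example implementation function: claims the whole buffer. */
static size_t claimAll(const unsigned char *data, size_t size) {
  (void)data;
  return size;
}

static const SketchVft pxmlVft = { claimAll };
```

Holding the implementation functions in a table lets the caller invoke any registered parser class without knowing its functions at compile time.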

Creating an instance of the parser

Whenever an instance of a user-defined parser object is created, the broker invokes the context creation implementation function cpiCreateContext. This allows the user-defined parser to allocate instance data associated with the parser.

For example:
  1. Implement the cpiCreateContext function:
    CciContext* cpiCreateContext(
      CciParser* parser
    ){
      PARSER_CONTEXT_ST *p;
    
  2. Allocate memory for the local context:
      p = (PARSER_CONTEXT_ST *)malloc(sizeof(PARSER_CONTEXT_ST));
    
  3. Clear the context area:
      if (p) {
        memset(p, 0, sizeof(PARSER_CONTEXT_ST));
      }
      else {
        /* Error: Unable to allocate memory */
      }
  4. Return the pointer to the local context:
      return((CciContext*)p);
    }

Implementing the parser functionality

A parser must implement the following types of implementation functions:
  1. input functions
  2. parse functions
  3. output functions

Each type of function is described in its own section.

Deleting an instance of the user-defined parser

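When an instance of a user-defined parser is deleted, the broker invokes the cpiDeleteContext implementation function. Use this function to release the instance data that cpiCreateContext allocated. For example:
void cpiDeleteContext(
  CciParser*  parser,
  CciContext* context
){
  PARSER_CONTEXT_ST* pc = (PARSER_CONTEXT_ST *)context ;

  /* Release the context that cpiCreateContext allocated with malloc.
     If the context owns other allocated memory, free that first. */
  free(pc);

  return;
}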
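Creating and deleting a context form a pair: whatever the create function allocates, and whatever the context acquires later, the delete function must release. The following self-contained sketch shows that ownership pattern. SketchInstance and the sketch* functions are hypothetical stand-ins for PARSER_CONTEXT_ST, cpiCreateContext, and cpiDeleteContext, and the class names in the usage are illustrative. calloc is used as a shorthand for malloc followed by memset to zero, which, like the example above, assumes that a null pointer is all-bits-zero, as it is on mainstream platforms.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical instance data standing in for PARSER_CONTEXT_ST. */
typedef struct {
  char  *iNextParserClassName;  /* heap copy owned by the context */
  size_t iIndex;
} SketchInstance;

/* Create: allocate zeroed instance data, or NULL if memory is short. */
SketchInstance *sketchCreateContext(void) {
  return calloc(1, sizeof(SketchInstance));
}

/* The context acquiring its own resource later on. 0 on success. */
int sketchSetNextClassName(SketchInstance *pc, const char *name) {
  char *copy = malloc(strlen(name) + 1);
  if (!copy) return -1;
  strcpy(copy, name);
  free(pc->iNextParserClassName);   /* replace any previous value */
  pc->iNextParserClassName = copy;
  return 0;
}

/* Delete: release what the context owns, then the context itself. */
void sketchDeleteContext(SketchInstance *pc) {
  if (!pc) return;
  free(pc->iNextParserClassName);
  free(pc);
}
```

Freeing the owned members before the structure itself avoids reading memory that has already been released.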