XML for RPG and Procedural Languages Documentation

Programming Guide

The programming concepts discussed in this programming guide are used in the sample code files provided with each programming language.

Initializing the XML Parser environment
    The following steps must be taken to prepare the XML Parser environment.
    1. Include the header file: To access the XML4PR interfaces, the correct header file must be referenced in your source code. Make sure that the library containing the header file is on the user library list, or that the location of the header is fully qualified.
    2. Initialize the environment: Before any other API call can be made, the XML parser environment must be initialized by calling the QxmlInit procedure. A DOM exception return area is also passed on the QxmlInit request; it receives any API usage exceptions. See the QxmlXML_env_t variable on the QxmlInit procedure.
Terminating the Environment and Object Cleanup
    The XML environment must be terminated prior to the completion of the program. This is accomplished by calling the QxmlTerm procedure or the equivalent QxmlTerm_rtnHandleCount procedure. The latter returns the number of handles still allocated at termination time. An excessive number of handles allocated may be an indication that your application is not doing appropriate object handle cleanup.

    Any objects that were used in the process of interfacing with the parser must also be deleted before the program ends. To delete an object you simply pass the object to the delete routine of the corresponding object type. This call will be of the form: Qxml<Object Type>_delete. For instance, if you had a DOMParser object you would call the API procedure QxmlDOMParser_delete.

    Because XML4PR interfaces with the underlying constructs of the XML4C parser, most objects used by the XML4PR parser are allocated from the heap. Within an initialized XML parser session, between the QxmlInit and QxmlTerm calls, the parser recycles handles to objects that are returned to the XML4PR interface through the delete requests described above. It is especially important to return handles that are no longer used so that the parser can recycle them. This keeps down the number of objects allocated on the heap and allows the application environment to run longer. In addition, XML4PR uses teraspace for heap allocations to allow for longer running applications.
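
    The handle bookkeeping described above can be pictured with a small, self-contained C sketch. It is not the XML4PR implementation and calls none of its APIs: demo_env_t, demo_init, demo_new, demo_delete and demo_term_count are hypothetical stand-ins, and the fixed table size is an assumption made for illustration. The sketch only shows why returned handles can be reused and why a nonzero count at termination points to missing delete calls.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HANDLES 8   /* hypothetical table size, for illustration only */

/* Hypothetical stand-in for a parser session; not an XML4PR type. */
typedef struct {
    int in_use[DEMO_MAX_HANDLES];  /* 1 if the slot is currently handed out */
} demo_env_t;

/* Start a session with every handle slot free. */
void demo_init(demo_env_t *env)
{
    size_t i;
    assert(env != NULL);
    for (i = 0; i < DEMO_MAX_HANDLES; i++) {
        env->in_use[i] = 0;
    }
}

/* Hand out the lowest free slot, so deleted handles are reused first.
 * Returns the handle number, or -1 if the table is full. */
int demo_new(demo_env_t *env)
{
    int i;
    assert(env != NULL);
    for (i = 0; i < DEMO_MAX_HANDLES; i++) {
        if (!env->in_use[i]) {
            env->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Return a handle to the session so that it can be recycled. */
void demo_delete(demo_env_t *env, int handle)
{
    assert(env != NULL);
    if (handle >= 0 && handle < DEMO_MAX_HANDLES) {
        env->in_use[handle] = 0;
    }
}

/* End the session and report how many handles were never returned. */
int demo_term_count(demo_env_t *env)
{
    int i, count = 0;
    assert(env != NULL);
    for (i = 0; i < DEMO_MAX_HANDLES; i++) {
        if (env->in_use[i]) {
            count++;
            env->in_use[i] = 0;
        }
    }
    return count;
}
```

    In this sketch, a program that deletes every object it creates sees a count of 0 at termination; any other value is the number of handles that were leaked.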

Handling API Exceptions
    There are two types of errors that can be encountered when using the APIs. The first kind will surface as API Usage Exceptions when the API is used incorrectly or bad input is provided. The second kind are Parse Exceptions which will occur when parsing an XML file which is not well-formed or does not validate to a specified DTD or schema.
    1. API Usage Exceptions: Invalid use of the APIs can cause exceptions to be thrown by the underlying XML4C parser. The DOM exceptions thrown by the C++ DOM APIs are caught by the interface wrapper, and the corresponding error codes are returned in the exception return area. That area is the QxmlXML_env_t structure passed to QxmlInit when the environment is initialized. QxmlXML_env_t has a member named rtncode that is set to one of the possible return values. Check this value while debugging your application to verify that the program is operating in a correct state. The values and meanings of the return codes are listed in the QxmlXML_env_t documentation.
    2. Parse Exceptions:

      DOM Parse Exceptions: When the DOMParse_new procedure is called on a document, a QxmlParse_env_t return area is provided to receive the parser error messages if the parse request fails. After each parse, check the value of the errortype member to verify that the parse was successful before your application continues to issue DOM APIs to navigate the DOM tree. If an error is set, the other members of QxmlParse_env_t provide more information, such as the line and column number. Note that parser exceptions do not go to the job log, except when the XML file cannot be opened or a transcoder cannot be created for the specified encoding.

      SAX Parse Errors: The SAX parser provides an event-driven callback mechanism for handling parse error conditions. An error handler can be registered with the SAX parser by calling QxmlSAXParser_setErrorHandler. Your error handler can distinguish between parser warning, error and fatal error messages. See the SAXCount sample for how to set up handlers.
XML File encoding attribute
    The XML parser must know the encoding of the XML file or memory buffer string that it is parsing. The encoding is specified by the encoding attribute in the XML header. If no encoding is specified, the parser assumes the document is in an ASCII encoding (like 819). If the file is not in that default encoding, you must specify its encoding with the encoding attribute. For instance, if the file is in the US-English EBCDIC codepage the XML header would be:
      <?xml version="1.0" encoding="IBM037"?>

    If the parser returns an error that it finds an unrecognized character on or near line #1 of the file then the encoding may not be set to the correct value. Other possible warning messages may be:
    1. "The URL was not correctly formed"
    2. "The primary document entity could not be opened"
    3. "Expected a markup declaration"
    Note that the codepage attribute of the IFS file, the encoding attribute specified in the XML document prologue (encoding="xxx"), and the actual bytes of the XML file must all correspond. Use the following method to determine the actual codepage of the file:
    1. Execute the wrklnk command on the file:
      wrklnk 'name-of-file'
    2. Display the attributes of the file by selecting option 8
    3. Look for the row labelled "Code Page"
    If the value is 819 then the file encoding is ISO-8859-1 or ASCII. If the value is 37 then the encoding is IBM037. The IFS file attribute, the file content and the encoding attribute must all reflect the true content of the file.
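
    The correspondence between the "Code Page" attribute and the encoding attribute can be checked in code. The following C sketch covers only the two pairings named above (819 and 37); the function names are hypothetical, and the exact, case-sensitive string comparison is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map an IFS "Code Page" value to the encoding string expected in the
 * XML header. Only the two pairings named in the text are covered; any
 * other value returns NULL so the caller can consult the full table. */
const char *encoding_for_code_page(int code_page)
{
    switch (code_page) {
    case 819: return "ISO-8859-1";
    case 37:  return "IBM037";
    default:  return NULL;
    }
}

/* Return 1 if the encoding declared in the XML header is the one expected
 * for the file's code page, 0 otherwise (including unknown code pages).
 * The comparison is exact and case-sensitive, which is a simplification. */
int declared_encoding_matches(int code_page, const char *declared)
{
    const char *expected;
    assert(declared != NULL);
    expected = encoding_for_code_page(code_page);
    return expected != NULL && strcmp(expected, declared) == 0;
}
```

    A file whose attribute shows code page 819 but whose header declares IBM037 fails this check, which is the kind of mismatch that produces the parser errors listed above.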
    See the encoding string table for a list of the recognized encoding strings and corresponding code pages. Note that this list may not be complete, because added function can result in additions.
String Manipulation and Code Page Support
    Internally the XML parser works with Unicode-encoded character strings. This is true for both the internal representation of the XML content and the input and output via the APIs. When using the XML4C APIs, these strings take the form of XMLCh arrays (a Unicode string) or DOMStrings. Because C, RPG and COBOL programs on the iSeries typically work with characters and strings encoded in EBCDIC and of the form char or char*, additional parameters are provided on the XML4PR APIs to allow your application to specify the encoding of char type values for both input and output.

    Using QxmlTranscode to convert strings
    The XML Parser Interface Wrappers provide a procedure named QxmlTranscode that can be used to convert strings to and from various encodings. The instringind and outstringind parameters take the coded character set ID numeric of the source and target code pages respectively. There are two special cases:
    1. Qxml_INDOMSTR or -1 to specify that the string is a DOMString type.
    2. Qxml_JOBCCSID or 0 to specify that the code page is the same as the JOBCCSID.

    Two more constants have been defined. Qxml_CCSID37 corresponds to the IBM037 code page (an EBCDIC code page) and is set to 37. Qxml_UNICODE is used when the code page for the string is UNICODE. Even though these two code pages are explicitly defined as constants, any code page supported by ICONV can be used with QxmlTranscode by using the corresponding code page numeric value.

    To perform the code page transformation using QxmlTranscode, the input string must be null terminated and the procedure must know how much space is allocated for the output string (the outputstring parameter). Note that if you are transforming from a single-byte character set to a double-byte character set, you must provide twice the space required for the input string. The bytesprovided parameter specifies how many bytes are allocated for the output string. The bytesavailable parameter is set inside the procedure to the number of bytes used in the transformed string.
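
    The sizing rule for the output buffer can be sketched in C as follows. The helper names are hypothetical and the sketch does not call QxmlTranscode; it only computes the number of bytes to allocate and to report as bytesprovided. Reserving room for a two-byte terminator as well is an assumption made here to stay on the safe side.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Worst-case output size, in bytes, when converting a null-terminated
 * single-byte string to a double-byte encoding: two bytes per input
 * character plus a two-byte terminator (the terminator is an assumption). */
size_t double_byte_capacity(const char *input)
{
    assert(input != NULL);
    return 2 * (strlen(input) + 1);
}

/* Allocate a zero-filled output buffer of that size. The size is stored in
 * *bytes_provided, the value a caller would pass as bytesprovided.
 * Returns NULL if the allocation fails; the caller frees the buffer. */
char *alloc_transcode_buffer(const char *input, size_t *bytes_provided)
{
    assert(input != NULL && bytes_provided != NULL);
    *bytes_provided = double_byte_capacity(input);
    return calloc(1, *bytes_provided);
}
```

    For a three-character input this reserves eight bytes: six for the characters and two for the terminator.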

The DOMString
    The DOMString construct provides the mechanism for passing string data to and from the DOM API.
    • It stores UNICODE text.
    • It manages memory automatically, using reference counting.
    • It is mutable: characters can be inserted, deleted or appended.

    Although DOMString is used quite frequently throughout the C++ APIs, the XML4PR interface is flexible about the type of string that can be passed to many of the API procedures. Whether you pass in a DOMString or a character string, it is important to set the string indicator parameter (typically documented as stringind) to reflect which one it is. If a DOMString is passed as a parameter, the string indicator must be set to Qxml_INDOMSTR or -1. If a character string is passed, the string indicator is set to the coded character set ID numeric of the string. Many of the samples use Qxml_CCSID37, but you can provide the correct string indicator for your string variable. For example, if the string content is in the French code page, you would set stringind to 297 (EBCDIC France).

    Also note that a DOMString is not simply a character array and therefore must be translated into a character string by using the QxmlTranscode API procedure before the data in the DOMString can be used in non-XML API calls (such as C standard library calls). Another way of returning the data in char*/string form is to use QxmlDOMString_transcode, but this will return the string data in the default code page of the XML4C parser, which on iSeries is EBCDIC 37.


XMLCh Characters and Strings
    An XMLCh is a two byte data type used to store a single Unicode character. An XMLCh string is an array of Unicode characters terminated with a Unicode null terminator (two null bytes). If you want to manipulate the data in one of these strings you should transcode it into a character string with QxmlTranscode.
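
    The layout described above (two-byte characters, two-byte null terminator) can be illustrated with a short C sketch. It uses uint16_t as a stand-in for XMLCh, which is an assumption, and the demo_ names are hypothetical rather than XML4PR definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XMLCh: an unsigned 16-bit unit (assumption, see text). */
typedef uint16_t demo_xmlch_t;

/* Count the characters that precede the two-byte null terminator. */
size_t demo_xmlch_len(const demo_xmlch_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Number of bytes the string occupies, including its terminator. */
size_t demo_xmlch_bytes(const demo_xmlch_t *s)
{
    return (demo_xmlch_len(s) + 1) * sizeof(demo_xmlch_t);
}
```

    The three-character string "XML" stored this way holds the values 0x0058, 0x004D, 0x004C and a terminating 0x0000, and so occupies eight bytes.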

    RPG Note: RPG's Internal Data Type "C" is a Unicode character and data initialized as this type can be passed to the XML4PR APIs in place of XMLCh characters and strings where appropriate.
XMLFormatter Usage Information
    The XMLFormatter was new in version 3.5.1.

    What the XMLFormatter does
    The XMLFormatter provides a way to output XML data in the desired encoding, and an interface that automatically generates the escape sequences needed to create a well-formed XML document. Escaping document content (within elements and attributes) that contains markup characters such as <, >, ", &, and ' is essential when streaming out XML documents. In basic operation, Unicode-encoded data (Unicode strings, Unicode characters or DOMStrings) is passed to the formatter, transcoded to the specified encoding, and directed to the output. A number of flags control whether the optional formatting operations are performed. Escape flags are set with QxmlXMLFormatter_setEscapeFlags. If the escape flags for a particular group are set, any character written through the XMLFormatter that matches a character in that group is escaped (printed in the form &Character;). The XMLFormatter escape flag options are:
    • Qxml_NoEscapes: None.
    • Qxml_StdEscapes: & > " < ' (Ampersand, Close Angle, Double Quote, Open Angle and Single Quote.)
    • Qxml_AttrEscapes: & < " (Ampersand, Open Angle and Double Quote)
    • Qxml_CharEscapes: & < (Ampersand and Open Angle)
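
    The effect of the four escape groups can be modelled in plain C. The sketch below does not use the XMLFormatter: the demo_ enum and functions are hypothetical, the enum values are not the values of the Qxml_ constants, and the entity names are the five predefined XML entities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the four escape groups; the real constants and
 * their values come from the XML4PR header. */
typedef enum {
    DEMO_NO_ESCAPES,
    DEMO_STD_ESCAPES,
    DEMO_ATTR_ESCAPES,
    DEMO_CHAR_ESCAPES
} demo_escapes_t;

/* Return the entity for c if the group escapes it, otherwise NULL. */
const char *demo_entity_for(char c, demo_escapes_t group)
{
    const char *set;
    switch (group) {
    case DEMO_STD_ESCAPES:  set = "&><\"'"; break;
    case DEMO_ATTR_ESCAPES: set = "&<\"";   break;
    case DEMO_CHAR_ESCAPES: set = "&<";     break;
    default:                set = "";       break;
    }
    if (c == '\0' || strchr(set, c) == NULL) {
        return NULL;
    }
    switch (c) {
    case '&': return "&amp;";
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '"': return "&quot;";
    default:  return "&apos;";   /* only the single quote reaches here */
    }
}

/* Write the escaped form of in to out, which holds out_size bytes
 * including the terminator. Returns 0 on success, -1 if out is too small. */
int demo_escape(const char *in, demo_escapes_t group, char *out, size_t out_size)
{
    size_t used = 0;
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        const char *entity = demo_entity_for(*in, group);
        size_t len = entity ? strlen(entity) : 1;
        if (used + len + 1 > out_size) {
            return -1;
        }
        if (entity) {
            memcpy(out + used, entity, len);
        } else {
            out[used] = *in;
        }
        used += len;
    }
    if (used + 1 > out_size) {
        return -1;
    }
    out[used] = '\0';
    return 0;
}
```

    With the character-data group, a<b> becomes a&lt;b> because only & and < are escaped; with the standard group the > would be escaped as well.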


    The XMLFormatter can also control how characters that are unrepresentable in the target encoding are handled. Here are the XMLFormatter unrepresentable character flag options:
    • Qxml_UnRepFail: Fail if a character cannot be represented in the target encoding. (Currently not supported)
    • Qxml_UnRep_CharRef: Print the reference for the character. This is the Unicode character code written as an escaped hexadecimal value.
    • Qxml_UnRep_Replace: Print the replacement character provided to the transcoder. (Currently not supported)


    How to use the XMLFormatter
    Create a new XMLFormatter by calling QxmlXMLFormatter_new. The following parameters are expected:
    • Output encoding - This is a string that designates what the output encoding of the XMLFormatter should be. "IBM037" and "UTF-8" are examples of possible values for this. See the XML encoding attribute section above. This value can be passed as either a character string or a DOMString.
    • String encoding indicator - Set this to indicate the encoding of the Output Encoding passed in as the first parameter. Qxml_INDOMSTR if it is a DOMString, otherwise the CCSID of the string.
    • String length - The length of the string, or 0 if it is a DOMString or NULL terminated.
    • XMLFormatTarget - This value specifies where the output of the XMLFormatter will go. Right now it can either be NULL, which will direct output to standard output (stdout), or an XMLFormatTarget type that was initialized with a call to QxmlFileFormatTarget_new. The FileFormatTarget is discussed below in the API provided output streams section.
    • Escape flags - Use one of the constants from the escape flags list above.
    • Unrepresentable character flags - Use one of the constants from the unrepresentable character flags list above.


    The following XMLFormatter methods are used to write text to the output:
    • QxmlXMLFormatter_streamoutXMLString - If you have a Unicode string (an array of XMLCh characters), use this method to print the information. Typically this would be used in the SAX API callbacks.
    • QxmlXMLFormatter_streamoutDOMString - If you have a DOMString, use this method. DOMStrings are the typical source of text data when using the DOM APIs.
    • QxmlXMLFormatter_streamoutXMLCh - If you have a single XMLCh that you want to print, use this method. XMLCh characters are defined in the XMLUNIDEFS header for the programming language you are using.

    To see an example of how the XMLFormatter is used look at the source code for the DOMPrint sample.
Output Streams
    Opening and Closing a Stream File
    To provide a consistent interface for writing XML data to a stream file the XML4PR Interface Wrappers provide procedures for operating on stream files. Three of the procedures are used to open and close a stream file:

    QxmlOpenNewOutputStream and QxmlOpenAppendOutputStream return a file descriptor for an open stream file in the IFS. Both calls are passed the same parameters:
    • IFSFilename: The name of the IFS file to open.
    • namesize: The length of the file name. If the char* containing the file name is null terminated, 0 can be used for the length.
    • error: A pointer that will contain an integer error code if there is an error during the call.
    • open_fd_codepage: The code page that the stream file should be encoded in.
    If the call returns -1, there was an error. The value -1 is also returned if QxmlOpenNewOutputStream is called to create a file that already exists. To get more specific information about the cause, run DSPMSGD CPE#### where #### corresponds to the value stored in error. Otherwise the value returned is the file descriptor that will be used to refer to the file.
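
    Building the message identifier from the value stored in error can be sketched in C as follows. The function name is hypothetical, and padding the value to four digits is an assumption based on the CPE#### pattern above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build the message identifier "CPE####" from the integer stored in error.
 * id must have room for at least 8 bytes. Returns 0 on success, -1 if the
 * value does not fit in four digits or the buffer is too small. */
int cpe_message_id(int error, char *id, size_t id_size)
{
    assert(id != NULL);
    if (error < 0 || error > 9999 || id_size < 8) {
        return -1;
    }
    snprintf(id, id_size, "CPE%04d", error);
    return 0;
}
```

    For an error value of 3025 this yields CPE3025, which is the identifier to look up with DSPMSGD.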

    When all of the data has been written to the stream file and it is no longer needed, the FileDescriptor must be closed. Use the QxmlCloseOutputStream procedure to do that.

    Writing to an IFS stream file
    The preferred way to write to an open stream file is to use the FileFormatTarget and the XMLFormatter. Read the XMLFormatter Usage Information above for the benefits it provides for printing XML data. To direct the output of the XMLFormatter to a stream file, follow these steps:
    1. Open the stream with QxmlOpenNewOutputStream or QxmlOpenAppendOutputStream.
    2. Create a new FileFormatTarget for the FileDescriptor of the open output stream by calling QxmlFileFormatTarget_new.
    3. Create a new XMLFormatter with the FileFormatTarget that was just created. The FileFormatTarget should be passed as the value for XMLFormatTarget in the QxmlXMLFormatter_new procedure call.


    The second way to write data to an open stream file is to use the QxmlWriteOutputStream procedure. The return value is an error return code, with 0 indicating success. The string indicator (stringind) parameter works the same way as on other API calls: if the string to write is a DOMString, the string indicator is Qxml_INDOMSTR or -1; for a string in the JOBCCSID, it is Qxml_JOBCCSID or 0; otherwise it is the numeric value of the code page. The code page of the output stream must also be specified in the open_fd_codepage parameter. This is the same value that was used when the file was opened with the QxmlOpenNewOutputStream procedure. All strings are automatically converted to the specified output code page by the procedure call.
EBCDIC Linefeed Mappings can cause Invalid Characters
    There are two common line feed characters on iSeries, x'25' and x'15'. They map to U'000A' and U'0085' respectively in UNICODE. Currently U'0085' is not treated as a whitespace character under the XML architecture, so an EBCDIC x'15' line feed character in your XML document can convert to U'0085' and be flagged as an invalid character in your XML syntax (rather than as ignorable whitespace). The current resolution is to scan for x'15' and convert it to x'25'; other solutions may follow. If this is a major concern, report it so that the impact can be assessed.
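
    The scan-and-convert workaround can be sketched in C as follows. The function and macro names are hypothetical. The sketch assumes the buffer holds single-byte EBCDIC text that has not yet been passed to the parser; in double-byte or binary data a x'15' byte may have a different meaning and must not be changed blindly.

```c
#include <assert.h>
#include <stddef.h>

#define EBCDIC_NL 0x15  /* x'15', converts to U'0085' */
#define EBCDIC_LF 0x25  /* x'25', converts to U'000A' */

/* Replace every x'15' byte in a single-byte EBCDIC buffer with x'25',
 * in place. Returns the number of bytes that were changed. */
size_t replace_ebcdic_newlines(unsigned char *buf, size_t len)
{
    size_t i, changed = 0;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == EBCDIC_NL) {
            buf[i] = EBCDIC_LF;
            changed++;
        }
    }
    return changed;
}
```

    The return value shows whether the document contained any x'15' characters at all, which can help in judging how widespread the problem is.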

    Also note that if you are using QxmlWriteOutputStream in a C program and want \n to act as a carriage-return / line-feed character, you must create the C module with SYSIFCOPT(*IFSIO) set (the System interface options parameter when prompting on crtcmod). If this is not set, the '\n' character translates to EBCDIC x'15' and no line break will occur in the output.
Multi-Threading
    The XML4PR parser has had limited exposure to threaded environments, so multi-threaded use of XML4PR should be approached with caution. That said, the XML wrapper interface is multi-thread 'enabled'. To run multiple threads that make different and distinct parser requests, you must call the QxmlInit API within each thread and provide a separate environment area for return code information. Under the covers, the wrapper keeps track of the exception return area on a per-thread basis and, if there is an API exception, records the return code in the appropriate exception return area. Similarly, each instance of DOMParser should have its own parser exception area. Calling QxmlTerm removes your thread from the managed return code areas.
XML4PR - XML4C Interface Wrapper for RPG, C and COBOL
Copyright 2000,2001,2002 International Business Machines. All Rights Reserved.