Assuming that interpreted Java byte code is still slower than compiled C
code, it is desirable to use James Clark's expat XML parser in conjunction
with SAX to gain maximum performance when parsing XML documents. I therefore
suggest using the Java Native Interface (JNI) to invoke expat.
How can this be achieved? First of all, we need a shared library (a DLL
on Windows) that contains the expat code plus the following
handlers:
---------------------- start of C code ----------------------------
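#include <expat.h>  /* named xmlparse.h in early expat releases */

/* Shared by initParser() and doParse(). */
static XML_Parser parser;

/* s is NOT NUL-terminated; len is its length in bytes. */
static void characterData(void *userData, const char *s, int len) {
  /* call back into the Java VM here for
       expatHandler.doCharacterData */
}
/* atts alternates names and values and ends with a NULL pointer. */
static void startElement(void *userData,
  const char *name,
  const char **atts)
{
  /* call back into the Java VM here for
       expatHandler.doStartElement */
}
static void endElement(void *userData, const char *name)
{
  /* call back into the Java VM here for
       expatHandler.doEndElement */
}
static void processingInstruction(void *userData,
  const char *target,
  const char *data)
{
  /* call back into the Java VM here for
       expatHandler.doProcessingInstruction */
}
void initParser() {
  /* encoding is a placeholder; passing NULL lets expat take the
     encoding from the document itself */
  parser = XML_ParserCreate(encoding);
  XML_SetElementHandler(parser, startElement, endElement);
  XML_SetCharacterDataHandler(parser, characterData);
  XML_SetProcessingInstructionHandler(parser, processingInstruction);
}
void doParse() {
  /* data and size are placeholders for the document buffer and its
     length; the final 1 tells expat this is the last chunk */
  XML_Parse(parser, data, size, 1);
}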
---------------------- end of C code -----------------------------
We can now define the Java class that interfaces with the expat parser
contained in the shared library. The Java code looks like this:
----------------- start of expatHandler --------------------------
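public class expatHandler {
  // set by the SAX driver through setDocumentHandler()
  protected org.xml.sax.DocumentHandler documentHandler;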
  void doCharacterData(String s, int len) {
    documentHandler.characters(s);
  }
  void doStartElement(String name, String[] atts) {
    documentHandler.startElement(name,convertAttsToAttributeMap(atts));
  }
  void doEndElement(String name) {
    documentHandler.endElement(name);
  }
  void doProcessingInstruction(String target, String data) {
    documentHandler.processingInstruction (target,data);
  }
  native public void doParse();
  native public void initParser();
  static {
    System.loadLibrary("expat");
  }
}
----------------- end of expatHandler -------------------------
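The convertAttsToAttributeMap helper used in doStartElement is not shown
above. Here is a minimal sketch of what it might do, assuming the native
glue code copies expat's atts array (alternating names and values) into a
String[] without the terminating NULL. The class name AttsConverter is
hypothetical, and java.util.Map is only a stand-in for the SAX attribute
map type, which would need its own implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; neither SAX nor expat provides it.
class AttsConverter {

    // Assumes atts is laid out as {name0, value0, name1, value1, ...}.
    // A LinkedHashMap keeps the attributes in document order.
    static Map<String, String> convertAtts(String[] atts) {
        Map<String, String> map = new LinkedHashMap<>();
        if (atts == null) {
            return map; // element without attributes
        }
        if (atts.length % 2 != 0) {
            throw new IllegalArgumentException(
                "expected name/value pairs, got " + atts.length + " entries");
        }
        for (int i = 0; i < atts.length; i += 2) {
            map.put(atts[i], atts[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] atts = {"id", "42", "lang", "en"};
        System.out.println(convertAtts(atts)); // prints {id=42, lang=en}
    }
}
```

Doing the pairing on the Java side keeps the C glue code simple: it only
has to hand over a flat array of strings.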
It is then straightforward to extend this expatHandler to define the
SAX driver for expat:
----------------- start of expat driver ------------------------
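package com.microstar.sax;
import org.xml.sax.*;
/**
  * A SAX driver for James Clark's expat XML parser
  */
public class expatDriver extends expatHandler
  implements org.xml.sax.Parser {
  // documentHandler is inherited from expatHandler
  private EntityHandler entityHandler;
  private ErrorHandler errorHandler;
  public void setEntityHandler (EntityHandler handler) {
    this.entityHandler = handler;
  }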
  public void setDocumentHandler (DocumentHandler handler) {
    this.documentHandler = handler;
  }
  public void setErrorHandler (ErrorHandler handler) {
    this.errorHandler = handler;
  }
  public void parse (String publicId, String systemId)
    throws java.lang.Exception
  {
    ...
    initParser();
    documentHandler.startDocument();
    doParse();
    ...
    documentHandler.endDocument();
    ...
  }
}
----------------- end of expat driver -----------------------
Does this make sense? Comments?
Who will volunteer? ;-)
Cheers,
Joerg