> <david@megginson.com> writes:
 > 
 > > So, Henry's asking whether this is valid:
 > > 
 > >   <!DOCTYPE a [
 > >     <!ELEMENT a (b, c)>
 > >     <!ELEMENT b EMPTY>
 > >     <!ELEMENT c EMPTY>
 > >   ]>
 > >   <a><![CDATA[  ]><b/><c/></a>
And I'll answer my own original posting and say that it's not valid
because it's not even well-formed: the CDATA section is closed with
"]>" instead of "]]>".  Let's try
  <!DOCTYPE a [
    <!ELEMENT a (b, c)>
    <!ELEMENT b EMPTY>
    <!ELEMENT c EMPTY>
  ]>
  <a><![CDATA[  ]]><b/><c/></a>
instead, and continue the discussion from there.
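For anyone who wants to check the well-formedness point mechanically,
here is a minimal sketch in Python, assuming the standard-library
xml.dom.minidom parser (non-validating, built on expat).  The helper
name is_well_formed is made up for illustration.  It tests
well-formedness only and says nothing about whether the corrected
document is valid.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

DTD = """<!DOCTYPE a [
  <!ELEMENT a (b, c)>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
]>
"""

# First posting: the CDATA section is never closed ("]>" is not "]]>").
broken = DTD + "<a><![CDATA[  ]><b/><c/></a>"
# Corrected posting: the CDATA section ends with "]]>".
fixed = DTD + "<a><![CDATA[  ]]><b/><c/></a>"


def is_well_formed(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        minidom.parseString(text)
    except ExpatError:
        return False
    return True


print(is_well_formed(broken))
print(is_well_formed(fixed))
```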
 > What he said.  The DOM made a serious mistake here in my opinion:
 > it's stranded in no-person's-land between raw and cooked, without
 > being either.  It's not cooked, because it gives you
 > EntityReference and CDATA nodes.  It's not raw, because it DOESN'T
 > give you character entity references.
The DOM level-one core serves two constituencies -- authoring tools
that need to do horizontal transformations (XML=>XML, where the result
replaces the original) and processing/rendering tools that need to do
downstream processing (XML=>XML or XML=>X, where the original remains
unaltered).  Horizontal transformations will usually be somewhat
lossy, because the DOM WG has clearly decided that only a few lexical
features are important enough to give a good cost/benefit return on
the effort required to specify and implement them.
However, the point is that a specific DOM tree doesn't *have* to
include nodes for comments, CDATA sections, and entity references --
they are there only to support very specialised applications and
should be stripped out for ordinary XML processing.
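As a rough illustration of that stripping step, here is a sketch in
Python against the standard-library xml.dom.minidom.  The function name
cook is my own invention, not part of any DOM specification.  It
removes comment nodes and folds CDATA sections into ordinary text; it
does not handle entity reference nodes, on the assumption that the
underlying parser has already expanded them.

```python
from xml.dom import minidom, Node


def cook(node):
    """Strip comments and fold CDATA sections into plain text, in place."""
    # A Document node has no ownerDocument, so fall back to the node itself.
    doc = node.ownerDocument or node
    # Iterate over a copy, since the child list is modified during the loop.
    for child in list(node.childNodes):
        if child.nodeType == Node.COMMENT_NODE:
            node.removeChild(child)
        elif child.nodeType == Node.CDATA_SECTION_NODE:
            node.replaceChild(doc.createTextNode(child.data), child)
        else:
            cook(child)
    # Merge the adjacent text nodes left behind by the edits above.
    node.normalize()


doc = minidom.parseString("<a><!-- note -->x<![CDATA[ <y> ]]>z</a>")
cook(doc)
print(doc.documentElement.toxml())
```

After the call, the only child of <a> is a single text node, so an
application that walks the tree never sees the lexical details.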
All the best,
David
-- David Megginson david@megginson.com http://www.megginson.com/