xml-general mailing list archives

From "Scott Boag/CAM/Lotus" <Scott_Boag/CAM/Lo...@lotus.com>
Subject Issues with serializer classes
Date Sat, 27 Nov 1999 02:33:20 GMT
Assaf, I have integrated the serializer classes into Xalan, but have run
into some issues...

In BaseSerializer:
> protected void characters( String text, boolean cdata, boolean unescaped )
> {
>    ElementState state;
>    state = content();
>    cdata = state.cdata;
>    // Check if text should be print as CDATA section or unescaped
>    // based on elements listed in the output format (the element
>    // state) or whether we are inside a CDATA section or entity.
>    if ( state != null ) {
>        cdata = cdata || state.cdata;
>        unescaped = unescaped || state.unescaped;
>    }

I get a null pointer exception at the line "cdata = state.cdata;", which
runs before the null check.  Since state can clearly be null (the code
tests for that just below), I guess cdata has to be defaulted to something
else.
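
For what it's worth, here is a minimal sketch of the null-safe version I
have in mind.  ElementState below is a hypothetical stand-in that models
only the two flags from the quoted code, and CharacterFlags.merge is an
illustrative name of mine, not something in the serializer:

```java
// Hypothetical stand-in for the serializer's ElementState; only the two
// flags that appear in the quoted code are modeled here.
final class ElementState {
    final boolean cdata;
    final boolean unescaped;

    ElementState(boolean cdata, boolean unescaped) {
        this.cdata = cdata;
        this.unescaped = unescaped;
    }
}

final class CharacterFlags {
    // Merges the caller's flags with the element state, which may be null.
    // Returns { cdata, unescaped }. The caller's values are the defaults,
    // so state is never dereferenced before the null check.
    static boolean[] merge(ElementState state, boolean cdata, boolean unescaped) {
        if (state != null) {
            cdata = cdata || state.cdata;
            unescaped = unescaped || state.unescaped;
        }
        return new boolean[] { cdata, unescaped };
    }
}
```

In other words, simply dropping the unguarded assignment and keeping the
caller's argument as the default would seem to be enough.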

In BaseSerializer:
>     public void characters( char[] chars, int start, int length )
>     {
>    characters( new String( chars, start, length ), false, false );
>     }

I'm concerned about creating a new string for every output event.  The
whole point of SAX using a char[] array is so that you don't have to take
the overhead of String.  Since each of the chars has to be walked
individually to escape them, I think you would be much better off using the
array directly.  It seems like a small thing, but Xalan needs to be very
concerned about performance, and every little bit helps (or hurts).
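
To illustrate what I mean by using the array directly, here is a rough
sketch that escapes straight out of the char[] into a Writer without
building a String.  The class and method names are hypothetical, and it
only handles the three basic markup characters:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

final class CharArrayEscaper {
    // Writes chars[start .. start+length) to out, replacing <, > and &
    // with entity references. Unescaped runs are written in bulk from the
    // original array, so no intermediate String is created for the text.
    static void writeEscaped(Writer out, char[] chars, int start, int length) {
        try {
            int end = start + length;
            int runStart = start; // first char not yet written
            for (int i = start; i < end; i++) {
                String replacement;
                switch (chars[i]) {
                    case '<': replacement = "&lt;"; break;
                    case '>': replacement = "&gt;"; break;
                    case '&': replacement = "&amp;"; break;
                    default: continue; // ordinary char, extend the run
                }
                out.write(chars, runStart, i - runStart);
                out.write(replacement);
                runStart = i + 1;
            }
            out.write(chars, runStart, end - runStart);
        } catch (IOException e) {
            // Wrapped only to keep the sketch short; a real serializer
            // would report this through its own error path.
            throw new UncheckedIOException(e);
        }
    }
}
```

The String-based overload could then delegate to the array version rather
than the other way around.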

In BaseSerializer:
>     protected void characters( String text, boolean cdata, boolean unescaped )

Besides the above issue with String, I need to be able to call this
function from the processor to handle the disable-output-escaping
attribute, but cannot, because it is protected, unless you have another
secret for outputting non-escaped text.  (In any case, this is one place
where we'll have to bypass the SAX API.)

In OutputFormat:
> setCDataElements( String[] cdataElements )

The cdata elements are qnames... they need to have resolved namespaces to
check against the result tree.  You may be suggesting that the namespaces
should be resolved to prefixes, but I don't think this will work, since
prefixes can mean different things at different points in the result tree.

This starts to get a little ugly design-wise, because it opens the
Pandora's box... should some basic classes like QName go into
org.apache.xml.basetypes or the like??  I think at least for now, you
should just copy either Xalan's or Xerces' QName class to the serializer.
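
To make the point about resolved namespaces concrete, here is a small
sketch of matching cdata elements by namespace URI plus local name rather
than by prefix.  ExpandedName and CDataElementSet are hypothetical names
for illustration; this is not the Xalan or Xerces QName class:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// A name with its namespace already resolved; the prefix plays no part.
final class ExpandedName {
    final String namespaceUri; // "" means no namespace
    final String localName;

    ExpandedName(String namespaceUri, String localName) {
        this.namespaceUri = namespaceUri == null ? "" : namespaceUri;
        this.localName = Objects.requireNonNull(localName);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ExpandedName)) {
            return false;
        }
        ExpandedName that = (ExpandedName) other;
        return namespaceUri.equals(that.namespaceUri)
            && localName.equals(that.localName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespaceUri, localName);
    }
}

// Holds the elements whose text content should go out as CDATA sections.
final class CDataElementSet {
    private final Set<ExpandedName> names = new HashSet<>();

    void add(String namespaceUri, String localName) {
        names.add(new ExpandedName(namespaceUri, localName));
    }

    boolean contains(String namespaceUri, String localName) {
        return names.contains(new ExpandedName(namespaceUri, localName));
    }
}
```

With something like this, the check against the result tree does not
depend on which prefix happens to be in scope at a given point.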

I don't see where you are mapping from ISO to Java encoding names.  Am I
missing something?  You can get tables for doing this out of either Xalan
or Xerces.
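
The kind of table I mean looks roughly like the sketch below.  The class
name is hypothetical and the entries are only a small illustrative subset,
not the actual Xalan or Xerces tables:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class EncodingNames {
    // Keys are upper-cased names as they appear in XML declarations;
    // values are the historical java.io encoding names.
    private static final Map<String, String> TO_JAVA = new HashMap<>();
    static {
        TO_JAVA.put("ISO-8859-1", "ISO8859_1");
        TO_JAVA.put("UTF-8", "UTF8");
        TO_JAVA.put("US-ASCII", "ASCII");
        TO_JAVA.put("SHIFT_JIS", "SJIS");
        TO_JAVA.put("EUC-JP", "EUC_JP");
    }

    // Returns the Java name for an encoding, comparing case-insensitively.
    // Names that are not in the table are passed through unchanged.
    static String toJava(String name) {
        String mapped = TO_JAVA.get(name.toUpperCase(Locale.ROOT));
        return mapped != null ? mapped : name;
    }
}
```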

I don't see where you are mapping characters to HTML entity references like
&nbsp;.  This is critical.
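
As a rough sketch of what I'd expect here (the class name is hypothetical,
the table is a tiny subset of the HTML entity set, and falling back to
numeric character references for other non-ASCII characters is my own
assumption about the desired behavior):

```java
import java.util.HashMap;
import java.util.Map;

final class HtmlEntities {
    // Code point to entity name; a real table would cover the full set.
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put((int) '<', "lt");
        NAMES.put((int) '>', "gt");
        NAMES.put((int) '&', "amp");
        NAMES.put(0x00A0, "nbsp");
        NAMES.put(0x00A9, "copy");
        NAMES.put(0x00E9, "eacute");
    }

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String name = NAMES.get(cp);
            if (name != null) {
                out.append('&').append(name).append(';');
            } else if (cp > 0x7E) {
                // Assumed fallback: decimal numeric character reference.
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```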

I don't see where you are handling HTML empties.  This is also critical.
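
By HTML empties I mean elements such as BR and IMG that must be written
with no end tag.  A minimal sketch, using the empty elements from the HTML
4 DTD (the class and method names are hypothetical, and attributes are
left out to keep it short):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HtmlEmpties {
    // Elements declared EMPTY in the HTML 4 DTD.
    private static final Set<String> EMPTY = new HashSet<>(Arrays.asList(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param"));

    // HTML element names are case-insensitive.
    static boolean isEmpty(String name) {
        return EMPTY.contains(name.toLowerCase(Locale.ROOT));
    }

    // Serializes an element without attributes. Empty elements get a
    // start tag only; everything else gets a start tag and an end tag.
    static String serializeElement(String name, String content) {
        if (isEmpty(name)) {
            return "<" + name + ">";
        }
        return "<" + name + ">" + content + "</" + name + ">";
    }
}
```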

I don't see where you are handling % escaping in URL attributes for HTML.
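
For the URL attributes, what I have in mind is the approach recommended in
the HTML 4 spec: convert each non-ASCII character to UTF-8 and write each
byte as %HH.  A sketch under that assumption (the class and method names
are hypothetical, and deciding which attributes count as URLs is left
out):

```java
import java.nio.charset.StandardCharsets;

final class UrlAttributeEscaper {
    // Leaves ASCII characters alone and replaces every other character
    // with the %HH escapes of its UTF-8 bytes.
    static String escapeNonAscii(String value) {
        StringBuilder out = new StringBuilder(value.length());
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int n = Character.charCount(cp);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                byte[] utf8 = value.substring(i, i + n)
                                   .getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append('%');
                    out.append(hexDigit((b >> 4) & 0xF));
                    out.append(hexDigit(b & 0xF));
                }
            }
            i += n;
        }
        return out.toString();
    }

    private static char hexDigit(int nibble) {
        return Character.toUpperCase(Character.forDigit(nibble, 16));
    }
}
```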


This is just the initial analysis.  It's a little hard to do detailed
output comparison with both old Xalan output and XT, without having the
above implemented.  I suspect whitespace handling in HTML will be a
particularly big deal.

For now, I will probably retrofit the FormatterListener classes to work
with your framework, and slip them back in, since we need to do a stable
release by the end of next week.  I'll keep a switch so they can still use
the BaseSerializer derivatives.

I'm happy to lend a hand (after next week) with coding to address some of
the above issues, if it would be helpful.

Overall, I'm very pleased with how the Serializer classes went into
Xalan... having these classes will fix several rough spots in the processor
code.   Good stuff.

