xml-general mailing list archives

From Assaf Arkin <ar...@exoffice.com>
Subject Re: Issues with serializer classes
Date Sat, 27 Nov 1999 05:51:38 GMT
Scott Boag/CAM/Lotus wrote:
> I get a null pointer exception with cdata = state.cdata;.  Since state can
> clearly be null, I guess cdata has to be defaulted to something else.

Cut & paste bug: cdata = state.cdata should clearly not appear there. In
fact, I take cdata from the argument list and then OR it with state.cdata
after the null check.
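
A minimal sketch of that ordering, for illustration only. ElementState and
effectiveCData are hypothetical stand-ins, not the actual BaseSerializer
code; the point is just that state is consulted only after the null check.

```java
// Hypothetical stand-in for the serializer's per-element state.
final class ElementState {
    final boolean cdata;

    ElementState(boolean cdata) {
        this.cdata = cdata;
    }
}

final class CDataFlagSketch {
    // Start from the caller's flag and OR in the element's flag,
    // touching state only once we know it is not null.
    static boolean effectiveCData(ElementState state, boolean cdata) {
        if (state != null) {
            cdata = cdata || state.cdata;
        }
        return cdata;
    }
}
```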


> I'm concerned about creating a new string for every output event.  The

It's not a requirement; it's just me keeping the implementation simple
until it gets stabilized. Only after the code becomes stable do you want
to go in and make it faster, and having two character methods (one
for SAX, one for DOM) is first on the list.


> =============
> In BaseSerializer:
> >     protected void characters( String text, boolean cdata, boolean
> unescaped )
> 
> Besides the above issue with String, I need to be able to call this
> function from the processor to handle the disable-output-escaping
> attribute, but can not, because it is protected.  Unless you have another
> secret to outputting non-escaped text.  (In any case, this is one case
> where we'll have to bypass the SAX API).

If you have a list of elements the contents of which should not be
escaped, I suggest you try using OutputFormat.setUnescapedElement first.

If that's not good enough, the function can be made public, but then you
are not only bypassing SAX, you are also creating a dependency. Is there
any way around it?



> =============
> In OutputFormat:
> > setCDataElements( String[] cdataElements )
> 
> The cdata elements are qnames... they need to have resolved namespaces to
> check against the result tree.  You may be suggesting that the namespaces

Before we get into where to put qnames, what should I expect to get as
tag/attribute names in SAX/DOM?


> 
> =============
> I don't see where you are mapping from ISO to Java encoding names.  Am I
> missing something?  You can get tables for doing this out of either Xalan
> or Xerces.

Given the ISO name in OutputFormat.getEncoding, I print it directly, or I
get a suitable Writer (when passed an OutputStream). But I never get to
ask for the Java encoding name directly. (I would, if I could figure out
the encoding from a Writer - anyone?)
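
One partial answer, sketched under assumptions: java.io.OutputStreamWriter
accepts an encoding name when wrapping an OutputStream and reports the
encoding it uses through getEncoding(). A generic Writer offers no such
query, so the hypothetical helper below returns null for anything else.
The class and method names are illustrative and not part of the serializer.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

final class EncodingSketch {
    // Wrap the stream in a Writer for the encoding name taken from the
    // output format (for example "UTF-8" or "ISO-8859-1").
    static Writer open(OutputStream out, String encodingName) {
        try {
            return new OutputStreamWriter(out, encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encodingName);
        }
    }

    // Returns the name the Writer reports for its encoding, or null when
    // the Writer is not an OutputStreamWriter and cannot be asked.
    static String encodingOf(Writer writer) {
        if (writer instanceof OutputStreamWriter) {
            return ((OutputStreamWriter) writer).getEncoding();
        }
        return null;
    }
}
```

Note that getEncoding() may return Java's historical name for the encoding
rather than the name that was passed in, so it should not be printed into
the XML declaration without mapping it back.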

 
> =============
> I don't see where you are mapping characters to HTML entity references like
> &nbsp;.  This is critical.

That is done in HTMLPrinter.getEntityRef, which takes them from HTMLdtd,
which in turn loads all of them from HTMLEntities.res. The same code is
also used in the HTML parser.
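
To make the idea concrete, here is a small sketch of a character-to-entity
lookup. It is not the HTMLdtd code: the class name, the hard-coded table,
and the escape method are assumptions for illustration, and the real table
is loaded from the resource file rather than written inline.

```java
import java.util.HashMap;
import java.util.Map;

final class EntitySketch {
    // Tiny illustrative table; the real one covers all HTML entities.
    private static final Map<Character, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put('\u00A0', "nbsp");
        ENTITIES.put('<', "lt");
        ENTITIES.put('>', "gt");
        ENTITIES.put('&', "amp");
        ENTITIES.put('"', "quot");
    }

    // Replace every character that has a named entity with &name;.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String name = ENTITIES.get(c);
            if (name != null) {
                sb.append('&').append(name).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```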


> =============
> I don't see where you are handling HTML empties.  This is also critical.

Once again HTMLdtd has lots of HTML-specific information, including
empty elements, whitespace preserving elements, and elements that prefer
not to have a closing tag.
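
As a rough illustration of the empty-element case only, the sketch below
keeps a set of the HTML 4 elements declared EMPTY and omits the closing tag
for them. The class and method names are hypothetical and do not reflect
how HTMLdtd is actually organized.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class HtmlEmptySketch {
    // Elements declared EMPTY in HTML 4.
    private static final Set<String> EMPTY = new HashSet<>(Arrays.asList(
        "AREA", "BASE", "BASEFONT", "BR", "COL", "FRAME", "HR",
        "IMG", "INPUT", "ISINDEX", "LINK", "META", "PARAM"));

    // HTML tag names are case-insensitive, so normalize before lookup.
    static boolean isEmptyTag(String name) {
        return EMPTY.contains(name.toUpperCase(Locale.ROOT));
    }

    // Empty elements get only a start tag; others get start and end tags.
    // Content is assumed to be escaped already.
    static String serialize(String name, String content) {
        if (isEmptyTag(name)) {
            return "<" + name + ">";
        }
        return "<" + name + ">" + content + "</" + name + ">";
    }
}
```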


> =============
> I don't see where you are handling % escaping in URL attributes for HTML.

I'm not. It wouldn't be interesting if I did everything ;-)
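
For whoever picks this up, a minimal sketch of what such escaping could
look like, assuming the rule is the one the HTML 4 recommendation suggests
for URI attribute values: encode the value as UTF-8 and write each
non-ASCII byte as %HH. The class and method names are hypothetical, and
deciding which attributes count as URL attributes is left out.

```java
import java.nio.charset.StandardCharsets;

final class UrlAttrSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Leave ASCII bytes untouched; write every other UTF-8 byte as %HH.
    static String escapeNonAscii(String url) {
        StringBuilder sb = new StringBuilder(url.length());
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return sb.toString();
    }
}
```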


> 
> ***
> 
> This is just the initial analysis.  It's a little hard to do detailed
> output comparison with both old Xalan output and XT, without having the
> above implemented.  I suspect whitespace handling in HTML will be a
> particularly big deal.

Nothing is whitespace preserving except PRE and TEXTAREA (SCRIPT and
STYLE are special cases). I do not strip spaces on output because I do
not expect them to be there to begin with.


> 
> For now, I will probably retrofit the FormatterListener classes to work
> with your framework, and slip them back in, since we need to do a stable
> release by the end of next week.  I'll keep a switch so they can still use
> the BaseSerializer derivatives.
> 
> I'm happy to lend a hand (after next week) with coding to address some of
> the above issues, if it would be helpful.
> 
> Overall, I'm very pleased with how the Serializer classes went into
> Xalan... having these classes will fix several rough spots in the processor
> code.   Good stuff.

Thanks.

arkin

> 
> -scott
