xml-general mailing list archives

From: Scott_B...@lotus.com
Subject: Re: Xerces2 requirements
Date: Tue, 19 Sep 2000 21:25:44 GMT

Well, something to think about.  We were hoping that restricting this to
just text nodes might make it feasible.  You might be right; maybe only a
custom processor will do for this sort of thing.

-scott


Andy Clark <andyc@apache.org>
09/19/00 03:29 PM
Please respond to general

To:      general@xml.apache.org
cc:      (bcc: Scott Boag/CAM/Lotus)
Subject: Re: Xerces2 requirements



Scott_Boag@lotus.com wrote:
> Perhaps to expand on this... there should be some way to get to
> the raw, unencoded character buffer for text nodes, and to have a
> way for the parser to not encode the text if a switch is thrown.

For the rest of the people involved in this discussion, I want
to point out that what Scott is talking about is really the
underlying bytes of the input stream. Typically, the use of
the word "character" implies that the byte(s) have already
been transcoded into Unicode characters.
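
To make the byte/character distinction concrete, here is a minimal
Java sketch. The class and method names are illustrative only and are
not part of Xerces; UTF-8 is assumed as the input encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows that the raw bytes of an input stream and the
// Unicode characters produced by transcoding them are different things.
final class BytesVersusChars {
    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Transcodes raw UTF-8 bytes into Unicode characters (a Java String). */
    static String transcode(byte[] rawUtf8) {
        return new String(rawUtf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters; the last is U+00E9
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        // U+00E9 takes two bytes in UTF-8, so the counts differ.
        System.out.println(text.length() + " chars, " + raw.length + " bytes");
        System.out.println(transcode(raw).equals(text));
    }
}
```

The character count and the byte count differ as soon as the text
contains anything outside ASCII, which is why "raw character buffer"
and "raw byte buffer" are not interchangeable terms here.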

> The reason is for high performance transformation when the input
> encoding is the same as the output encoding, and the text doesn't
> have to be explored by either the parser or the transformer.

I can understand the performance benefit, but it's really not
going to be possible to do this. It would add an enormous amount
of complexity to the parser and tree model implementation, more
than I feel is acceptable for source that needs to be maintained
and extended in the future.

> Sorry, I know this sounds hard, but we need a way of super-
> charging certain types of (e-business) transformations.

Then those people will need a custom parser to support their
needs and get the performance they require. But I think that
adding this to Xerces would cripple the parser and we'd be back
where we started. The current code is closer to being able to
support this kind of feature because it defers transcoding and
keeps the underlying byte buffers around until needed. And the
state of the current code is why we're working on the
requirements for the next version.
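
As a rough illustration of what "defers transcoding and keeps the
underlying byte buffers around" can mean, here is a minimal Java
sketch. DeferredText and its methods are hypothetical and are not the
Xerces implementation. The sketch also ignores entity references,
line-ending normalization, and well-formedness checking, all of which
a real parser has to handle before raw bytes could safely be passed
through.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical text node that keeps its raw input bytes and decodes them only on demand. */
final class DeferredText {
    private final byte[] rawBytes;       // bytes exactly as read from the input stream
    private final Charset inputEncoding; // encoding of the input document
    private String decoded;              // null until the text is first requested

    DeferredText(byte[] rawBytes, Charset inputEncoding) {
        this.rawBytes = rawBytes.clone();
        this.inputEncoding = inputEncoding;
    }

    /** True once the bytes have been transcoded into Unicode characters. */
    boolean isDecoded() {
        return decoded != null;
    }

    /** Transcodes on first use, then reuses the cached result. */
    String getText() {
        if (decoded == null) {
            decoded = new String(rawBytes, inputEncoding);
        }
        return decoded;
    }

    /** Returns a copy of the raw bytes when the encodings match; otherwise transcodes. */
    byte[] toBytes(Charset outputEncoding) {
        if (inputEncoding.equals(outputEncoding)) {
            return rawBytes.clone(); // pass-through: no decoding needed
        }
        return getText().getBytes(outputEncoding);
    }

    public static void main(String[] args) {
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        DeferredText node = new DeferredText(raw, StandardCharsets.UTF_8);
        node.toBytes(StandardCharsets.UTF_8);
        System.out.println("decoded after same-encoding output: " + node.isDecoded());
        node.toBytes(StandardCharsets.ISO_8859_1);
        System.out.println("decoded after transcoding output: " + node.isDecoded());
    }
}
```

Even in this toy form, every consumer of the node has to cope with two
representations of the same text, which hints at the maintenance cost
described above.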

--
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org






