xml-general mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject Re: Progressive parsing
Date Sat, 31 Aug 2002 13:03:47 GMT

Having a position that depends on an encoding is honestly, simply... bad.

One of the target applications was to act as a client of such an 
indexed database over HTTP/1.1. That protocol has a way to request 
only a series of segments of a file, but such segments can only be 
expressed in bytes.
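
The HTTP/1.1 mechanism referred to here is the Range request header. As a minimal sketch of what such a client could look like (the class and method names are made up for illustration, and it assumes the server supports byte ranges on the resource):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

class RangeRequestSketch {
    /** Builds a Range header value for an inclusive byte interval, e.g. "bytes=100-199". */
    static String rangeHeader(long firstByte, long lastByte) {
        if (firstByte < 0 || lastByte < firstByte) {
            throw new IllegalArgumentException("invalid byte range");
        }
        return "bytes=" + firstByte + "-" + lastByte;
    }

    /** Fetches only the requested bytes; fails unless the server answers 206 Partial Content. */
    static byte[] fetchRange(String url, long firstByte, long lastByte) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Range", rangeHeader(firstByte, lastByte));
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("server did not honour the Range header");
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

The interval is counted in bytes of the stored file, which is why an index of character positions would be of no direct use to such a client.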

When doing it with files, one expects to use, say, the 
InputStream.skip() method, which is, hopefully, efficiently implemented 
and moves the cursor forward in the underlying file-reading routines.
Skipping x characters in a given encoding is simply a killer: the decoder 
has to run through all the characters. For example, in UTF-8, skipping 
a non-ASCII character means skipping two to four bytes, whereas 
skipping an ASCII character means skipping one byte.
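
To illustrate why a character position is costly to reach in UTF-8, here is a small sketch (hypothetical class and method names; it counts Unicode code points and assumes well-formed UTF-8 input). Finding where the n-th character starts requires looking at the lead byte of every character before it:

```java
class Utf8SkipSketch {
    /** Number of bytes in the UTF-8 sequence that starts with the given lead byte. */
    static int sequenceLength(byte lead) {
        if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
        if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte");
    }

    /** Byte offset at which code point number n starts; visits every earlier character. */
    static int byteOffsetOfCodePoint(byte[] utf8, int n) {
        int offset = 0;
        for (int i = 0; i < n; i++) {
            if (offset >= utf8.length) {
                throw new IndexOutOfBoundsException("fewer than " + n + " code points");
            }
            offset += sequenceLength(utf8[offset]);
        }
        return offset;
    }
}
```

With a byte position, by contrast, the same jump is a single InputStream.skip() call and no decoding is needed.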

So... I really meant: "Can I get the byte position?"
Currently, the only way is to build this index using a 
"load-in-memory-then-rewrite-to-file" approach... I can live with this, but I 
would have expected "fine parsers" to provide more.
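
A rough sketch of the "load in memory, then rewrite to file" workaround, under the assumption that the document has already been parsed and split into serialized fragments (the class and method names are hypothetical, and an in-memory byte array stands in for the output file): each fragment's byte offset is recorded at the moment it is written out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteIndexSketch {
    /** The rewritten bytes plus the byte offset at which each fragment starts. */
    static final class Indexed {
        final byte[] bytes;
        final List<Integer> offsets;

        Indexed(byte[] bytes, List<Integer> offsets) {
            this.bytes = bytes;
            this.offsets = offsets;
        }
    }

    /** Re-serializes the fragments as UTF-8, noting the output size before each write. */
    static Indexed writeWithIndex(List<String> fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        for (String fragment : fragments) {
            offsets.add(out.size()); // byte position where this fragment begins
            byte[] encoded = fragment.getBytes(StandardCharsets.UTF_8);
            out.write(encoded, 0, encoded.length);
        }
        return new Indexed(out.toByteArray(), offsets);
    }

    /** Reads fragment i back using only the byte index, without decoding what precedes it. */
    static String readFragment(Indexed doc, int i) {
        int start = doc.offsets.get(i);
        int end = (i + 1 < doc.offsets.size()) ? doc.offsets.get(i + 1) : doc.bytes.length;
        return new String(doc.bytes, start, end - start, StandardCharsets.UTF_8);
    }
}
```

The cost is the one complained about above: the whole document has to pass through memory once before the index exists.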


On Tuesday, August 27, 2002, at 04:42, Aleksander Slominski wrote:
>> Finally... to xerces makers/users: how do I get the byte position of an
>> element declaration I've just been handed by the SAX parser?
> This is more complex, as the parser works on UTF-16 characters (char),
> so obtaining the position in the original stream, if it was not UTF-16, is very 
> difficult. However, I think that for your cases it is enough to get the 
> position of the start/end element in the character stream. The ability to obtain 
> the position is not currently part of xerces2, but you can take a look at my 
> patch that adds to XMLLocator a function getCurrentEntityAbsoluteOffset() 
> that can be used to get the current position of the parser. Together with 
> changes to XMLDocumentFragmentScannerImpl, it is possible to get the 
> start/end position of every XML event in XNI. For details see:
> http://www.extreme.indiana.edu/xgws/xsoap/xpp/download/PullParser2/lib/xerces2_patched/
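
If only a character offset is available, as in the quoted reply, one possible bridge to a byte position is to add up the UTF-8 length of every char before that offset. The sketch below is only an illustration: the names are hypothetical, it assumes the original file is UTF-8 without a byte-order mark, that the text is well formed (no unpaired surrogates), and that the parser's offset counts UTF-16 chars from the start of the decoded stream with no line-ending normalization in between.

```java
class CharToByteOffsetSketch {
    /** UTF-8 byte length of the first charOffset UTF-16 chars of text. */
    static long utf8ByteOffset(CharSequence text, int charOffset) {
        long bytes = 0;
        int i = 0;
        while (i < charOffset) {
            char c = text.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
                i += 1;
            } else if (c < 0x800) {
                bytes += 2;
                i += 1;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                bytes += 4; // a surrogate pair encodes one supplementary code point
                i += 2;
            } else {
                bytes += 3; // rest of the Basic Multilingual Plane
                i += 1;
            }
        }
        return bytes;
    }
}
```

This still walks every character once, so it belongs in the index-building pass rather than in each lookup.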
