lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki>
Subject Re: Performance question
Date Thu, 08 Jan 2004 12:23:52 GMT
Dror Matalon wrote:

>On Wed, Jan 07, 2004 at 07:24:22PM -0700, Scott Smith wrote:
>>After two rather frustrating days, I find I need to apologize to Lucene.  My
>>last run of 225 messages averaged around 25 milliseconds per message--that's
>>parsing the xml, creating the Document, and putting it in the index (2.5Ghz
>>cpu, 1G ram).  Turns out the performance problem was xerces sax "helping me"
>>by loading the DTD before it parsed each message and the DTD wasn't local to
>>our site.  After seeing Terry's response, I knew there had to be more going
>>on than what I was assuming.
>>Thanks for the suggestions.  I wonder how much faster I can go if I
>>implement some of those?
>25 msecs to insert a document is on the high side, but it depends of
>course on the size of your document. You're probably spending 90% of
>your time in the XML parsing. I believe that there are other parsers
>that are faster than xerces, you might want to look at these. You might
>want to look at
You may want to check the XML Pull Parser - it offers something between 
SAX and DOM, with performance similar to SAX. 

Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer (

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message