lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: Performance question
Date Thu, 08 Jan 2004 12:23:52 GMT
Dror Matalon wrote:

>On Wed, Jan 07, 2004 at 07:24:22PM -0700, Scott Smith wrote:
>  
>
>>After two rather frustrating days, I find I need to apologize to Lucene.  My
>>last run of 225 messages averaged around 25 milliseconds per message--that's
>>parsing the xml, creating the Document, and putting it in the index (2.5Ghz
>>cpu, 1G ram).  Turns out the performance problem was xerces sax "helping me"
>>by loading the DTD before it parsed each message and the DTD wasn't local to
>>our site.  After seeing Terry's response, I knew there had to be more going
>>on than what I was assuming.
>>
>>Thanks for the suggestions.  I wonder how much faster I can go if I
>>implement some of those?
>>    
>>
>
>25 msecs to insert a document is on the high side, but it depends of
>course on the size of your document. You're probably spending 90% of
>your time in the XML parsing. I believe that there are other parsers
>that are faster than xerces, you might want to look at these. You might
>want to look at http://dom4j.org/.
>
>Dror
>
>  
>
You may want to check the XML Pull Parser - it offers something between 
SAX and DOM, with performance similar to SAX. 
(http://www.extreme.indiana.edu/xgws/xsoap/xpp)

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message