lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Performance of TermVectors and skipTo
Date Fri, 02 Jul 2004 12:10:44 GMT
Hi folks,

I have done some performance tests for TermVectors and the new
TermDocs.skipTo() implementation, both introduced with 1.4.
I am very pleased with the results. I ran these tests on the
Reuters news corpus (roughly 800,000 documents).

*) I compared TermVectors with the alternative of storing the
respective fields and re-analyzing the documents in order to
get their terms. According to my measurements, TermVectors speed
up access to the terms by a factor of 7!

*) For testing skipTo, I used my implementation for finding highly
correlated terms. To compute the correlation measure I have to
compare a lot of TermDocs lists with each other or with other lists
of document ids. According to my measurements on an optimized index,
skipTo speeds up my term correlation implementation by a factor of
2. The benefit of skipTo probably increases with index size.
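
To illustrate why skipTo helps when comparing document lists, here is a
minimal, self-contained sketch of intersecting two sorted lists of document
ids. It is not the correlation code measured above and it does not use the
Lucene classes: DocIdCursor and PostingsIntersection are hypothetical names,
the cursor only imitates the next()/doc()/skipTo() contract of TermDocs, and
a binary search over an in-memory array stands in for the skip data stored
in the index. It is written in current Java rather than the Java of the
1.4 era.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a postings iterator; NOT the Lucene TermDocs interface. */
final class DocIdCursor {
    private final int[] docs; // document ids, sorted ascending, no duplicates
    private int pos = -1;     // index of the current entry; -1 before the first next()

    DocIdCursor(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current document id; only valid after next() or skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Moves to the next entry; returns false when the list is exhausted. */
    boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }

    /**
     * Moves to the first entry beyond the current one whose id is >= target.
     * Assumption: binary search replaces the on-disk skip data of a real index.
     */
    boolean skipTo(int target) {
        int from = Math.min(pos + 1, docs.length);
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1; // insertion point when target is absent
        return pos < docs.length;
    }
}

final class PostingsIntersection {
    /** Returns the ids present in both lists, leapfrogging with skipTo. */
    static List<Integer> intersect(DocIdCursor a, DocIdCursor b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) {
            return hits;
        }
        while (true) {
            if (a.doc() == b.doc()) {
                hits.add(a.doc());
                if (!a.next() || !b.next()) {
                    return hits;
                }
            } else if (a.doc() < b.doc()) {
                // a is behind: jump straight to b's position instead of stepping
                if (!a.skipTo(b.doc())) {
                    return hits;
                }
            } else {
                if (!b.skipTo(a.doc())) {
                    return hits;
                }
            }
        }
    }
}
```

Calling PostingsIntersection.intersect(new DocIdCursor(1, 5, 9, 20, 100),
new DocIdCursor(5, 20, 21, 22, 100, 200)) yields the list [5, 20, 100].
Without skipTo the cursor that is behind would have to call next()
repeatedly, so the saving grows with the gaps between matching ids, which is
consistent with the expectation that larger indexes benefit more.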

regards,
Christoph





