lucy-user mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-user] Is there any benchmarking details about how fast is lucy indexing
Date Thu, 04 Dec 2014 00:05:14 GMT
On 03/12/2014 16:15, Shahab Mohammed wrote:
> I will like to know what is rate of indexing .. ?? MB/sec that can be
> indexed. If some one has done such benchmarking please share the info with
> me.

This depends on a lot of factors like the schema and analysis chain you use, 
the total size of your index, and the hardware. But if you want a ballpark 
figure, I'd say about 1-2 MB/s.

Here is some data for one of our production systems running on a typical VPS:

Total fields: 3
Full text fields: 2
Highlightable fields: 2
Documents: 20,000
Raw input size: 35 MB
Index size: 80 MB
Analysis chain:
   StandardTokenizer
   Normalizer
   SnowballStopFilter
   SnowballStemmer
Total time to reindex: 30s

This includes the time to pull all of the data out of a PostgreSQL database, 
prepare it for indexing, and some other unrelated operations which shouldn't 
have a large impact.
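
As a rough sanity check, the figures above can be turned into throughput numbers. The sketch below is only arithmetic on the values quoted in this message; the function name `throughput` is made up for illustration, and because the 30s includes database access and other work, the result is a lower bound on pure indexing speed rather than a benchmark.

```python
def throughput(raw_mb, docs, seconds):
    """Return (MB/s, docs/s) for a single reindexing run."""
    return raw_mb / seconds, docs / seconds

# Values taken from the production system described above.
raw_mb, index_mb, docs, seconds = 35, 80, 20_000, 30

mb_per_s, docs_per_s = throughput(raw_mb, docs, seconds)
expansion = index_mb / raw_mb  # index size relative to raw input

print(f"{mb_per_s:.2f} MB/s, {docs_per_s:.0f} docs/s, index is {expansion:.2f}x raw input")
```

That works out to a little over 1 MB/s of raw input, which is consistent with the 1-2 MB/s ballpark.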

Nick

