lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Typical Indexing performance
Date Tue, 03 Jun 2008 19:42:09 GMT
There i really no "typical".  I'm playing with Hadoop (HDFS) and Solr at the moment, for example,
and I'm seeing indexing rate of cca 70 docs/second.  However, the bottleneck there is not
indexing, it is reading data from HDFS (over the network).


I've also seen 500+ docs/second.

It depends on many factors:
how fast reading your data source is, how complex your analysis is, the size of documents
and number of fields, whether fields are stored or only indexed, the IndexWriter settings
for segment merging and memory usage, of course, there is hardware, etc.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Simon Wistow <simon@thegestalt.org>
> To: Lucene <java-user@lucene.apache.org>
> Sent: Monday, June 2, 2008 7:40:52 PM
> Subject: Typical Indexing performance
> 
> I know this is one of those "How long is a piece of string?" questions 
> but I'm curious as to the order of magnitude of indexing performance.
> 
> http://lucene.apache.org/java/docs/benchmarks.html
> 
> seems to indicate about 100-120 docs/s is pretty good for average sized 
> documents (say, an email or something) or is that ludicrously out of 
> date for 2.3.x ?
> 
> Simon
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message