lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Regarding multiple index creation and Searching
Date Mon, 15 Aug 2011 17:57:31 GMT
Hi,

This all depends on your index contents and hardware. In general, the size of
a single index / index segment versus multiple segments / indexes is not an
issue on a single machine. To scale, you should also think about using more
than one machine with e.g. ElasticSearch or Apache Solr (which provide that
functionality) instead of plain Lucene. On a single machine, you can only
speed things up by parallelization.

> 1. What is the average acceptable size for Lucene index that is considered OK
> for searching? (before it is broken down into multiple indexes)
> 2. Other than performance, what should be the criteria to decide on separating
> the index into multiple index. (Criteria like single file in the index should
> not be more than 2GB, or the total lucene index folder size should not be
> above 10GB etc)

Depends. On a single machine it does not matter how big the files are. When
searching, an index consisting of several sub-indexes / segments behaves
almost identically to one big optimized one. The only difference arises when
you parallelize.

> (Regarding code changes required to break the documents into appropriate
> year)
> I will be reindexing all the documents again using modified code base. For
> that I will be required to
>
> 3. Create multiple indexWriters and index the document using appropriate
> writer as per the date of the document.

That's fine. The question is whether that makes sense. Will the results of
search queries come from all indexes, equally distributed? If you want to
parallelize, it's often better to use some hash-based distribution.
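
As a rough illustration of what a hash-based distribution could look like, here
is a minimal sketch in plain Java. The class name ShardRouter, the method
shardFor, and the "doc-N" ids are hypothetical and not part of Lucene; the
sketch assumes every document has a stable unique key, and that the returned
number is used to pick one of several IndexWriters.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardRouter {
    // Maps a document's unique key to one of numShards sub-indexes.
    // Math.floorMod keeps the result in [0, numShards) even when
    // hashCode() is negative.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 4; // assumption: four sub-indexes
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            shards.add(new ArrayList<>());
        }
        // Route 1000 hypothetical document ids to their shard.
        for (int i = 0; i < 1000; i++) {
            String id = "doc-" + i;
            shards.get(shardFor(id, numShards)).add(id);
        }
        for (int i = 0; i < numShards; i++) {
            System.out.println("shard " + i + ": " + shards.get(i).size() + " docs");
        }
    }
}
```

Unlike a split by year, the same key always lands in the same sub-index and
the documents spread across the sub-indexes regardless of their date, so a
typical query has work to do in every one of them.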

> 4. While searching, use multiSearcher or ParallelMultiSearcher to search
> across all indexes at once.

MultiSearcher and ParallelMultiSearcher are deprecated and broken (and no
longer supported). The correct way to search different indexes is to wrap all
sub-indexes in a MultiReader and then use a single IndexSearcher on top of it.
To parallelize, pass an ExecutorService to its constructor. Please note:
IndexSearcher can only parallelize if there are sub-indexes, so a big
optimized index does not help here :-) Ideally you would create several
separate indexes using a hash-based distribution of documents.
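
To show why sub-indexes are needed for parallel search, here is a toy model in
plain Java. It is not Lucene code: each "sub-index" is just a list of strings,
and the class ParallelSubIndexSearch and method countMatches are hypothetical.
It only sketches the idea of running one task per sub-index on an
ExecutorService and merging the results afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSubIndexSearch {
    // Counts documents containing the term, submitting one task per sub-index.
    static int countMatches(List<List<String>> subIndexes, String term,
                            ExecutorService executor) throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<String> sub : subIndexes) {
            futures.add(executor.submit(() -> {
                int hits = 0;
                for (String doc : sub) {
                    if (doc.contains(term)) {
                        hits++;
                    }
                }
                return hits;
            }));
        }
        // Merge the per-sub-index results.
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Three hypothetical sub-indexes.
        List<List<String>> subIndexes = Arrays.asList(
            Arrays.asList("lucene index", "solr server"),
            Arrays.asList("lucene searcher", "hash distribution"),
            Arrays.asList("executor service"));
        ExecutorService executor = Executors.newFixedThreadPool(3);
        try {
            System.out.println(countMatches(subIndexes, "lucene", executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

With a single big list there is only one task to submit, so the extra threads
stay idle; that is the toy equivalent of the optimized-index case above.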

Uwe

