lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: an alternative to optimize?
Date Fri, 01 Dec 2006 15:03:56 GMT
I haven't tried it, but according to
http://lucene.apache.org/java/docs/fileformats.html, each segment is a
complete sub-index. I _wonder_ whether you could manage your own merges
by using IndexWriter.addIndexes(), loading each segment in separately
(this may mean copying the segments to other directories, but I am
not sure). Another option would be to modify Lucene to expose the
merge functionality.

This is pure speculation at this point, but I know the capabilities
exist (all optimize does is merge segments until only one segment
remains), so it seems like it should be possible.
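
To make the idea concrete, here is a toy Java model of a selective purge: each segment whose share of deleted docs reaches a threshold is rewritten on its own, and the other segments are left alone, so the index never collapses into one segment. This is only a sketch under stated assumptions. It does not call any Lucene API, and `Segment`, `compact`, `purge`, and the threshold parameter are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types or methods are Lucene classes.
final class SegmentPurgeSketch {

    /** A segment: an append-only list of docs plus a deletion flag per doc. */
    static final class Segment {
        final List<String> docs = new ArrayList<>();
        final List<Boolean> deleted = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
            deleted.add(Boolean.FALSE);
        }

        /** Marks a doc as deleted; its slot stays allocated (a "hole"). */
        void delete(int position) {
            deleted.set(position, Boolean.TRUE);
        }

        int liveCount() {
            int live = 0;
            for (boolean d : deleted) {
                if (!d) live++;
            }
            return live;
        }

        /** Fraction of slots occupied by deleted docs, in [0, 1]. */
        double deletedRatio() {
            return docs.isEmpty() ? 0.0 : 1.0 - (double) liveCount() / docs.size();
        }
    }

    /** Rewrites one segment, copying only the docs that are still live. */
    static Segment compact(Segment source) {
        Segment result = new Segment();
        for (int i = 0; i < source.docs.size(); i++) {
            if (!source.deleted.get(i)) {
                result.add(source.docs.get(i));
            }
        }
        return result;
    }

    /**
     * Compacts every segment whose deleted ratio is at least the threshold.
     * Other segments are reused as-is, so the segment count is unchanged.
     */
    static List<Segment> purge(List<Segment> index, double threshold) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : index) {
            result.add(s.deletedRatio() >= threshold ? compact(s) : s);
        }
        return result;
    }

    public static void main(String[] args) {
        Segment big = new Segment();
        for (int i = 0; i < 4; i++) big.add("doc" + i);
        big.delete(0);
        big.delete(1);
        big.delete(2);

        Segment fresh = new Segment();
        fresh.add("doc4");

        List<Segment> index = new ArrayList<>();
        index.add(big);
        index.add(fresh);

        List<Segment> purged = purge(index, 0.5);
        System.out.println("segments: " + purged.size());
        System.out.println("slots in first segment: " + purged.get(0).docs.size());
    }
}
```

The point of the per-segment rewrite is that its cost scales with the segments that actually contain holes, not with the whole index. Whether addIndexes() can be made to behave this way is exactly the open question above.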

-Grant

On Dec 1, 2006, at 8:11 AM, Stanislav Jordanov wrote:

> Guys,
>
> I've already asked this question but nobody answered:
>
> Suppose we have a relatively big index which is continuously  
> updated - i.e. new docs get added while some of the old docs get  
> deleted.
> For pragmatic reasons we have a restriction on maxMergeDocs so that  
> segment files don't get enormously big.
> Consider now a segment of maximum size (i.e. containing maxMergeDocs
> docs and hence not eligible for a merge).
> It is possible that, as time passes, this segment will have more
> and more of its docs deleted.
> But as it is not mergeable, it will remain the same size, with
> lots of "holes" in it, which is bad for performance.
> The only way that I am aware of to correct this problem is to  
> invoke index optimization, which has several drawbacks:
> 1. It takes a while to optimize a big index.
> 2. The optimization process always produces an index consisting of a
> single (extremely) large segment.
> We can live with 1.
> But 2 is undesirable.
> Is there a way to "optimize" an index or a single segment (in terms
> of purging its deleted docs) without ending up with a
> single-segment index?
>
> Best,
> Stanislav
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


