lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Optimizing index takes too long
Date Sun, 11 Nov 2007 23:41:12 GMT
Hmmm, something doesn't sound quite right.  You have 10 million docs,  
split into 5 or so indexes, right?  And each sub-index is 150  
gigabytes?  How big are your documents?
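
For a rough sense of why the size question matters, here is a back-of-the-envelope sketch using only the figures quoted below (150G per sub-index, about 1.5 million documents in it). The class and method names are made up for illustration, and it assumes "G" means decimal gigabytes:

```java
public class IndexSizeEstimate {
    // Average index bytes per document.
    // Assumes decimal gigabytes (1 GB = 10^9 bytes); names here are illustrative only.
    static double bytesPerDoc(double indexGigabytes, long docCount) {
        return indexGigabytes * 1e9 / docCount;
    }

    public static void main(String[] args) {
        // Figures from the original post: ~150G sub-index, ~1.5 million documents.
        double perDoc = bytesPerDoc(150.0, 1_500_000L);
        System.out.printf("~%.0f KB of index per document%n", perDoc / 1000.0);
    }
}
```

That works out to roughly 100 KB of index per document, which would only be expected if the documents are large or if a lot is being stored alongside the inverted index.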

Can you provide more info about what your Directory and IndexWriter  
settings are?  What version of Lucene are you using?  What are your  
Field settings?  Are you storing info?  What about Term Vectors?

Can you explain more about your documents, etc?  10 million doesn't  
sound like it would need to be split up that much, if at all,  
depending on your hardware.

The wiki has some excellent resources on improving both indexing and  
search speed.

-Grant


On Nov 11, 2007, at 6:16 PM, Barry Forrest wrote:

> Hi,
>
> Optimizing my index of 1.5 million documents takes days and days.
>
> I have a collection of 10 million documents that I am trying to index
> with Lucene.  I've divided the collection into chunks of about 1.5 - 2
> million documents each.  Indexing 1.5 million documents is fast enough (about
> 12 hours), but this results in an index directory containing about
> 35000 files.  Optimizing this index takes several days, which is a bit
> too long for my purposes.  Each sub-index is about 150G.
>
> What can I do to make this process faster?
>
> Thanks for your help,
> Barry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

