lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <karl.wet...@gmail.com>
Subject Re:InstantiatedIndex questions
Date Tue, 06 Oct 2009 17:51:44 GMT

6 okt 2009 kl. 18.54 skrev David Causse:

David, your timing couldn't be better. Just the other day I proposed  
that we deprecate InstantiatedIndexWriter. The sum of the reasons to  
this is that I'm a bit lazy. Your mail makes me reconsider.

https://issues.apache.org/jira/browse/LUCENE-1948

> On the index time InstantiatedIndex is behind RAMDirectory, but the  
> time

Would you mind benchmarking some for me using your corpora? The issue  
suggests that people use the InstantiatedIndex(IndexReader)  
constructor to create the index rather than using  
InstantiatedIndexWriter. Is it way slower for you to produce the index  
using RAMDirectory/IndexWriter and pass an IndexReader to  
InstantiatedIndex?


This is what the package level javadocs says about  
InstantiatedIndexWriter:

"Hardly any effort has been put in to optimizing the  
InstantiatedIndexWriter, only minimizing the amount of time needed to  
write-lock the index has been considered."

I'm sure there are ways to speed it up, I just never managed to find  
the time to look in to it. I never really used IIW.


It might be worth mentioning that when InstantiatedIndex#commit  
returns it has yeilded an optimized "single segment" index. This is  
not quite how a Directory/IndexWriter acts.

> gained over queries make it better (for what I see it can be 2 times
> faster).
>
> InstantiatedIndex will be our default volatile mini index store for  
> our
> next production release.

Very cool!!

> Whe should have other needs of this index but the lack of addIndexes
> support make it impossible for us to use it in other situations. So we
> continue to use RAMDirectory in such situations.

Have you considered using multiple InstantiatedIndex and a  
MultiReader? That would pretty much be the same thing, just that the  
store wouldn't be quite as optimized. It would definitly use more RAM  
than if it was the same index. You could of course also pass this  
MultiReader to a new InstantiatedIndex. I have no real clue about the  
difference in speed and RAM consuption between these solutions so you  
should benchmark all solutions.

> Do you think we could reach RAMDirectory index time by tweaking some  
> initialCap
> stuff inside java.util.Collections you use?


Maybe. But I think it would be a relatively small gain. But don't take  
my words for granted, benchmark it.

Using the InstantiatedIndex(IndexReader) constructor will create  
rather optimal size of the collections.

As for InstantiatedIndexWriter I think it's pretty much only the  
transient collections in #commit that will help you, my guess is that  
you should expemient with the dirtyTerms and termsByText attributes.  
Count the number of terms in your complete index and see how much it  
speeds thing up by creating the collections with this size from the  
start.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message