lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "karl wettin" <karl.wet...@gmail.com>
Subject Re: InstantiatedIndex help + first impression
Date Wed, 19 Nov 2008 02:27:10 GMT
The actual performance depends on how much you load to the index. Can
you tell us how many documents and how large these documents are that
you have in your index?

Compared with RAMDirectory I'vee seen performance boosts of

up to 100x in a small index that contains (1-20) Wikipedia sized
documents, an index I used to apply user search agents on as new data
arrived to the primary index.
up to 25x when placing massive amounts of span queries on the apriori
index in LUCENE-626. This index contained tens of thousands of
documents with only a few (5-20) terms each.
up to 15x in a relatively large ngram index for classifications using
LUCENE-626. This is pure skipTo operations.

Regarding the fuzzy query, try to see how much time was spent
rewriting the query and then how much time was spend querying. I'm
almost certain you'll notice that the time spent rewriting the query
(comparing edit distance between the terms of the index and the query
term) is overwhelming  compared to the time spend searching for the
rewritten query. I.e. this is probably as much a store related expense
as it is a Levenshtein calculation expense.


    karl

(this is my second reply, the first one seems to be lost in space?)

On Mon, Nov 17, 2008 at 1:37 AM, Darren Govoni <darren@ontrenet.com> wrote:
> After I switched to InstantiatedIndex from RAMDirectory (but using the
> reader from my RAMDirectory to create the InstantiatedIndex), I see a
> less than 25% (.25) improvement in speed. Nowhere near the 100x (100.00)
> speed mentioned in the documentation. Probably I am doing something
> wrong.
>
> I am using too, a fuzzy query. e.g. "word:house~0.80" but I'd expect the
> improvement to be because of physical representation (memory graph) and
> mostly unaffected by the query. no?
>
> Could there be some lazy loading going on in RAMDirectory that prevents
> InstantiatedIndex from building out its graph and getting the expected
> speed?
>
> thanks to anyone who can verify this.
>
>
> On Sun, 2008-11-16 at 12:37 -0500, Darren Govoni wrote:
>> Yeah. That makes sense. Its not too hard to wrap those extra steps so I
>> can end up with something simpler too. Like:
>>
>> iindex = InstantiatedIndex("path/to/my/index")
>>
>> I'm lazy so the intermediate hoops to jump through clutter my code.
>> Hehe.
>>
>> :)
>>
>> Darren
>>
>> On Sun, 2008-11-16 at 11:46 -0500, Mark Miller wrote:
>> > Can you start with an empty index? Then how about:
>> >
>> > // Adding these
>> >
>> >     iindex = InstantiatedIndex()
>> >     ireader = iindex.indexReaderFactory()
>> >     isearcher = IndexSearcher(ireader)
>> >
>> > If you want a copy from another IndexReader though, you have to get that reader
from somewhere right?
>> >
>> > - Mark
>> >
>> >
>> >
>> > Darren Govoni wrote:
>> > > Hi Mark,
>> > >   Thanks for the tips. Here's what I will try (psuedo-code)
>> > >
>> > >     endirectory = RAMDirectory("index/dictionary.en")
>> > >     ensearcher = IndexSearcher(endirectory)
>> > >     // Adding these
>> > >     reader = ensearcher.getIndexReader()
>> > >     iindex = InstantiatedIndex(reader)
>> > >     ireader = iindex.indexReaderFactory()
>> > >     isearcher = IndexSearcher(ireader)
>> > >
>> > > Kind of round about way to get an InstantiatedIndex I guess,but maybe
>> > > there's a briefer way?
>> > >
>> > > Thank you.
>> > > Darren
>> > >
>> > > On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
>> > >
>> > >> Check out the docs at:
>> > >> http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html
>> > >>
>> > >> There is a performance graph there to check  out.
>> > >>
>> > >> The code should be fairly straightforward - you can make an
>> > >> InstantiatedIndex thats empty, or seed it with an IndexReader. Then
you
>> > >> can make an InstantiatedReader or Writer, which take the
>> > >> InstantiatedIndex as a constructor arg.
>> > >>
>> > >> You should be able to just wrap that InstantiatedReader in a regular
>> > >> Searcher.
>> > >>
>> > >> Darren Govoni wrote:
>> > >>
>> > >>> Hi gang,
>> > >>>    I am trying to trace the 2.4 API to create an InstantiatedIndex,
but
>> > >>> its rather difficult to connect directory,reader,search,index etc
just
>> > >>> reading the javadocs.
>> > >>>
>> > >>>     I have a (POI - plain old index) directory already and want
to
>> > >>> create a faster InstantiatedIndex and IndexSearcher to query it
like
>> > >>> before. What's the proper order to do this?
>> > >>>
>> > >>> Also, if anyone has any empirical data on the performance or reliability
>> > >>> of InstantiatedIndex, I'd be curious.
>> > >>>
>> > >>> Thanks for the tips!
>> > >>> Darren
>> > >>>
>> > >>>
>> > >>> ---------------------------------------------------------------------
>> > >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>>
>> > >>>
>> > >>>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>
>> > >>
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> > >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message