lucene-dev mailing list archives

From "wolfgang hoschek (JIRA)" <>
Subject [jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index
Date Tue, 21 Nov 2006 18:20:04 GMT
wolfgang hoschek commented on LUCENE-550:

What's the benchmark configuration? For example, is throughput bounded by indexing or querying?
Are you measuring N queries against a single preindexed document, or 1 precompiled query against
N documents? See the line

boolean measureIndexing = false; // toggle this to measure query performance

in my driver. If measuring indexing, what kind of analyzer / token filter chain is used? If
measuring queries, what kind of query types are in the mix, with which relative frequencies?
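
To make the distinction concrete, here is a minimal, self-contained sketch of a driver with such a toggle. It is not the actual driver referred to above: the class name BenchmarkSketch, the helpers indexAll and search, and the whitespace tokenizer are hypothetical stand-ins for a real analyzer, index, and query. It only illustrates that the timed region should cover either indexing or querying, not both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BenchmarkSketch {

    // Toy stand-in for analysis + indexing: lowercase and split on whitespace.
    static List<String[]> indexAll(List<String> docs) {
        List<String[]> index = new ArrayList<>();
        for (String doc : docs) {
            index.add(doc.toLowerCase().split("\\s+"));
        }
        return index;
    }

    // Toy stand-in for a term query: number of documents containing the term.
    static int search(List<String[]> index, String term) {
        int hits = 0;
        for (String[] tokens : index) {
            if (Arrays.asList(tokens).contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    // Returns { elapsed nanoseconds, accumulated result count } for the measured phase only.
    static long[] run(boolean measureIndexing, List<String> docs,
                      List<String> queries, int iterations) {
        long results = 0;
        long start;
        if (measureIndexing) {
            start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                // Use the result so the indexing work cannot be optimized away.
                results += indexAll(docs).size();
            }
        } else {
            // Preindex outside the timed region, then time only the queries.
            List<String[]> index = indexAll(docs);
            start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                for (String q : queries) {
                    results += search(index, q);
                }
            }
        }
        return new long[] { System.nanoTime() - start, results };
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The quick brown fox", "A lazy dog", "quick thinking");
        List<String> queries = Arrays.asList("quick", "dog");
        long[] indexing = run(true, docs, queries, 1000);
        long[] querying = run(false, docs, queries, 1000);
        System.out.println("indexing ns: " + indexing[0]);
        System.out.println("querying ns: " + querying[0] + ", hits: " + querying[1]);
    }
}
```

Varying the sizes of docs and queries moves the sketch between the "N queries against one document" and "one query against N documents" extremes. For real numbers, a warm-up phase before timing would also be needed.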

You may want to experiment with modifying, commenting out, or uncommenting various parts of the
driver setup for any given target scenario. Would it be possible to post the benchmark code, test
data, and queries for analysis?

> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>                 Key: LUCENE-550
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>         Attachments: class_diagram.png, class_diagram.png, instanciated_20060527.tar,, lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz, lucene2-karl_20060723.tar.gz
> After fixing the bugs, it's now 4.5 to 5 times the speed. This is true at both
> index and query time. Sorry if I got your hopes up too much. There are still things to be
> done though. Might not have time to do anything with this until next month, so here is the
> code if anyone wants a peek.
> Not good enough for Jira yet, but if someone wants to fool around with it, here it is.
> The implementation passes a TermEnum -> TermDocs -> Fields -> TermVector comparison
> against the same data in a Directory.
> When it comes to features, offsets don't exist and positions are stored ugly and has
> You might notice that norms are float[] and not byte[]. I refactored it that way
> to see if it would do any good. Bit shifting doesn't take many ticks, so I might just revert
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> InstanciatedIndexReader();
> ii.addDocument(s).. replace IndexWriter for now.

This message is automatically generated by JIRA.