lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index
Date Tue, 21 Nov 2006 18:06:04 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12451726 ] 
            
Karl Wettin commented on LUCENE-550:
------------------------------------

Here is what I just sent to Wolfgang. I've adapted his benchmark test case to also work with InstantiatedIndex.
Note that this is a test with a single document only, and according to my previous tests the
speedup is not linear in index size: InstantiatedIndex is much more than 3x faster than RAMDirectory
on a larger index. So this is really only meant to compare MemoryIndex with InstantiatedIndex, and not
as a benchmark against RAMDirectory.

RAMDirectory:

secs = 95.159
queries/sec= 315.26184
MB/sec = 9.900338
Done benchmarking (without checking correctness).


MemoryIndex:

secs = 26.692
queries/sec= 1123.9323
MB/sec = 35.295456
Done benchmarking (without checking correctness).



InstantiatedIndex:

secs = 27.44
queries/sec= 1093.2944
MB/sec = 34.333317
Done benchmarking (without checking correctness).


MemoryIndex is a bit faster than InstantiatedIndex, but I'm aware of a couple of small optimizations
I can still make.
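
For reference, the ratios implied by the three runs above can be worked out directly from the reported wall-clock times. The sketch below is only arithmetic on those figures; the class and method names are made up for illustration and are not part of the benchmark code.

```java
public class BenchRatios {

    // Speedup of a candidate over a baseline, given elapsed seconds
    // for the same workload (larger means the candidate is faster).
    static double speedup(double baselineSecs, double candidateSecs) {
        return baselineSecs / candidateSecs;
    }

    public static void main(String[] args) {
        // Elapsed times copied from the benchmark output above.
        double ramDirectorySecs = 95.159;
        double memoryIndexSecs = 26.692;
        double instantiatedIndexSecs = 27.44;

        System.out.printf("MemoryIndex vs RAMDirectory:       %.2fx%n",
                speedup(ramDirectorySecs, memoryIndexSecs));
        System.out.printf("InstantiatedIndex vs RAMDirectory: %.2fx%n",
                speedup(ramDirectorySecs, instantiatedIndexSecs));
        System.out.printf("MemoryIndex vs InstantiatedIndex:  %.3fx%n",
                speedup(instantiatedIndexSecs, memoryIndexSecs));
    }
}
```

On these single-document figures, InstantiatedIndex comes out roughly 3.5x faster than RAMDirectory and within about 3% of MemoryIndex.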

> InstanciatedIndex - faster but memory consuming index
> -----------------------------------------------------
>
>                 Key: LUCENE-550
>                 URL: http://issues.apache.org/jira/browse/LUCENE-550
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 1.9
>            Reporter: Karl Wettin
>         Attachments: class_diagram.png, class_diagram.png, instanciated_20060527.tar, InstanciatedIndexTermEnum.java, lucene.1.9-karl1.jpg, lucene2-karl_20060722.tar.gz, lucene2-karl_20060723.tar.gz
>
>
> After fixing the bugs, it's now 4.5 to 5 times the speed. This is true at both index and query time. Sorry if I got your hopes up too much. There are still things to be done, though. I might not have time to do anything with this until next month, so here is the code if anyone wants a peek.
> It's not good enough for Jira yet, but if someone wants to fool around with it, here it is. The implementation passes a TermEnum -> TermDocs -> Fields -> TermVector comparison against the same data in a Directory.
> When it comes to features, offsets don't exist, and positions are stored in an ugly way and have bugs.
> You might notice that norms are float[] and not byte[]. That was me refactoring it to see if it would do any good. Bit shifting doesn't take many ticks, so I might just revert that.
> I believe the code is quite self-explanatory.
> InstanciatedIndex ii = ..
> ii.new InstanciatedIndexReader();
> ii.addDocument(s).. replaces IndexWriter for now.
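
The float[] versus byte[] remark in the quoted description is about how norms are packed: stock Lucene keeps one byte per document and field and turns it back into a float at search time. As a rough sketch of why that decoding costs only a few shifts, here is an 8-bit float format with 3 mantissa bits and 5 exponent bits. It is written as an illustration, assuming a layout like the one behind Similarity.encodeNorm and Similarity.decodeNorm; the class name NormCodec is hypothetical, and the exact bit layout of any given Lucene version should be checked against its source.

```java
public class NormCodec {

    // Assumed layout: 3 mantissa bits, 5 exponent bits, zero-exponent point 15.
    private static final int MANTISSA_SHIFT = 24 - 3;
    private static final int EXPONENT_BIAS = 63 - 15;

    // Pack a non-negative float into one byte (lossy).
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> MANTISSA_SHIFT;
        if (small <= (EXPONENT_BIAS << 3)) {
            // Zero and negatives map to 0; tiny positives underflow to the smallest value.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= (EXPONENT_BIAS << 3) + 0x100) {
            return (byte) -1; // overflow: clamp to the largest representable value
        }
        return (byte) (small - (EXPONENT_BIAS << 3));
    }

    // Unpack a byte produced by encode() back into a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << MANTISSA_SHIFT;
        bits += EXPONENT_BIAS << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {1.0f, 0.5f, 0.3f};
        for (float f : samples) {
            System.out.println(f + " -> " + decode(encode(f)));
        }
    }
}
```

The encoding is lossy: in this sketch 0.3f decodes back as 0.25f, while powers of two such as 1.0f and 0.5f round-trip exactly. Keeping norms as float[] avoids the decode step at the cost of four times the memory per norm.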

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

