lucene-dev mailing list archives

From karl wettin <ka...@snigel.net>
Subject Re: Using Lucene for searching tokens, not storing them.
Date Thu, 20 Apr 2006 05:29:56 GMT

On 18 Apr 2006, at 22:08, karl wettin wrote:

> After adding a couple of binary searches in much-needed places (and
> a couple of new bugs that in a few cases affect the results), I'm
> now down to 1/8 of the time RAMDirectory takes. That is really fast
> if you ask me.

After fixing the bugs, it's now 4.5 to 5 times as fast as RAMDirectory,
at both index and query time. Sorry if I got your hopes up too much.
There are still things to be done, though. I might not have time to do
anything with this until next month, so here is the code if anyone
wants a peek.
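
The binary searches mentioned in the quoted message are not shown in this post. Purely as an illustration, here is a minimal sketch assuming the terms are kept in a sorted String[] with a parallel array of sorted posting lists; SortedTermDictionary and its methods are hypothetical names, not part of the attached code. java.util.Arrays.binarySearch finds a term in O(log n) instead of scanning, and its negative return value encodes the insertion point, which is what a TermEnum-style "seek to the first term >= target" needs.

```java
import java.util.Arrays;

/** Hypothetical in-memory term dictionary: sorted terms with parallel postings. */
class SortedTermDictionary {
    private final String[] terms;    // sorted ascending
    private final int[][] postings;  // postings[i] = sorted doc ids for terms[i]

    SortedTermDictionary(String[] sortedTerms, int[][] postings) {
        this.terms = sortedTerms;
        this.postings = postings;
    }

    /** Doc ids containing the term, or an empty array if the term is absent. */
    int[] docs(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? postings[i] : new int[0];
    }

    /** Index of the first term >= target, or terms.length if there is none. */
    int seek(String target) {
        int i = Arrays.binarySearch(terms, target);
        return i >= 0 ? i : -(i + 1);  // a negative result is -(insertion point) - 1
    }

    String termAt(int i) {
        return terms[i];
    }

    /** True if the doc id occurs in the term's postings, again via binary search. */
    boolean contains(String term, int doc) {
        return Arrays.binarySearch(docs(term), doc) >= 0;
    }

    public static void main(String[] args) {
        SortedTermDictionary dict = new SortedTermDictionary(
            new String[] {"apache", "index", "lucene", "search"},
            new int[][] {{0}, {0, 2}, {0, 1, 2}, {1}});
        System.out.println(Arrays.toString(dict.docs("lucene")));
        System.out.println(dict.termAt(dict.seek("l")));
    }
}
```

Keeping the postings sorted is what allows the second binary search in contains(); a hash map would give faster exact lookups but could not answer the "first term >= target" question that term enumeration relies on.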

It's not good enough for Jira yet, but if someone wants to fool around
with it, here it is. The implementation passes a TermEnum -> TermDocs
-> Fields -> TermVector comparison against the same data stored in a
Directory.
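
The comparison test itself is not included in the message. A minimal sketch of the first two stages of the idea, assuming each index has been flattened into a sorted map from term to postings (doc id -> term frequency): walk both term lists in order, as a TermEnum would, and compare the postings of each term, as TermDocs would. PostingsComparison and the map layout are hypothetical stand-ins; a real test would read both sides through Lucene's IndexReader and would go on to compare stored fields and term vectors, which this sketch leaves out.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class PostingsComparison {
    /**
     * Returns null if both indexes hold identical terms and postings,
     * otherwise a description of the first difference found.
     * Layout of each index: term -> (doc id -> term frequency), both levels sorted.
     */
    static String firstDifference(SortedMap<String, SortedMap<Integer, Integer>> expected,
                                  SortedMap<String, SortedMap<Integer, Integer>> actual) {
        Iterator<Map.Entry<String, SortedMap<Integer, Integer>>> e = expected.entrySet().iterator();
        Iterator<Map.Entry<String, SortedMap<Integer, Integer>>> a = actual.entrySet().iterator();
        while (e.hasNext() && a.hasNext()) {
            Map.Entry<String, SortedMap<Integer, Integer>> te = e.next();
            Map.Entry<String, SortedMap<Integer, Integer>> ta = a.next();
            // Term enumeration: the same term must appear at the same position.
            if (!te.getKey().equals(ta.getKey())) {
                return "term mismatch: " + te.getKey() + " vs " + ta.getKey();
            }
            // Postings: the same documents with the same frequencies.
            if (!te.getValue().equals(ta.getValue())) {
                return "postings mismatch for term " + te.getKey();
            }
        }
        if (e.hasNext() || a.hasNext()) {
            return "different number of terms";
        }
        return null;
    }

    /** Builds term -> (doc id -> freq) from tokenized documents; doc id = array index. */
    static SortedMap<String, SortedMap<Integer, Integer>> build(String[][] docs) {
        SortedMap<String, SortedMap<Integer, Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < docs.length; doc++) {
            for (String token : docs[doc]) {
                index.computeIfAbsent(token, t -> new TreeMap<>()).merge(doc, 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[][] docs = {{"lucene", "index"}, {"lucene", "lucene", "search"}};
        System.out.println(firstDifference(build(docs), build(docs)));
    }
}
```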

Feature-wise, offsets don't exist yet, and positions are stored in an
ugly way and still have bugs.

You might notice that norms are float[] rather than byte[]. I
refactored that to see whether it would do any good. Bit shifting
doesn't take many ticks, so I might just revert it.
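
For context on the float[] versus byte[] trade-off: Lucene's Similarity stores each norm as a single byte, a tiny floating-point format with a 3-bit mantissa and a 5-bit exponent, and decodes it at scoring time. The sketch below is a hypothetical, self-contained illustration of that kind of encoding (ByteNormCodec and its methods are made-up names, and this is not Lucene's source). Encoding takes a few shifts, and decoding through a precomputed 256-entry table is one array lookup, so a byte[] costs a quarter of the memory of a float[] for very little CPU. The price is precision: the encoding is lossy, which a float[] avoids.

```java
/** Hypothetical one-byte norm codec: 3 mantissa bits, 5 exponent bits. */
class ByteNormCodec {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;
    // Shifted float exponent that corresponds to encoded value 0.
    private static final int FZERO = (63 - ZERO_EXP) << MANTISSA_BITS;

    /** Decode table: one float per possible byte value, built once. */
    private static final float[] DECODE = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            DECODE[i] = slowDecode((byte) i);
        }
    }

    /** Keeps only the top exponent and mantissa bits of the float. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - MANTISSA_BITS);  // drop the low mantissa bits
        if (small <= FZERO) {
            // zero or negative -> 0; positive but too small -> smallest positive code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        } else if (small >= FZERO + 0x100) {
            return (byte) -1;  // too large -> largest code (0xFF)
        }
        return (byte) (small - FZERO);
    }

    /** Rebuilds float bits from a code; only used to fill the table. */
    private static float slowDecode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Decoding at query time is a single array lookup. */
    static float decode(byte b) {
        return DECODE[b & 0xff];
    }

    public static void main(String[] args) {
        float[] norms = {1.0f, 0.5f, 0.3f};
        byte[] packed = new byte[norms.length];  // 1 byte per doc instead of 4
        for (int i = 0; i < norms.length; i++) packed[i] = encode(norms[i]);
        for (int i = 0; i < norms.length; i++) {
            System.out.println(norms[i] + " -> " + decode(packed[i]));
        }
    }
}
```

With this scheme 1.0 and 0.5 round-trip exactly, while 0.3 comes back as 0.25.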

I believe the code is quite self-explanatory.

InstanciatedIndex ii = ..
ii.new InstanciatedIndexReader();  // the reader is an inner class of the index
ii.addDocument(s)..                // replaces IndexWriter for now
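
ii.new InstanciatedIndexReader() is Java's syntax for creating an instance of a non-static inner class bound to one particular enclosing object, which lets the reader work directly on the index's own data structures instead of a copy. Below is a toy, self-contained sketch of that arrangement; ToyIndex, Reader, addDocument(String...), docFreq and numDocs are hypothetical names and not the API of the attached code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy index: data lives in the outer object, readers are inner classes. */
class ToyIndex {
    // term -> ids of the documents containing it, in increasing order
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Stands in for IndexWriter: adds one tokenized document straight to the index. */
    void addDocument(String... tokens) {
        int doc = nextDocId++;
        for (String token : tokens) {
            List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
            if (docs.isEmpty() || docs.get(docs.size() - 1) != doc) {
                docs.add(doc);  // record each document only once per term
            }
        }
    }

    /** Non-static inner class: every Reader sees the fields of the ToyIndex that created it. */
    class Reader {
        int docFreq(String term) {
            List<Integer> docs = postings.get(term);
            return docs == null ? 0 : docs.size();
        }

        int numDocs() {
            return nextDocId;
        }
    }

    public static void main(String[] args) {
        ToyIndex ii = new ToyIndex();
        ii.addDocument("lucene", "in", "memory");
        ii.addDocument("lucene", "lucene", "index");
        ToyIndex.Reader reader = ii.new Reader();  // same syntax as in the message
        System.out.println(reader.docFreq("lucene") + " of " + reader.numDocs());
    }
}
```

Because this Reader holds a reference to the live index rather than a snapshot, documents added after it is created are visible to it. Whether the attached code behaves the same way is not stated in the message.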



