lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: possible TermInfosReader speedup
Date Wed, 08 Apr 2009 21:08:27 GMT
On Thu, Apr 9, 2009 at 00:14, Michael McCandless
<lucene@mikemccandless.com> wrote:
> On Wed, Apr 8, 2009 at 3:46 PM, Earwin Burrfoot <earwin@gmail.com> wrote:
>
>> Currently, when we're seeking a given Term, TermInfosReader does a
>> binary search across the whole term space, including terms belonging
>> to other fields. I propose augmenting the fields file with two
>> pointers (firstTerm, lastTerm) for each field. That reduces the range
>> we need to search, and instead of comparing Terms we only need to
>> compare values.
>> How does that sound?
> That sounds great!  Wanna make a patch?
I can try, but I'm not at all comfortable with these parts of Lucene
and will probably need help, at least with tests.
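
To make the firstTerm/lastTerm proposal quoted above concrete, here is a
minimal sketch. It is not Lucene code: the class name, the two parallel
arrays, and both method names are hypothetical, and a real term index
lives on disk rather than in String arrays. It only shows how a per-field
range narrows the binary search and reduces each comparison to the term
text.

```java
import java.util.Arrays;

// Hypothetical stand-in for the term index: entries sorted by field, then text.
final class FieldRangeSeek {
    static final String[] FIELDS = {"body", "body", "body", "id", "id", "title", "title"};
    static final String[] TEXTS  = {"apple", "lucene", "zebra", "1", "2", "apple", "lucene"};

    // Simplified current behavior: binary search over every term,
    // comparing the field first and the text second.
    static int seekGlobal(String field, String text) {
        int lo = 0, hi = FIELDS.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = FIELDS[mid].compareTo(field);
            if (c == 0) c = TEXTS[mid].compareTo(text);
            if (c < 0) lo = mid + 1;
            else if (c > 0) hi = mid - 1;
            else return mid;
        }
        return -1; // not found
    }

    // Proposed behavior: firstTerm/lastTerm (inclusive) bound the search
    // to one field, so only the text values are compared.
    static int seekInField(int firstTerm, int lastTerm, String text) {
        int pos = Arrays.binarySearch(TEXTS, firstTerm, lastTerm + 1, text);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        // Field "id" occupies entries 3..4 in this toy index.
        System.out.println(seekGlobal("id", "2"));
        System.out.println(seekInField(3, 4, "2"));
    }
}
```

Both calls in main locate the same entry; the second one searches two
entries instead of seven and never looks at a field name.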

>> Also, on the other topic - how hard is it to boost
>> TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be
>> nice for TrieRangeFilter and probably some other filters.
> I think all that's needed is to implement SegmentTermEnum.skipTo,
> calling something like tis.terms(Term) but instead of returning a
> cloned SegmentTermEnum, overwrite the one passed in?
I bet at least MultiSegmentReader.MultiTermEnum should be affected
too? (I'm looking at 2.3.2 sources)
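
A rough sketch of the difference being discussed, under stated
assumptions: SkipToSketch, INDEX_INTERVAL, and both skipTo variants are
hypothetical names, and the enum walks an in-memory sorted array rather
than a segment's term dictionary. skipToLinear is modeled on a scan-only
skipTo (call next() until the target is reached); skipToIndexed
repositions the same enum in place using sampled index entries, which is
the spirit of the "overwrite the one passed in" suggestion.

```java
// Hypothetical model of a term enum over one segment's sorted terms.
final class SkipToSketch {
    // Assumed sampling interval; stands in for the term index interval.
    static final int INDEX_INTERVAL = 4;

    final String[] terms; // sorted term texts
    int pos = -1;         // current position; -1 means "before the first term"
    int steps = 0;        // number of next() calls, for comparison only

    SkipToSketch(String[] sortedTerms) { this.terms = sortedTerms; }

    boolean next() { steps++; return ++pos < terms.length; }

    String term() { return (pos >= 0 && pos < terms.length) ? terms[pos] : null; }

    // Scan-only variant: advance one term at a time until term() >= target.
    boolean skipToLinear(String target) {
        do {
            if (!next()) return false;
        } while (target.compareTo(term()) > 0);
        return true;
    }

    // Seek-in-place variant: binary search the sampled entries
    // (positions 0, INDEX_INTERVAL, 2 * INDEX_INTERVAL, ...) for the last
    // one <= target, move this enum there, then finish with a short scan.
    boolean skipToIndexed(String target) {
        int lo = 0, hi = (terms.length - 1) / INDEX_INTERVAL;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (terms[mid * INDEX_INTERVAL].compareTo(target) <= 0) lo = mid;
            else hi = mid - 1;
        }
        int jump = lo * INDEX_INTERVAL - 1; // just before the sampled entry
        if (jump > pos) pos = jump;         // never move backwards
        return skipToLinear(target);
    }
}
```

No new enum instance is created in skipToIndexed, so the per-call
allocation cost that matters for mmapped indexes does not appear in this
variant.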

> Does TrieRangeFilter use TermEnum.skipTo?  If so, we should certainly fix this.
It doesn't, but only because skipTo is so obviously slow. I also have
another filter in my project that could use skipTo.

Refer to: https://issues.apache.org/jira/browse/LUCENE-1470?focusedCommentId=12651318&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12651318
Uwe> I am fine with calling IndexReader.terms(Term) to use the cache
and faster seeking. The cost of creating new instances of TermEnums is
less than doing disk reads.
But other people (like me) might use mmapped indexes, so the cost(new
TermEnum)/cost(index read) ratio looks different for us.

> See also this, for historical context:
>  http://markmail.org/message/2e7kpvyi3bqtgjwt#query:lucene%20termenum%20skipto+page:1+mid:lb46mbbgpgbnnuxk+state:results
Darn! And API-wise it looks like a legitimate method :)

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


