lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: possible TermInfosReader speedup
Date Wed, 08 Apr 2009 22:26:21 GMT
On Thu, Apr 9, 2009 at 02:01, Uwe Schindler <uwe@thetaphi.de> wrote:
>> >> Also, on the other topic - how hard is it to boost
>> >> TermEnum.skipTo(term) speed to IndexReader.terms(term) level? Would be
>> >> nice for TrieRangeFilter and probably some other filters.
>> > I think all that's needed is to implement SegmentTermEnum.skipTo,
>> > calling something like tis.terms(Term) but instead of returning a
>> > cloned SegmentTermEnum, overwrite the one passed in?
>> I bet at least MultiSegmentReader.MultiTermEnum should be affected
>> too? (I'm looking at 2.3.2 sources)
>>
>> > Does TrieRangeFilter use TermEnum.skipTo? If so, we should certainly
>> > fix this.
>> It doesn't, but only because skipTo is so obviously slow + I have
>> another filter in my project that could use skipTo.
>>
>> Refer to: https://issues.apache.org/jira/browse/LUCENE-1470?focusedCommentId=12651318&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12651318
>> Uwe> I am fine with calling IndexReader.terms(Term) to use the cache
>> and faster seeking. The cost of creating new instances of TermEnums is
>> less than doing disk reads.
>
> I am fascinated; you remember my question... :-)
I don't; I dropped out of that issue's comments earlier :)
But today I was borrowing parts of your code for my version of
rangefilter (which we discussed at the very beginning) and stumbled
upon an obviously missed skipTo opportunity. Then I checked the
mailing list and found your supporting voice there.

> Yes, if skipTo performed better, I could easily use it in TrieRange
> and would be happy, as noted before. Currently, a new TermEnum is
> created for each sub-range. When TrieRange was committed and later
> updated, it was (and still is) not clear to me why skipTo would not be
> as fast as a new TermEnum.
Check Michael's link below: this method (and its ugly implementation)
is a random offspring of some ancient bugfix. Nobody loved it, and it
grew up in neglect.
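
To make the difference concrete, here is a minimal, self-contained
sketch. It is not Lucene code: SortedTermCursor, scanTo, seekTo and the
step counter are hypothetical names invented for illustration, and the
"index" is simply every 128th term of an in-memory sorted array. scanTo
walks forward with next() until it reaches the target, which is the
kind of linear scan that makes a generic skipTo slow. seekTo first does
a binary search over the sampled index and then scans at most one
interval, repositioning the same cursor in place instead of creating a
new one.

```java
public class SkipToSketch {

    /** Hypothetical stand-in for a term enumerator; not a Lucene class. */
    static final class SortedTermCursor {
        private final String[] terms;     // sorted term dictionary
        private final int indexInterval;  // every Nth term acts as an index entry
        private int pos = -1;             // positioned before the first term
        long steps = 0;                   // next() calls plus index probes

        SortedTermCursor(String[] sortedTerms, int indexInterval) {
            this.terms = sortedTerms;
            this.indexInterval = indexInterval;
        }

        /** Current term, or null when unpositioned or exhausted. */
        String term() {
            return (pos >= 0 && pos < terms.length) ? terms[pos] : null;
        }

        boolean next() {
            steps++;
            if (pos < terms.length) pos++;
            return pos < terms.length;
        }

        /** Forward-only linear scan: advance until term() >= target. */
        boolean scanTo(String target) {
            do {
                if (!next()) return false;
            } while (terms[pos].compareTo(target) < 0);
            return true;
        }

        /** Index-assisted seek that overwrites this cursor's position. */
        boolean seekTo(String target) {
            if (terms.length == 0) return false;
            int lo = 0, hi = (terms.length - 1) / indexInterval, slot = 0;
            while (lo <= hi) {            // greatest index entry < target
                int mid = (lo + hi) >>> 1;
                steps++;
                if (terms[mid * indexInterval].compareTo(target) < 0) {
                    slot = mid;
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            pos = slot * indexInterval - 1;  // jump, then scan one interval at most
            return scanTo(target);
        }
    }

    /** Zero-padded names, so array order equals lexicographic order. */
    static String[] makeTerms(int n) {
        String[] t = new String[n];
        for (int i = 0; i < n; i++) t[i] = String.format("term%05d", i);
        return t;
    }

    public static void main(String[] args) {
        String[] terms = makeTerms(10_000);
        SortedTermCursor scan = new SortedTermCursor(terms, 128);
        SortedTermCursor seek = new SortedTermCursor(terms, 128);
        scan.scanTo("term09000");
        seek.seekTo("term09000");
        System.out.println("scan: " + scan.term() + " after " + scan.steps + " steps");
        System.out.println("seek: " + seek.term() + " after " + seek.steps + " steps");
    }
}
```

Both cursors end up on the same term; the seek variant touches far
fewer entries. Unlike scanTo, seekTo in this sketch is an absolute
reposition, so it can also move backwards.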

>> But other people (like me) might use mmapped indexes, so the
>> cost(new TermEnum)/cost(index read) ratio looks different for us.
>>
>> > See also this, for historical context:
>> >
>> > http://markmail.org/message/2e7kpvyi3bqtgjwt#query:lucene%20termenum%20skipto+page:1+mid:lb46mbbgpgbnnuxk+state:results
>> Darn! And api-wise it looks like a legitimate method :)
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
