lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Proper use of TermsEnum.seek?
Date Fri, 25 Feb 2011 12:57:23 GMT
On Fri, Feb 25, 2011 at 1:28 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> On Tue, 2011-02-22 at 12:19 +0100, Simon Willnauer wrote:
>
> [Toke: Using a partial cache of BytesRef+TermState]
>
>> I don't know how you implemented that part, but you might consider
>> using something like ByteBlockPool instead of BytesRef instances to
>> save some memory. As a hint, you can look at
>> BytesRefHash for an example.
>
> Avoiding the overhead of representing the BytesRefs as separate Objects
> seems sensible. Unfortunately this isn't possible with TermState, at
> least not in general. As I focus quite a bit on memory overhead, it
> might make sense to store just the BytesRef and take the performance
> penalty of seek(BytesRef) to avoid the Object overhead of TermState.
>
>> I think we need to check whether that BytesRef is really needed. I hope
>> we can get rid of it eventually.
>
> It does seem a bit peculiar that it is needed for a seek using a
> previously delivered marker. Maybe the TermState could hold a reference
> to the BytesRef itself, if the implementation needs it?

Yeah, I agree, but it seems that we still need it for all the impls
right now. I did not look at the block reader in much detail, but it
still seems to be necessary somehow...

I opened https://issues.apache.org/jira/browse/LUCENE-2938 for this!
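On the ByteBlockPool hint quoted above: the memory saving comes from appending every term's bytes to one shared array and addressing each term by offset, instead of allocating a BytesRef object plus its own byte[] per term. Below is a minimal, hypothetical Java sketch of that idea, assuming terms are only appended and read back by ordinal. `PackedTermStore` is an illustrative name, not Lucene's ByteBlockPool or BytesRefHash API (which, among other differences, allocate fixed-size blocks rather than one growing array).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch: stores many terms back to back in one shared byte[]
 * instead of one BytesRef (object + its own byte[]) per term. Only the idea
 * mirrors ByteBlockPool/BytesRefHash; this is not their API.
 */
final class PackedTermStore {
    private byte[] bytes = new byte[64]; // all term bytes, concatenated
    private int[] starts = new int[8];   // starts[i] = offset of term i; starts[size] = end
    private int size = 0;                // number of terms stored
    private int used = 0;                // bytes used in 'bytes'

    /** Appends a term and returns its ordinal. */
    int add(byte[] term) {
        if (used + term.length > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, used + term.length));
        }
        if (size + 1 >= starts.length) { // need room for starts[size] and starts[size + 1]
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        starts[size] = used;
        System.arraycopy(term, 0, bytes, used, term.length);
        used += term.length;
        size++;
        starts[size] = used; // end sentinel for the last term
        return size - 1;
    }

    int size() {
        return size;
    }

    /** Returns a copy of term 'ord'; a real cache would fill a reused holder instead. */
    byte[] get(int ord) {
        return Arrays.copyOfRange(bytes, starts[ord], starts[ord + 1]);
    }

    public static void main(String[] args) {
        PackedTermStore store = new PackedTermStore();
        store.add("apple".getBytes(StandardCharsets.UTF_8));
        int b = store.add("banana".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.size() + " " + new String(store.get(b), StandardCharsets.UTF_8)); // 2 banana
    }
}
```

Per cached term this costs one int offset instead of an object header, an array header and a reference. As Toke points out, the TermState side of such a cache cannot be packed this way in general.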


>
> Regards,
> Toke Eskildsen
>
>


