lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Mapping doc values back to doc ID (in decent time)
Date Sun, 09 Aug 2015 07:41:35 GMT
On Fri, Aug 7, 2015 at 5:34 PM, Adrien Grand <jpountz@gmail.com> wrote:
> Does your application actually iterate in order over dense ids, or is
> it just for benchmarking purposes? Because if it does, you probably
> don't actually need seeking, you could just see what the current ID in
> the terms enum is.

Both dense ID fetches and individual ID fetches exist in the
application. I put them in a benchmark deliberately doing it as
individual fetches to get an idea of average timing for a single
operation.

There are so many use cases of doing the individual fetches that it's
tough to enumerate. The first one I found was "fetch the term vector
for ID + field" but I'm sure there will be tons of them.

For mapping a dense set of IDs to doc IDs (e.g. for filtering), I
would probably use something like DocValuesTermsQuery for that to get
them all in one shot. I also wondered whether writing our filters as
queries would help, but I think it would turn out to be about as fast
as DocValuesTermsQuery even if I did that.
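A rough sketch of the "all in one shot" idea, written against plain Java collections rather than Lucene classes. The names DenseIdFilter, idByDoc and wantedIds are made up for illustration, and the array stands in for a per-document ID column such as a doc values field. It shows only the shape of the approach (one pass over the documents, one set lookup per document), not how DocValuesTermsQuery itself is implemented.

```java
import java.util.BitSet;
import java.util.Set;

/** Plain-Java model (no Lucene classes): one pass over a per-document ID column. */
final class DenseIdFilter {
    /**
     * idByDoc[docId] holds the application ID stored for that document
     * (a hypothetical stand-in for a doc values field).
     * Returns a bit set with one bit set per matching doc ID.
     */
    static BitSet matchingDocs(long[] idByDoc, Set<Long> wantedIds) {
        BitSet hits = new BitSet(idByDoc.length);
        for (int docId = 0; docId < idByDoc.length; docId++) {
            // One hash lookup per document; no per-ID seek is needed.
            if (wantedIds.contains(idByDoc[docId])) {
                hits.set(docId);
            }
        }
        return hits;
    }
}
```

The cost here is proportional to the number of documents rather than the number of requested IDs, which is why it suits large dense sets better than single lookups.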

I'm sure the only way to really improve the speed of these filters is
to start storing these things in the text index and use query-time
joins, but I can't do that until I solve the issue of relying on
stable doc IDs and it seems like trying to solve two large problems in
a single commit would be biting off more than I can chew.

> If you actually need seeking, then you should try
> to avoid MultiFields, it will call seekExact on each segment, while
> given what I see you could just stop after you found one segment with
> the value.

Ah, I did wonder whether MultiFields had any behaviour like that, so
that definitely means that I will avoid using it. Then I can try other
tricks, like trying the seeks in order of segment size (the largest
segment is most likely to contain the hit).
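A minimal sketch of that per-segment lookup, again modelled with plain Java collections. SegmentLookup, Segment, docBase and localDocById are hypothetical names; in Lucene the map lookup would correspond to something like a TermsEnum.seekExact call on each leaf reader, with the leaf's doc base added to the local doc ID. The sketch assumes each ID lives in at most one segment, so the loop can stop at the first hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

/** Plain-Java model of "try the largest segment first, stop at the first hit". */
final class SegmentLookup {
    /** Hypothetical stand-in for one segment: its doc ID base plus an ID -> local doc ID map. */
    static final class Segment {
        final int docBase;
        final Map<Long, Integer> localDocById;

        Segment(int docBase, Map<Long, Integer> localDocById) {
            this.docBase = docBase;
            this.localDocById = localDocById;
        }

        int size() {
            return localDocById.size();
        }
    }

    /** Returns a copy of the segment list sorted by size, largest first. */
    static List<Segment> largestFirst(List<Segment> segments) {
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort((a, b) -> Integer.compare(b.size(), a.size()));
        return ordered;
    }

    /** Probes segments in the given order and returns the global doc ID of the first hit. */
    static OptionalInt findDoc(List<Segment> orderedSegments, long id) {
        for (Segment segment : orderedSegments) {
            Integer local = segment.localDocById.get(id);
            if (local != null) {
                // Assumes each ID occurs in at most one segment, so stop here.
                return OptionalInt.of(segment.docBase + local);
            }
        }
        return OptionalInt.empty();
    }
}
```

The ordering only needs to be recomputed when the set of segments changes, so it can be cached alongside the reader rather than rebuilt for every lookup.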

TX
