lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From András Péteri <>
Subject Re: Mapping doc values back to doc ID (in decent time)
Date Sun, 09 Aug 2015 15:51:54 GMT
If I understand it correctly, the Zoie library [1][2] implements the
"sledgehammer" approach by collecting docValues for all documents when a
segment reader is opened. If you have some RAM to throw at the problem,
this could indeed bring you an acceptable level of performance.


On Sun, Aug 9, 2015 at 9:41 AM, Trejkaz <> wrote:

> On Fri, Aug 7, 2015 at 5:34 PM, Adrien Grand <> wrote:
> > Does your application actually iterate in order over dense ids, or is
> > it just for benchmarking purposes? Because if it does, you probably
> > don't actually need seeking, you could just see what the current ID in
> > the terms enum is.
> Both dense ID fetches and individual ID fetches exist in the
> application. I put them in a benchmark deliberately doing it as
> individual fetches to get an idea of average timing for a single
> operation.
> There are so many use cases of doing the individual fetches that it's
> tough to enumerate. The first one I found was "fetch the term vector
> for ID + field" but I'm sure there will be tons of them.
> For mapping a dense set of IDs to doc IDs (e.g. for filtering), I
> would probably use something like DocValuesTermsQuery for that to get
> them all in one shot. I also wondered whether writing our filters as
> queries would help, but I think it would turn out to be about as fast
> as DocValuesTermsQuery even if I did that.
> I'm sure the only way to really improve the speed of these filters is
> to start storing these things in the text index and use query-time
> joins, but I can't do that until I solve the issue of relying on
> stable doc IDs and it seems like trying to solve two large problems in
> a single commit would be biting off more than I can chew.
> > If you actually need seeking, then you should try
> > to avoid MultiFields, it will call seedExact on each segment, while
> > given what I see you could just stop after you found one segment with
> > the value.
> Ah, I did wonder whether MultiFields had any behaviour like that, so
> that definitely means that I will avoid using it. Then I can try other
> tricks, like trying the seeks in order of segment size (the largest
> segment is most likely to contain the hit.)
> TX
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message