lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Recommendation for doing a search plus collecting extra information?
Date Tue, 27 Mar 2018 05:19:55 GMT
On Mon, Oct 12, 2015 at 4:32 AM, Alan Woodward <alan@flax.co.uk> wrote:
> Hi Trejkaz,
>
> You can still use a standard collector if you don’t need to worry about multi-threaded search.
> It sounds as though what you want to do is implement your own Collector that will read and
> record docvalues hits, and use MultiCollector to wrap it and a standard TopDocsCollector together.

This is what I'm currently trying out, but I'm hitting exactly the
problem I predicted. To use the values, I have to put them into some
kind of storage.

I can put them into an int[], but then even queries returning a small
number of hits pay the worst-case memory cost.

Or I can put them into something like a fastutil Int2IntOpenHashMap,
which reduces the memory usage for small queries, while also making
large queries much slower.

Neither of these is really appealing right now.
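
To make the trade-off concrete, here is a rough sketch of the two options
in plain Java. Everything in it is a stand-in: HitValueStore, DenseStore,
SparseStore and the MISSING sentinel are made-up names, and java.util.HashMap
is used in place of fastutil's Int2IntOpenHashMap so the example needs no
extra dependency.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface: where a collector would record (docId, value). */
interface HitValueStore {
    /** Sentinel for "this doc was not a hit"; a real value equal to it cannot be stored. */
    int MISSING = Integer.MIN_VALUE;

    void put(int docId, int value);

    int get(int docId);
}

/** Option 1: one slot per document, allocated up front regardless of hit count. */
final class DenseStore implements HitValueStore {
    private final int[] values;

    DenseStore(int maxDoc) {
        values = new int[maxDoc];
        Arrays.fill(values, MISSING);
    }

    public void put(int docId, int value) { values[docId] = value; }

    public int get(int docId) { return values[docId]; }
}

/** Option 2: memory grows with the number of hits, at the cost of hashing on every put. */
final class SparseStore implements HitValueStore {
    // Stand-in for a primitive int-to-int map; HashMap boxes keys and values.
    private final Map<Integer, Integer> values = new HashMap<>();

    public void put(int docId, int value) { values.put(docId, value); }

    public int get(int docId) { return values.getOrDefault(docId, MISSING); }
}
```

DenseStore allocates maxDoc slots even for a single hit, while SparseStore
allocates nothing up front but does more work per hit.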

I have two ideas, but I can't figure out whether they'll work:

1. The doc IDs are visited in order, at least within each segment. Is
there a structure in Lucene itself somewhere which can store that off
quickly and efficiently?

2. Am I allowed to just hold onto the NumericDocValues for each leaf
for a long period of time, or is there an implementation of them which
breaks that? I figure they're already sitting around, so that should
be zero additional storage?
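
For idea 1, the kind of structure in question might look roughly like the
sketch below. It is plain Java, not a Lucene class, and SortedHitValues is a
made-up name. It assumes doc IDs arrive in strictly increasing order (true
only within a segment, so there would be one instance per leaf) and that
each hit carries a single int value.

```java
import java.util.Arrays;

/** Hypothetical append-only store for (docId, value) pairs collected in doc ID order. */
final class SortedHitValues {
    private int[] docIds = new int[16];
    private int[] values = new int[16];
    private int size = 0;

    /** Appends one hit; doc IDs must be strictly increasing. */
    void add(int docId, int value) {
        if (size > 0 && docId <= docIds[size - 1]) {
            throw new IllegalArgumentException("doc IDs must be strictly increasing");
        }
        if (size == docIds.length) {
            // Double both arrays together so they stay parallel.
            int newCapacity = docIds.length * 2;
            docIds = Arrays.copyOf(docIds, newCapacity);
            values = Arrays.copyOf(values, newCapacity);
        }
        docIds[size] = docId;
        values[size] = value;
        size++;
    }

    /** Binary search over the collected doc IDs; returns missing if docId was not a hit. */
    int get(int docId, int missing) {
        int idx = Arrays.binarySearch(docIds, 0, size, docId);
        return idx >= 0 ? values[idx] : missing;
    }

    int size() {
        return size;
    }
}
```

Memory here is proportional to the number of hits (two ints per hit, plus
slack from doubling), appends are amortized constant time with no hashing,
and lookups cost a binary search rather than a direct array index.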

TX
