lucene-java-user mailing list archives

From Sujit Pal <sujit....@comcast.net>
Subject Re: Quickest way to collect one field from the searched docs....
Date Fri, 19 Sep 2014 18:26:24 GMT
Hi Shouvik, not sure if you have already considered this, but you could put
the database primary key for each record into the index, i.e., reverse your
insert order so the DB insert happens first, get the record_id, and then add
it to the Lucene document as a "record_id" field. During retrieval you can
minimize the network traffic by restricting the field list to just this
record_id field.

-sujit
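The ordering Sujit describes (database insert first, then index the returned key as a stored field, then read back only that field for each hit) can be sketched as below. This is a toy, in-memory model in plain Java, not Lucene or JDBC code: `ToyDatabase`, `ToyIndex`, `addRecord` and `matchingRecordIds` are hypothetical names, and `ToyIndex.doc(docId, fieldsToLoad)` only mimics the idea of restricting which stored fields are loaded.

```java
import java.util.*;
import java.util.function.Predicate;

/** Toy stand-ins for the database and the index; hypothetical, NOT Lucene or JDBC APIs. */
class RecordIdSketch {

    /** Hypothetical database: hands out primary keys on insert. */
    static class ToyDatabase {
        private long nextId = 1;
        private final Map<Long, String> rows = new HashMap<>();

        long insert(String payload) {
            long id = nextId++;
            rows.put(id, payload);
            return id;
        }

        String get(long id) { return rows.get(id); }
    }

    /** Hypothetical index: each document is a map of field name to stored value. */
    static class ToyIndex {
        private final List<Map<String, String>> docs = new ArrayList<>();

        void addDocument(Map<String, String> fields) { docs.add(new HashMap<>(fields)); }

        /** Returns the internal doc IDs (list positions) of matching documents. */
        List<Integer> search(Predicate<Map<String, String>> query) {
            List<Integer> hits = new ArrayList<>();
            for (int docId = 0; docId < docs.size(); docId++) {
                if (query.test(docs.get(docId))) hits.add(docId);
            }
            return hits;
        }

        /** Copies out only the requested fields of one document. */
        Map<String, String> doc(int docId, Set<String> fieldsToLoad) {
            Map<String, String> out = new HashMap<>();
            for (String f : fieldsToLoad) {
                String v = docs.get(docId).get(f);
                if (v != null) out.put(f, v);
            }
            return out;
        }
    }

    /** DB insert first; the returned key is then indexed as the "record_id" field. */
    static void addRecord(ToyDatabase db, ToyIndex index, String title) {
        long recordId = db.insert(title);
        index.addDocument(Map.of("record_id", Long.toString(recordId), "title", title));
    }

    /** For each hit, load only "record_id" and return the DB keys. */
    static List<Long> matchingRecordIds(ToyIndex index, Predicate<Map<String, String>> query) {
        Set<String> onlyRecordId = Set.of("record_id");
        List<Long> ids = new ArrayList<>();
        for (int docId : index.search(query)) {
            ids.add(Long.parseLong(index.doc(docId, onlyRecordId).get("record_id")));
        }
        return ids;
    }
}
```

In real Lucene 4.x code the corresponding pieces would be a stored field holding the key and a call such as `IndexSearcher.doc(int, Set<String>)`, which loads only the named stored fields; check the API of the Lucene version you actually run before relying on that signature.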


On Thu, Sep 18, 2014 at 9:23 PM, Shouvik Bardhan <sbardhan@gisfederal.com> wrote:

> Pardon the length of the question. I have an index with 100 million docs
> (Lucene, not Solr), and queries of the A* and A AND B* kind return
> pretty quickly (2-4 secs); I also pick up the Lucene docIds quickly
> with a collector. This works for us because we take the docIds and do
> further filtering against another database we maintain, whose record ids
> match the Lucene doc ids, and that lets us do what we want. I
> know that depending on the Lucene doc id value is not a good thing, since
> after delete/merge/optimize the doc ids may change, and if that were to
> happen, our other datastore would no longer line up with the Lucene index
> and chaos would ensue. That is why we do not optimize the index.
>
> My question is: what is the fastest way to gather one field value from the
> docs that match the query? Is there any way to do this as fast as (or at
> least not much slower than) collecting the Lucene docIds? I want to stop
> depending on the Lucene docIds not changing, if possible.
>
> Thanks for any suggestions.
>
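The doc ID instability the question worries about can be illustrated with a toy model in plain Java (not Lucene code; `DocIdShiftSketch` and its methods are hypothetical). It assumes a simplified behavior: deleted documents are only marked until a merge, and the merge packs the surviving documents together, so a doc ID recorded before the merge may afterwards point at a different document or at nothing.

```java
import java.util.*;

/**
 * Toy model (hypothetical, not Lucene code): deletes only mark documents,
 * and merge() compacts the survivors, renumbering their doc IDs.
 */
class DocIdShiftSketch {
    private final List<String> recordIds = new ArrayList<>(); // stored "record_id" per doc ID
    private final BitSet deleted = new BitSet();

    /** Adds a document; in this toy the doc ID is simply its position. */
    int add(String recordId) {
        recordIds.add(recordId);
        return recordIds.size() - 1;
    }

    /** Marks a document as deleted without renumbering anything yet. */
    void delete(int docId) { deleted.set(docId); }

    /** Drops deleted docs and packs the rest, so later doc IDs shift down. */
    void merge() {
        List<String> survivors = new ArrayList<>();
        for (int docId = 0; docId < recordIds.size(); docId++) {
            if (!deleted.get(docId)) survivors.add(recordIds.get(docId));
        }
        recordIds.clear();
        recordIds.addAll(survivors);
        deleted.clear();
    }

    /** Reads the stored key of whatever document currently has this doc ID. */
    String storedRecordId(int docId) { return recordIds.get(docId); }
}
```

For example, after adding "r10", "r11" and "r12" (doc IDs 0, 1, 2), deleting doc 1 and merging, "r12" moves to doc ID 1 and doc ID 2 no longer exists. The stored key travels with its document, which is why the external database should join on that value rather than on the doc ID.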
