lucene-solr-user mailing list archives

From Bhaumik Joshi <bhaumik.jo...@outlook.com>
Subject Re: Passing Ids in query takes more time
Date Mon, 09 May 2016 04:32:37 GMT
Thanks Jeff. TermsQueryParser worked for me. 

Thanks & Regards,
Bhaumik Joshi

________________________________________
From: Jeff Wartes <jwartes@whitepages.com>
Sent: Thursday, May 5, 2016 8:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Passing Ids in query takes more time

An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for 80k ids though
is basically 80k searches as far as Solr is concerned, so it’s not altogether surprising
that it takes a while. Your complaint seems to be that the query planner doesn’t know in
advance that <other criteria> should be run first, and then the id selection applied
to the reduced set.

So, I can think of a few things for you to look at, in no particular order:

1. TermsQueryParser is designed for lists of terms, so you might get better results from that:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

2. If your <other criteria> is the real discriminating factor in your search, you could
just search for <other criteria> and then apply your ID list as a PostFilter: http://yonik.com/advanced-filter-caching-in-solr/
I guess that’d look something like &fq={!terms f=<somefield> v="<id list>"
cache=false cost=150}. You’d want cache=false because there’s not much sense caching an
id list unless that id list is usually the same, and the cost >= 100 should qualify it
as a post filter, which only operates on an already-found result set instead of the full index.
(Note: I haven’t confirmed that the Terms query parser supports post filtering.)

3. I’m not really aware of any storage engine that’ll love doing a filter on 80k ids at
once, but a key-value store like Cassandra might work out better for that.

4. There is a thing called a JoinQParserPlugin (https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser)
that can join to another collection (https://issues.apache.org/jira/browse/SOLR-4905). But
I’ve never used it, and there are some significant restrictions.
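
As a rough illustration of options 1 and 2, here is a minimal Python sketch of how the
request parameters might be built. It only constructs the parameter string and does not
talk to Solr. The field name doc_id comes from the original question; the function names,
the type:book stand-in for <other criteria>, and the idea of sending the encoded string as
a POST body (80k ids will not fit comfortably in a GET URL) are assumptions for the example.
The ids are placed after the closing brace rather than in v="...", which should be
equivalent local-params syntax.

```python
from urllib.parse import urlencode, parse_qs


def build_terms_filter(field, ids, cache=False, cost=None):
    """Build a {!terms} filter query for a list of ids (hypothetical helper)."""
    local_params = [f"f={field}"]
    if not cache:
        # A one-off id list is unlikely to be reused, so skip the filter cache.
        local_params.append("cache=false")
    if cost is not None:
        # cost >= 100 only matters if the parser supports post filtering,
        # which is unconfirmed for the terms parser.
        local_params.append(f"cost={cost}")
    # The terms parser expects a comma-separated list by default.
    id_list = ",".join(str(i) for i in ids)
    return "{!terms " + " ".join(local_params) + "}" + id_list


def build_request_body(other_criteria, field, ids, rows=250):
    """Form-encode the main query plus the id filter (hypothetical helper)."""
    params = {
        "q": other_criteria,                   # the discriminating criteria
        "fq": build_terms_filter(field, ids),  # the id list as a filter
        "rows": rows,
    }
    return urlencode(params)


# "type:book" is a placeholder for <other criteria>.
body = build_request_body("type:book", "doc_id", [111, 222, 333, 444])
decoded = parse_qs(body)  # round-trip to inspect what would be sent
print(decoded["fq"][0])
```

The resulting body could be POSTed to the /select handler of collection2 with whatever
HTTP client you already use.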




On 5/5/16, 2:46 AM, "Bhaumik Joshi" <bhaumik.joshi@outlook.com> wrote:

>Hi,
>
>
>I am retrieving ids from collection1 based on some query and passing those ids as a query
>to collection2. The query to collection2 that contains the ids takes much more time compared
>to a normal query.
>
>
>Que. 1 - When passing ids in the query, why does it take more time compared to a normal query,
>even though we are narrowing the criteria by passing ids?
>
>e.g.  query-1: doc_id:(111 222 333 444 ...) AND <other criteria> is slower (passing
>80k ids takes 7-9 sec) than query-2: only <other criteria> (700-800 ms). Both return
>250 records with the same set of fields.
>
>
>Que. 2 - Any idea how I can achieve the above (get ids from one collection and pass those
>ids to the other one) in an efficient manner, or any other way to get data from one collection
>based on the response of the other collection?
>
>
>Thanks & Regards,
>
>Bhaumik Joshi