lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: Solr Query Tuning
Date Fri, 15 Jan 2016 00:53:13 GMT
On 1/14/2016 5:20 PM, Shivaji Dutta wrote:
> I am working with a customer that has about a billion documents on 20 shards. The documents
> are extremely small, about 100 characters each.
> The insert rate is pretty good, but they are trying to fetch documents using SolrJ.
> Solr queries are taking about 1 min to return.
> The query is very simple
> id:<documentid>
> Note the content of the document is just the documentid.
> Request for Information
> A) I am looking for some information as how I could go about tuning the query.
> B) An alternate approach that I am thinking of is to use the "/get" request handler.
> Is this going to be faster than "/select"?
> C) I am looking at the debugQuery option, but I am unsure how to interpret this. I saw
> a SlideShare presentation which talked about "", but it only supports older
> versions of Solr.

I have no idea whether /get would be faster.  You'd need to try it.
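For reference, the two handlers are just different endpoints on the same core: /select runs a full distributed search, while /get (real-time get) is a direct lookup by uniqueKey. A minimal sketch of the two request URLs, using only the JDK (the host, port, and collection name "mycollection" are assumptions for illustration, not details from this thread):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrUrls {
    // Hypothetical base URL; substitute your own host and collection.
    static final String BASE = "http://localhost:8983/solr/mycollection";

    // /select runs a normal (distributed) search for id:<documentid>.
    static String selectUrl(String docId) throws UnsupportedEncodingException {
        String q = URLEncoder.encode("id:" + docId, "UTF-8");
        return BASE + "/select?q=" + q;
    }

    // /get is the real-time get handler: a direct lookup by uniqueKey,
    // served without running a full search.
    static String getUrl(String docId) throws UnsupportedEncodingException {
        return BASE + "/get?id=" + URLEncoder.encode(docId, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(selectUrl("doc-42"));
        System.out.println(getUrl("doc-42"));
    }
}
```

Timing both forms against the real collection is the only way to know which is faster for this workload.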

Can you provide the SolrJ code that you are using to do the query? 
Another useful item would be the entire entry from the Solr logfile for
this query.  There will probably be multiple log entries for one query,
usually the relevant log entry is the last one in the series.  I may
need the schema, but we'll decide that later.

Are all 20 shards on the same server, or have you got them spread out
across multiple machines?  What is the replicationFactor on the
collection?  If there are multiple machines, how many shards live on
each machine, and how many machines do you have total?  Do you happen to
know how large the Lucene index is for each of these shards?  How much
total memory does each server have, and how large is the Java heap?  Is
there software other than Solr running on the machine(s)?

I suspect that you don't have enough memory for the operating
system to effectively cache your index.  Good performance for a billion
documents is going to require a lot of memory, and probably a lot of servers.
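To make that concrete, here is a back-of-the-envelope sketch. Every figure in it is an assumption for illustration; none come from this thread. The memory left over for the OS page cache is roughly total RAM minus the Java heap (minus anything other software uses), and query latency tends to degrade when that is a small fraction of the on-disk index size.

```java
public class CacheMath {
    // Rough fraction of the on-disk index that the OS page cache can hold.
    // All figures passed in are illustrative assumptions.
    static double cachedFraction(long totalRamGb, long heapGb, long indexGb) {
        long cacheGb = totalRamGb - heapGb; // left over for the page cache
        return (double) cacheGb / indexGb;
    }

    public static void main(String[] args) {
        // Hypothetical server: 64 GB RAM, 8 GB Java heap, 120 GB of index.
        double f = cachedFraction(64, 8, 120);
        System.out.printf("OS can cache roughly %.0f%% of the index%n", f * 100);
    }
}
```

With numbers like these, less than half the index fits in the page cache, which is the kind of situation where a simple id lookup can take far longer than it should.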

