lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Solr Query Tuning
Date Fri, 15 Jan 2016 01:28:47 GMT
Add &debug=all to your query, then check the "timing" section of the
response to see which Solr search component is consuming the time.

You may also have to add &debug=track to get the shard-specific info.
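
As a rough sketch of what such a request could look like, the snippet below builds a /select URL with both debug parameters appended. It uses only the JDK; the base URL, the collection name "mycollection", and the class and method names are placeholders for illustration, not anything from the original poster's setup.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DebugQueryUrl {
    // Builds a /select URL that asks Solr for full debug output plus
    // per-shard tracking. baseUrl and collection are hypothetical values.
    public static String build(String baseUrl, String collection, String query) {
        // URL-encode the query so characters such as ':' and '"' survive transport.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + collection + "/select?q=" + q
                + "&debug=all&debug=track";
    }

    public static void main(String[] args) {
        System.out.println(
            build("http://localhost:8983/solr", "mycollection", "id:\"doc-123\""));
    }
}
```

With SolrJ, the same two parameters could presumably be set on the SolrQuery object instead of being appended to a URL by hand.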

In theory, 19 of the shards should return nothing and the 20th will return
a single document.

Maybe one of the shard nodes is having trouble and takes way too long to do
essentially nothing.

Does the document ID have any special characters in it? If so, be sure to
escape them or put the ID in quotes, otherwise some piece of the ID may
match lots of documents, although even that should not be a big problem.
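
A minimal sketch of the quoting approach, assuming the unique key field is named "id" as in the original question. The class and method names are made up for illustration. Inside a quoted phrase only the backslash and the double quote still need escaping, so that is all this helper does. SolrJ also provides ClientUtils.escapeQueryChars for escaping an unquoted term.

```java
public class IdQuery {
    // Wraps a document ID in double quotes so that characters such as
    // ':' or '-' inside it are not interpreted as query syntax.
    public static String quoted(String id) {
        // Escape backslashes first, then double quotes.
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        return "id:\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoted("order:2016-01/42"));
    }
}
```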

And make sure the ID field is string or numeric, not tokenized text.


-- Jack Krupansky

On Thu, Jan 14, 2016 at 7:53 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 1/14/2016 5:20 PM, Shivaji Dutta wrote:
> > I am working with a customer that has about a billion documents on 20
> > shards. The documents are extremely small, about 100 characters each.
> > The insert rate is pretty good, but they are trying to fetch the
> > document by using SolrJ SolrQuery.
> >
> > Solr Query is taking about 1 min to return.
> >
> > The query is very simple:
> > id:<documentid>
> > Note the content of the document is just the documentid.
> >
> > Request for Information
> >
> > A) I am looking for some information on how I could go about tuning the
> > query.
> > B) An alternate approach that I am thinking of is to use the "/get"
> > request handler.
> > Is this going to be faster than "/select"?
> > C) I am looking at the debugQuery option, but I am unsure how to
> > interpret this. I saw a SlideShare which talked about
> > "http://explain.solr.pl/help", but it only supports older versions of Solr.
>
> I have no idea whether /get would be faster.  You'd need to try it.
>
> Can you provide the SolrJ code that you are using to do the query?
> Another useful item would be the entire entry from the Solr logfile for
> this query.  There will probably be multiple log entries for one query;
> usually the relevant log entry is the last one in the series.  I may
> need the schema, but we'll decide that later.
>
> Are all 20 shards on the same server, or have you got them spread out
> across multiple machines?  What is the replicationFactor on the
> collection?  If there are multiple machines, how many shards live on
> each machine, and how many machines do you have total?  Do you happen to
> know how large the Lucene index is for each of these shards?  How much
> total memory does each server have, and how large is the Java heap?  Is
> there software other than Solr running on the machine(s)?
>
> I suspect that you don't have enough memory for the operating
> system to effectively cache your index.  Good performance for a billion
> documents is going to require a lot of memory and probably a lot of
> servers.
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>
>
