lucene-solr-user mailing list archives

From Doug Turnbull <dturnb...@opensourceconnections.com>
Subject Re: Solr Query Tuning
Date Fri, 15 Jan 2016 01:43:57 GMT
I suppose that /get is the query by id API. I wonder if it's reasonable to
expect it to be smart in SolrCloud usage.
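
For reference, a minimal sketch of what a by-id lookup against /get looks like
at the HTTP level. The base URL and the collection name "docs" are
placeholders, not values from this thread, and the sketch only builds the
request URL. It does not show whether the handler forwards the lookup to a
single shard.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url: str, collection: str, doc_id: str) -> str:
    """Build a URL for Solr's real-time get handler (/get) for one document id.

    base_url and collection are assumptions supplied by the caller.
    """
    # urlencode percent-encodes the id, so special characters survive the trip.
    query = urlencode({"id": doc_id, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/get?{query}"


if __name__ == "__main__":
    # Hypothetical host, collection, and id, for illustration only.
    print(realtime_get_url("http://localhost:8983/solr", "docs", "doc 42/a"))
```

Timing this URL against the equivalent /select?q=id:... request would be one
way to compare the two handlers directly.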

On Thursday, January 14, 2016, Doug Turnbull <
dturnbull@opensourceconnections.com> wrote:

> Stupid thought/question. Is there a query by id API that understands
> SolrCloud routing and can simply fwd the query to the shard that would hold
> said document? Barring that, can one use SolrJ's routing brains to see what
> shard a given id would be routed to and only query that shard?
>
> -Doug
>
> On Thursday, January 14, 2016, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
>
>> Add &debug=all to your query and check the "timing" section of the
>> response to see which Solr search component is consuming the time.
>>
>> You may also have to add &debug=track to get the shard-specific info.
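
A rough sketch of reading the "timing" section once the JSON response is in
hand. It assumes the usual layout in which debug.timing holds "prepare" and
"process" phases, each with one entry per search component. The sample
response and its numbers are made up for illustration, not taken from the
cluster discussed here.

```python
def slowest_components(response: dict, phase: str = "process") -> list:
    """Return (component, milliseconds) pairs for one timing phase, slowest first."""
    timing = response.get("debug", {}).get("timing", {})
    phase_timing = timing.get(phase, {})
    components = [
        (name, float(info["time"]))
        for name, info in phase_timing.items()
        # The phase's own total is a bare number under "time"; skip it.
        if isinstance(info, dict) and "time" in info
    ]
    return sorted(components, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical response fragment; real values come from &debug=all.
    sample = {
        "debug": {
            "timing": {
                "time": 61000.0,
                "prepare": {"time": 1.0, "query": {"time": 1.0}},
                "process": {
                    "time": 60999.0,
                    "query": {"time": 60990.0},
                    "facet": {"time": 0.0},
                    "debug": {"time": 9.0},
                },
            }
        }
    }
    for name, millis in slowest_components(sample):
        print(f"{name}: {millis} ms")
```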
>>
>> In theory, 19 of the shards should return nothing and the 20th will return
>> a single document.
>>
>> Maybe one of the shard nodes is having trouble and takes way too long to
>> do essentially nothing.
>>
>> Does the document ID have any special characters in it? If so, be sure to
>> escape them or put the ID in quotes, otherwise some piece of the ID may
>> match lots of documents, although even that should not be a big problem.
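
A small sketch of the "put the ID in quotes" option. Inside a double-quoted
phrase, the standard query parser only needs backslashes and double quotes
escaped. The function name and the default field name "id" are assumptions
for illustration.

```python
def quoted_id_query(doc_id: str, field: str = "id") -> str:
    """Wrap a document id in double quotes for use as a query string."""
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    # Hypothetical ids containing characters the query parser treats specially.
    print(quoted_id_query("abc:123/x"))
    print(quoted_id_query('a"b\\c'))
```

The result is meant to be passed as the q parameter. It still has to be
URL-encoded by whatever client sends the request.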
>>
>> And make sure the ID field is string or numeric, not tokenized text.
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Jan 14, 2016 at 7:53 PM, Shawn Heisey <apache@elyograg.org>
>> wrote:
>>
>> > On 1/14/2016 5:20 PM, Shivaji Dutta wrote:
>> > > I am working with a customer that has about a billion documents on 20
>> > > shards. The documents are extremely small, about 100 characters each.
>> > > The insert rate is pretty good, but they are trying to fetch the
>> > > document by using SolrJ SolrQuery.
>> > >
>> > > Solr Query is taking about 1 min to return.
>> > >
>> > > The query is very simple
>> > > id:<documentid>
>> > > Note the content of the document is just the documentid.
>> > >
>> > > Request for Information
>> > >
>> > > A) I am looking for some information on how I could go about tuning
>> > > the query.
>> > > B) An alternate approach that I am thinking of is to use the "/get"
>> > > request handler. Is this going to be faster than "/select"?
>> > > C) I am looking at the debugQuery option, but I am unsure how to
>> > > interpret this. I saw a SlideShare which talked about
>> > > "http://explain.solr.pl/help", but it only supports older versions of
>> > > Solr.
>> >
>> > I have no idea whether /get would be faster.  You'd need to try it.
>> >
>> > Can you provide the SolrJ code that you are using to do the query?
>> > Another useful item would be the entire entry from the Solr logfile for
>> > this query.  There will probably be multiple log entries for one query;
>> > usually the relevant log entry is the last one in the series.  I may
>> > need the schema, but we'll decide that later.
>> >
>> > Are all 20 shards on the same server, or have you got them spread out
>> > across multiple machines?  What is the replicationFactor on the
>> > collection?  If there are multiple machines, how many shards live on
>> > each machine, and how many machines do you have total?  Do you happen to
>> > know how large the Lucene index is for each of these shards?  How much
>> > total memory does each server have, and how large is the Java heap?  Is
>> > there software other than Solr running on the machine(s)?
>> >
>> > I suspect that you don't have enough memory for the operating
>> > system to effectively cache your index.  Good performance for a billion
>> > documents is going to require a lot of memory and probably a lot of
>> > servers.
>> >
>> > https://wiki.apache.org/solr/SolrPerformanceProblems
>> >
>> > Thanks,
>> > Shawn
>> >
>> >
>>
>
>
> --
> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
> <http://opensourceconnections.com>, LLC | 240.476.9983
> Author: Relevant Search <http://manning.com/turnbull>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless
> of whether attachments are marked as such.
>
>

