lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Result merging takes too long
Date Sat, 15 Mar 2014 23:29:36 GMT
I wouldn't expect the merge times to be significant
at all, _assuming_ you're not doing something like
setting a very high &start= parameter or returning
a whole lot of rows.
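
To illustrate why a high &start= or a large rows value matters for the merge, here is a rough Python sketch. It is not Solr's code. It assumes each shard returns its hits as (score, doc_id) pairs already sorted by descending score, and that each shard has to supply its top start + rows entries so the receiving node can pick the right window. The names merge_top_n and entries_examined are made up for this example.

```python
import heapq
from itertools import islice


def merge_top_n(shard_results, start, rows):
    """Merge per-shard hit lists and return the requested window.

    shard_results: list of lists of (score, doc_id) tuples, each list
    already sorted by descending score (an assumption of this sketch).
    """
    # heapq.merge expects ascending order by key, so negate the score.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, start, start + rows))


def entries_examined(num_shards, start, rows):
    """Upper bound on entries the receiving node gets back in total,
    assuming every shard returns its top (start + rows) hits."""
    return num_shards * (start + rows)


if __name__ == "__main__":
    shard_a = [(0.9, "a1"), (0.5, "a2")]
    shard_b = [(0.8, "b1"), (0.7, "b2")]
    print(merge_top_n([shard_a, shard_b], start=0, rows=2))
    print(entries_examined(5, 0, 10))        # 5 shards, first page
    print(entries_examined(5, 100000, 10))   # 5 shards, deep paging
```

With the default first page the merge only touches a few dozen entries, which is why it should normally be negligible. Under the same assumption, deep paging makes every shard ship and the receiving node sort through far more entries.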

Now, it may be that you're sharding with too small
a document set to really notice a difference.
Sharding isn't so much about speeding up responses
as it is about being able to handle very large indexes.

So I have to ask what the end goal is here. Are
your response times really in need of improvement
or is this more trying to understand the process?

Best,
Erick

On Thu, Mar 13, 2014 at 1:19 AM, remi tassing <tassingremi@gmail.com> wrote:
> Hi Erick,
>
> I've used the fl=id parameter to avoid retrieving the actual documents
> (step <4> in your mail) but the problem still exists.
> Any ideas on how to find the merging time (step <3>)?
>
> Remi
>
>
> On Tue, Mar 11, 2014 at 7:29 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> In SolrCloud there are a couple of round trips
>> that _may_ be what you're seeing.
>>
>> First, though, the QTime is the time spent
>> querying, it does NOT include assembling
>> the documents from disk for return etc., so
>> bear that in mind....
>>
>> But here's the sequence as I understand it
>> from the receiving node's viewpoint.
>> 1> send the query out to one replica for
>> each shard
>> 2> get the top N doc IDs and scores (
>> or whatever sorting criteria) from each
>> shard.
>> 3> Merge the lists and select the top N
>> to return
>> 4> request the actual documents for
>> the top N list from each of the shards
>> 5> return the list.
>>
>> So as you can see, there's an extra
>> round trip to each shard to get the
>> full documents. Step <4> seems like it
>> might be what you're seeing; I don't
>> think it's counted in QTime.
>>
>> HTH
>> Erick
>>
>> On Tue, Mar 11, 2014 at 3:17 AM, remi tassing <tassingremi@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I've just set up a SolrCloud with Tomcat: 5 shards with one replica each
>> > and 10 million docs in total (evenly distributed).
>> >
>> > I've noticed the query response time is faster than using one single node
>> > but still not as fast as I expected.
>> >
>> > After turning debugQuery on, I noticed that the QTime differs from the
>> > total time reported in the debug timing section (see the excerpt below).
>> > More importantly, when I query one, and only one, shard, the two values
>> > are consistent. It appears the server spends most of its time doing
>> > result aggregation (merging).
>> >
>> > After searching on Google in vain I didn't find anything concrete except
>> > that the problem could be in 'SearchComponent'.
>> >
>> > Could you point me in the right direction (e.g. configuration...)?
>> >
>> > Thanks!
>> >
>> > Remi
>> >
>> > Solr Cloud result:
>> >
>> > <lst name="responseHeader">
>> >   <int name="status">0</int>
>> >   <int name="QTime">3471</int>
>> >   <lst name="params">
>> >     <str name="debugQuery">on</str>
>> >     <str name="q">project development agile</str>
>> >   </lst>
>> > </lst>
>> > <result name="response" numFound="2762803" start="0"
>> > maxScore="0.17022902">...</result>
>> > ...
>> >
>> > <lst name="timing">
>> >   <double name="time">508.0</double>
>> >   <lst name="prepare">
>> >     <double name="time">8.0</double>
>> >     <lst name="query">
>> >       <double name="time">8.0</double>
>> >     </lst>
>> >     <lst name="facet">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="mlt">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="highlight">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="stats">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="debug">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >   </lst>
>> >   <lst name="process">
>> >     <double name="time">499.0</double>
>> >     <lst name="query">
>> >       <double name="time">195.0</double>
>> >     </lst>
>> >     <lst name="facet">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="mlt">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="highlight">
>> >       <double name="time">228.0</double>
>> >     </lst>
>> >     <lst name="stats">
>> >       <double name="time">0.0</double>
>> >     </lst>
>> >     <lst name="debug">
>> >       <double name="time">76.0</double>
>> >     </lst>
>> >   </lst>
>> > </lst>
>>
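
As a rough way to put a number on the gap described in the original post, the Python sketch below parses a condensed copy of the quoted debug output and subtracts the debug timing total from QTime. The function name summarize_timing is made up for this example, zero-valued components are left out of the sample for brevity, and the surrounding <response> and <lst name="debug"> wrappers are assumed to match the usual Solr XML response layout. The thread does not establish what the leftover time actually consists of, so the result is only a starting point for investigation.

```python
import xml.etree.ElementTree as ET

# Condensed from the debug output quoted in this thread.
SAMPLE = """<response>
<lst name="responseHeader"><int name="QTime">3471</int></lst>
<lst name="debug"><lst name="timing">
  <double name="time">508.0</double>
  <lst name="prepare"><double name="time">8.0</double></lst>
  <lst name="process"><double name="time">499.0</double>
    <lst name="query"><double name="time">195.0</double></lst>
    <lst name="highlight"><double name="time">228.0</double></lst>
    <lst name="debug"><double name="time">76.0</double></lst>
  </lst>
</lst></lst>
</response>"""


def summarize_timing(xml_text):
    """Return QTime, the debug timing total, and the difference (ms)."""
    root = ET.fromstring(xml_text)
    qtime = int(root.find(".//int[@name='QTime']").text)
    timing = root.find(".//lst[@name='timing']")
    total = float(timing.find("double[@name='time']").text)
    process = timing.find("lst[@name='process']")
    # Per-component time spent in the "process" phase.
    components = {
        comp.get("name"): float(comp.find("double[@name='time']").text)
        for comp in process.findall("lst")
    }
    return {
        "qtime_ms": qtime,
        "timing_total_ms": total,
        "process_components_ms": components,
        "unaccounted_ms": qtime - total,
    }


if __name__ == "__main__":
    for key, value in summarize_timing(SAMPLE).items():
        print(key, value)
```

For the figures posted here, the process components (195 + 228 + 76) add up to the reported 499 ms, while roughly 2963 ms of the 3471 ms QTime is not covered by the timing section at all.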
