lucene-solr-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: distributed search is significantly slower than direct search
Date Sat, 16 Nov 2013 12:39:11 GMT
Did you say what the memory profile of your machine is?  How much 
memory, and how large are the shards? This is just a random guess, but 
it might be that if you are memory-constrained, there is a lot of 
thrashing caused by paging (swapping?) the sharded indexes in and out,
while a single index can be scanned linearly, even if it does need to be
paged in.

-Mike
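Mike's paging guess can be checked roughly by comparing the total on-disk size of the shard indexes with the machine's physical memory. The following is a minimal sketch, not a Solr tool: the index directories are supplied as command-line arguments (for example each core's `data/index` directory; adjust to your layout), and it assumes a HotSpot/OpenJDK JVM, because it uses the JDK-specific `com.sun.management.OperatingSystemMXBean`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexMemoryCheck {

    /** Sums the sizes of all regular files under dir, recursively. */
    static long directorySizeBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass each shard's index directory as an argument.
        long indexBytes = 0;
        for (String dir : args) {
            indexBytes += directorySizeBytes(Paths.get(dir));
        }
        // HotSpot/OpenJDK-specific bean; the cast fails on JVMs that lack it.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long physicalBytes = os.getTotalPhysicalMemorySize();
        long mb = 1024L * 1024L;
        System.out.printf("index files: %d MB, physical RAM: %d MB%n",
                indexBytes / mb, physicalBytes / mb);
        // Only part of the RAM is available to the OS page cache: Solr's own heap
        // and other processes take the rest, so compare against that remainder.
    }
}
```

If the combined index size approaches the RAM left over after Solr's heap, page faults during the distributed fetch would be consistent with Mike's guess; OS tools such as `vmstat` can confirm whether paging actually happens while the query runs.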

On 11/14/2013 8:10 AM, Elran Dvir wrote:
> Hi,
>
> We tried returning just the id field and got exactly the same performance.
> Our system is distributed, but all shards are on a single machine, so network issues are not a factor.
> The code we found where Solr is spending its time is on the shard and not on the routing core; again, all shards are local.
> We investigated the getFirstMatch() method and noticed that MultiTermsEnum.reset (inside MultiTerms.iterator) and MultiTermsEnum.seekExact take 99% of the time.
> Inside these methods, the call to BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock takes most of the time.
> Out of the 7-second run, these methods take ~5 seconds and BinaryResponseWriter.write takes the rest (~2 seconds).
>
> We tried increasing the cache sizes and got hits, but it only improved the query time by about a second (to ~6 seconds), so no major effect.
> We are not indexing during our tests. The performance is similar.
> (How do we measure doc size? Does it matter, given that the performance is the same when returning only the id field?)
>
> We still don't completely understand why the query takes so much longer even though the cores are on the same machine.
>
> Is there a way to improve the performance (code, configuration, query)?
>
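One way to read the profile above: each of the 5,000 id lookups seeks into the terms dictionary, and `loadBlock` is paid whenever a seek lands in a block that is not already loaded. The toy model below is a deliberate simplification (hypothetical term count and block size, a single cached block, no segments), not Lucene's BlockTree code; it only illustrates why lookup order can matter when one terms enumerator is reused across lookups. The time spent in `MultiTerms.iterator` suggests a fresh enumerator is created per id, so any such benefit would assume the lookup code were changed to reuse one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Toy model of block loads during term lookups; NOT Lucene's BlockTree implementation. */
public class BlockLookupModel {

    /**
     * Counts simulated block loads for lookups of the given term ordinals, assuming
     * terms are stored in sorted order in fixed-size blocks and only the most
     * recently loaded block is kept.
     */
    static int countBlockLoads(List<Integer> termOrdinals, int termsPerBlock) {
        int loads = 0;
        int currentBlock = -1;
        for (int ord : termOrdinals) {
            int block = ord / termsPerBlock;
            if (block != currentBlock) { // block not loaded: pay the "loadBlock" cost
                loads++;
                currentBlock = block;
            }
        }
        return loads;
    }

    public static void main(String[] args) {
        int numTerms = 50_000;   // hypothetical number of unique ids
        int termsPerBlock = 48;  // hypothetical block size
        Random rnd = new Random(42);
        List<Integer> requested = new ArrayList<>();
        for (int i = 0; i < 5000; i++) { // 5,000 rows, as in the thread
            requested.add(rnd.nextInt(numTerms));
        }
        List<Integer> sorted = new ArrayList<>(requested);
        Collections.sort(sorted);
        System.out.println("loads in request order: " + countBlockLoads(requested, termsPerBlock));
        System.out.println("loads in sorted order:  " + countBlockLoads(sorted, termsPerBlock));
    }
}
```

In the sorted case the count can never exceed the number of distinct blocks touched, while in request order nearly every lookup can trigger a load.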
> -----Original Message-----
> From: idokissos@gmail.com [mailto:idokissos@gmail.com] On Behalf Of Manuel Le Normand
> Sent: Thursday, November 14, 2013 1:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: distributed search is significantly slower than direct search
>
> It's surprising that such a query takes so long; I would assume that after consistently running q=*:* you should be getting cache hits and times should be faster. Check in the admin UI how your query/doc caches perform.
> Moreover, the query itself just asks for the first 5000 docs that were indexed (returning the first [docid]s), so it seems all this time is wasted on transfer. Out of these 7 secs, how much is spent in the above method? What do you return by default? How big is each doc you display in your results?
> It might be that both collections compete for the same resources. Try elaborating on your use case.
>
> Anyway, it seems like you just ran a test to see what the performance hit would be in a distributed environment, so I'll try to explain some things we encountered in our benchmarks, with a case that is at least similar in the number of docs fetched.
>
> We request 2000 docs per query, running over 40 shards. This means every shard actually transfers 2000 docs to our frontend on every document-match request (the first one you were referring to). Even if lazily loaded, reading 2000 ids (on 40 servers) and lazy loading the fields is a tough job. Waiting for the slowest shard to respond, then sorting the docs and reloading (lazily or not) the top 2000 docs might take a long time.
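The first phase Manu describes, where every shard returns its own top N and the coordinator keeps the global top N, can be sketched as a bounded min-heap merge. This is an illustrative sketch with made-up scores and a hypothetical `ShardHit` type, not Solr's merge code; it mainly shows that the coordinator handles shards × rows entries (40 × 2000 = 80,000 here) to keep 2,000, before any stored fields are fetched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

public class ShardMergeSketch {

    /** Hypothetical first-phase hit: unique key, score, and the shard that returned it. */
    static final class ShardHit {
        final String id;
        final float score;
        final int shard;

        ShardHit(String id, float score, int shard) {
            this.id = id;
            this.score = score;
            this.shard = shard;
        }
    }

    /** Keeps the global top `rows` hits out of every shard's own top `rows`. */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int rows) {
        Comparator<ShardHit> byScore = Comparator.comparingDouble(h -> h.score);
        // Min-heap: the weakest hit currently kept sits at the head.
        PriorityQueue<ShardHit> heap = new PriorityQueue<>(byScore);
        for (List<ShardHit> hits : perShard) {
            for (ShardHit h : hits) {
                heap.offer(h);
                if (heap.size() > rows) {
                    heap.poll(); // evict the weakest so at most `rows` remain
                }
            }
        }
        List<ShardHit> top = new ArrayList<>(heap);
        top.sort(byScore.reversed()); // best first
        return top;
    }

    public static void main(String[] args) {
        int shards = 40, rows = 2000; // figures from Manu's description
        Random rnd = new Random(7);
        List<List<ShardHit>> perShard = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<ShardHit> hits = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                hits.add(new ShardHit("shard" + s + "-doc" + i, rnd.nextFloat(), s));
            }
            perShard.add(hits);
        }
        List<ShardHit> top = mergeTopN(perShard, rows);
        System.out.println("merged " + shards * rows + " entries, kept " + top.size());
        // Second phase (not shown): request stored fields for each kept id from its shard.
    }
}
```

The merge itself is cheap; the expensive part in the thread's profile is the second phase, where each kept id is resolved back to a document on its shard.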
>
> Our times are 4-8 secs, but the cases can't really be compared. We took a few steps that improved it along the way, steps that led to others.
> These were our starters:
>
>     1. Profile these queries from different servers and Solr instances, and try
>     to put your finger on which collection is working hard and why. Check if
>     you're stuck on components that add no value for you but are
>     used by default.
>     2. Consider eliminating the doc cache. It loads lots of (partly) lazy
>     documents whose probability of reuse is low. There's no such
>     thing as "popular docs" when requesting so many docs. You may be able to
>     use your memory in a better way.
>     3. Bottleneck check: internal server metrics such as CPU user / iowait, packets
>     transferred over the network, page faults etc. are excellent for
>     understanding whether the disk/network/CPU is slowing you down. Then upgrade
>     the hardware in one of the shards and check whether it helps by comparing
>     the upgraded shard's qTime to the others'.
>     4. Warm up the index after committing: benchmark how queries
>     perform before and after some warm-up, say a few hundred
>     queries (from your previous system), in order to warm up the OS cache
>     (assuming you're using NRTCachingDirectoryFactory).
>
>
> Good luck,
> Manu
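Step 4 can be measured with a small harness that compares median query times before and after a warm-up run. This is a sketch under stated assumptions: the `LongSupplier` stands for "send the next query and return its time in ms" (for example Solr's reported QTime, or a wall-clock measurement you wire up yourself), and the sample sizes are arbitrary.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

public class WarmupBenchmark {

    /** Runs `samples` queries and returns the median of their reported times. */
    static long medianOf(LongSupplier nextQueryMillis, int samples) {
        long[] times = new long[samples];
        for (int i = 0; i < samples; i++) {
            times[i] = nextQueryMillis.getAsLong();
        }
        Arrays.sort(times);
        return times[samples / 2]; // upper median for even sample counts
    }

    /** Returns {coldMedian, warmMedian}: measure, warm up, then measure again. */
    static long[] coldVsWarm(LongSupplier nextQueryMillis, int samples, int warmupQueries) {
        long cold = medianOf(nextQueryMillis, samples);
        for (int i = 0; i < warmupQueries; i++) {
            nextQueryMillis.getAsLong(); // timing ignored; these only populate caches
        }
        long warm = medianOf(nextQueryMillis, samples);
        return new long[] {cold, warm};
    }

    public static void main(String[] args) {
        // Simulated stand-in for real queries: slow for the first 20 calls, fast afterwards.
        int[] calls = {0};
        LongSupplier simulated = () -> calls[0]++ < 20 ? 500L : 40L;
        long[] result = coldVsWarm(simulated, 10, 300); // a few hundred warm-up queries, per step 4
        System.out.println("cold median: " + result[0] + " ms, warm median: " + result[1] + " ms");
    }
}
```

Replaying real queries from the previous system through the supplier, rather than repeating one query, keeps Solr's own query cache from masking the OS-cache effect.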
>
>
> On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> One thing you can try, and this is more diagnostic than a cure, is to
>> return just the id field (and ensure that lazy field loading is true).
>> That'll tell you whether the issue is actually fetching the document
>> off disk and decompressing, although frankly that's unlikely since you
>> can get your 5,000 rows from a single machine quickly.
>>
>> The code you found where Solr is spending its time, is that on the
>> "routing" core or on the shards? I actually have a hard time
>> understanding how that code could take a long time, doesn't seem
>> right.
>>
>> You are transferring 5,000 docs across the network, so it's possible
>> that your network is just slow, that's certainly a difference between
>> the local and remote case, but that's a stab in the dark.
>>
>> Not much help I know,
>> Erick
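Erick's id-only diagnostic amounts to adding `fl=id` to both of the original requests. The helper below is a hypothetical convenience for building those URLs with proper encoding; the core names and port are taken from the original question, and it assumes Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class IdOnlyQueryUrls {

    /** Returns coreUrl + "/select?" + URL-encoded key=value pairs joined with '&'. */
    static String selectUrl(String coreUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return coreUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> direct = new LinkedHashMap<>(); // keeps parameter order stable
        direct.put("q", "*:*");
        direct.put("rows", "5000");
        direct.put("fl", "id"); // return only the unique key field
        System.out.println(selectUrl("http://127.0.0.1:8983/solr/core1", direct));

        Map<String, String> distributed = new LinkedHashMap<>(direct);
        distributed.put("shards", "127.0.0.1:8983/solr/core1");
        System.out.println(selectUrl("http://127.0.0.1:8983/solr/template", distributed));
    }
}
```

Timing both URLs shows whether stored-field fetching contributes to the gap; as Elran reports further up the thread, in this case returning only the id field made no difference.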
>>
>>
>>
>> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <elrand@checkpoint.com> wrote:
>>
>>> Erick, Thanks for your response.
>>>
>>> We are upgrading our system using Solr.
>>> We need to preserve old functionality. Our client displays 5K
>>> documents and groups them.
>>>
>>> Is there a way to refactor code in order to improve distributed
>>> document fetching?
>>>
>>> Thanks.
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>>> Sent: Wednesday, October 30, 2013 3:17 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: distributed search is significantly slower than direct search
>>>
>>> You can't. There will inevitably be some overhead in the distributed case.
>>> That said, 7 seconds is quite long.
>>>
>>> 5,000 rows is excessive, and probably where your issue is. You're
>>> having to go out and fetch the docs across the wire. Perhaps there
>>> is some batching that could be done there, I don't know whether this
>>> is one document per request or not.
>>>
>>> Why 5K docs?
>>>
>>> Best,
>>> Erick
>>>
>>>
>>> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <elrand@checkpoint.com> wrote:
>>>> Hi all,
>>>>
>>>> I am using Solr 4.4 with multiple cores. One core (called template)
>>>> is my "routing" core.
>>>>
>>>> When I run
>>>> http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
>>>> it consistently takes about 7s.
>>>> When I run
>>>> http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it consistently takes about 40ms.
>>>>
>>>> I profiled the distributed query.
>>>> This is the distributed query process (I hope the terms are accurate):
>>>> When Solr identifies a distributed query, it sends the query to
>>>> the shard and gets the matched shard docs.
>>>> Then it sends another query to the shard to get the Solr documents.
>>>> Most of the time is spent in the last stage, in the "process" function of
>>>> "QueryComponent", in:
>>>>
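>>>> for (int i=0; i<idArr.size(); i++) {
>>>>     // look up the internal Lucene docid for each requested unique key
>>>>     int id = req.getSearcher().getFirstMatch(
>>>>             new Term(idField.getName(),
>>>>                      idField.getType().toInternal(idArr.get(i))));
>>>>     // ... (rest of the loop body omitted)
>>>> }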
>>>>
>>>> How can I make my distributed query as fast as the direct one?
>>>>
>>>> Thanks.
>>>>
>>>
>>> Email secured by Check Point
>>>
>

