lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: distributed search is significantly slower than direct search
Date Wed, 13 Nov 2013 12:38:47 GMT
One thing you can try, and this is more diagnostic than a cure, is to return
just the id field (and ensure that lazy field loading is set to true). That'll
tell you whether the issue is actually fetching the documents off disk and
decompressing them, although frankly that's unlikely since you can get your
5,000 rows from a single machine quickly.
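Here is a minimal sketch of that diagnostic. It assumes the unique key field is named `id` and reuses the host and core names from the original question. It builds the same distributed request but adds the standard `fl=id` parameter so only the key comes back. The class and helper names are hypothetical, and it needs Java 10+ for the `URLEncoder.encode(String, Charset)` overload. Lazy field loading itself is controlled by `<enableLazyFieldLoading>` in solrconfig.xml.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class IdOnlyQuery {
    // Hypothetical helper: builds a /select URL from a core URL and parameters,
    // URL-encoding each value (for example "*:*" becomes "*%3A*").
    static String buildSelectUrl(String coreUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return coreUrl + "/select?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("rows", "5000");
        params.put("fl", "id"); // return only the unique key (assumed to be named "id")
        params.put("shards", "127.0.0.1:8983/solr/core1");
        System.out.println(buildSelectUrl("http://127.0.0.1:8983/solr/template", params));
    }
}
```

If the `fl=id` version of the distributed query is still slow, the time is probably not going into reading and decompressing stored fields.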

Is the code you found where Solr is spending its time on the "routing" core
or on the shards? I actually have a hard time understanding how that code
could take a long time; it doesn't seem right.

You are transferring 5,000 docs across the network, so it's possible that
your network is just slow. That's certainly a difference between the local
and remote cases, but it's a stab in the dark.

Not much help I know,
Erick



On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <elrand@checkpoint.com> wrote:

> Erick, Thanks for your response.
>
> We are upgrading our system using Solr.
> We need to preserve old functionality. Our client displays 5K documents
> and groups them.
>
> Is there a way to refactor the code in order to improve distributed
> document fetching?
>
> Thanks.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, October 30, 2013 3:17 AM
> To: solr-user@lucene.apache.org
> Subject: Re: distributed search is significantly slower than direct search
>
> You can't. There will inevitably be some overhead in the distributed case.
> That said, 7 seconds is quite long.
>
> 5,000 rows is excessive, and probably where your issue is. You're having
> to go out and fetch the docs across the wire. Perhaps there is some
> batching that could be done there, I don't know whether this is one
> document per request or not.
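The batching idea can be sketched in general terms: split a long id list into fixed-size chunks so that no single request has to carry all 5,000 ids. This is an illustration only. The `partition` helper and the batch size of 500 are hypothetical, and this thread does not settle whether Solr 4.4 already groups ids per shard.

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    // Hypothetical helper: splits ids into consecutive batches of at most batchSize,
    // e.g. so each stored-field request carries a bounded number of ids.
    static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += batchSize) {
            int to = Math.min(from + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            ids.add("doc" + i);
        }
        List<List<String>> batches = partition(ids, 500);
        // 5,000 ids in batches of 500 gives 10 batches
        System.out.println(batches.size() + " batches, first has " + batches.get(0).size() + " ids");
    }
}
```

Batching bounds the size of each request but not the total work, so it mainly helps if one very large request is what is slow.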
>
> Why 5K docs?
>
> Best,
> Erick
>
>
> On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <elrand@checkpoint.com> wrote:
>
> > Hi all,
> >
> > I am using Solr 4.4 with multiple cores. One core (called template) is my
> > "routing" core.
> >
> > When I run
> > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
> > it consistently takes about 7s.
> > When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
> > consistently takes about 40ms.
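The sketch below makes the 7s vs 40ms comparison repeatable. It times each URL several times and reports the median. It assumes Java 11+ (`java.net.http.HttpClient`) and uses the two URLs from the message. The class name and the run count of 5 are arbitrary choices.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;

public class QueryTimer {
    // Median of the measured durations; less sensitive to one slow (e.g. cold-cache) run.
    static long medianMillis(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Sends the same GET request `runs` times and returns each wall-clock time in ms.
    static long[] time(HttpClient client, String url, int runs)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            // Read the whole body so transfer time is included in the measurement.
            client.send(request, HttpResponse.BodyHandlers.ofString());
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String direct = "http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*";
        String distributed = "http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*"
                + "&shards=127.0.0.1:8983/solr/core1";
        System.out.println("direct median ms:      " + medianMillis(time(client, direct, 5)));
        System.out.println("distributed median ms: " + medianMillis(time(client, distributed, 5)));
    }
}
```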
> >
> > I profiled the distributed query.
> > This is the distributed query process (I hope the terms are accurate):
> > When Solr identifies a distributed query, it sends the query to the
> > shard and gets the matching shard docs.
> > Then it sends another query to the shard to get the Solr documents.
> > Most of the time is spent in this last stage, in the "process" method of
> > "QueryComponent":
> >
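> > for (int i=0; i<idArr.size(); i++) {
> >     // one term lookup on the unique key field per requested id
> >     int id = req.getSearcher().getFirstMatch(
> >             new Term(idField.getName(),
> >                      idField.getType().toInternal(idArr.get(i))));
> >     // ... (rest of the loop body omitted)
> > }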
> >
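The two phases described above can be modelled in a few lines. This is a toy, in-memory illustration under stated assumptions, not Solr's implementation. Each hypothetical `Shard` holds documents keyed by id, and all class and method names are made up. Phase one asks every shard only for ids and scores and merges them. Phase two sends one fetch per shard with that shard's winning ids, then looks each id up again, mirroring the per-id `getFirstMatch` loop quoted above. It assumes Java 17 (records, `Stream.toList()`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseSearch {
    // Hypothetical stored document: unique id, relevance score and stored fields.
    record Doc(String id, double score, String fields) {}

    // A phase-one hit: id and score only, plus the shard that owns it.
    record Hit(String id, double score, Shard shard) {}

    // Hypothetical in-memory shard (a plain class, so it is compared by identity).
    static final class Shard {
        final Map<String, Doc> docsById = new LinkedHashMap<>();

        void add(Doc d) { docsById.put(d.id(), d); }

        // Phase 1: return only ids and scores of the top `rows` documents.
        List<Hit> topIds(int rows) {
            return docsById.values().stream()
                    .sorted((a, b) -> Double.compare(b.score(), a.score()))
                    .limit(rows)
                    .map(d -> new Hit(d.id(), d.score(), this))
                    .toList();
        }

        // Phase 2: one lookup per requested id, like the getFirstMatch loop.
        Map<String, String> fetchFields(List<String> ids) {
            Map<String, String> out = new LinkedHashMap<>();
            for (String id : ids) {
                Doc d = docsById.get(id);
                if (d != null) out.put(id, d.fields());
            }
            return out;
        }
    }

    static List<String> search(List<Shard> shards, int rows) {
        // Phase 1: collect ids and scores from every shard, merge, keep the global top `rows`.
        List<Hit> merged = new ArrayList<>();
        for (Shard s : shards) merged.addAll(s.topIds(rows));
        merged.sort((a, b) -> Double.compare(b.score(), a.score()));
        List<Hit> top = merged.subList(0, Math.min(rows, merged.size()));

        // Phase 2: one fetch per shard, carrying all of that shard's winning ids.
        Map<Shard, List<String>> idsByShard = new LinkedHashMap<>();
        for (Hit h : top) idsByShard.computeIfAbsent(h.shard(), k -> new ArrayList<>()).add(h.id());
        Map<String, String> fields = new LinkedHashMap<>();
        idsByShard.forEach((shard, ids) -> fields.putAll(shard.fetchFields(ids)));

        // Return stored fields in the merged (score) order from phase 1.
        List<String> result = new ArrayList<>();
        for (Hit h : top) result.add(fields.get(h.id()));
        return result;
    }

    public static void main(String[] args) {
        Shard a = new Shard();
        a.add(new Doc("1", 0.9, "doc one"));
        a.add(new Doc("2", 0.2, "doc two"));
        Shard b = new Shard();
        b.add(new Doc("3", 0.5, "doc three"));
        System.out.println(search(List.of(a, b), 2)); // prints [doc one, doc three]
    }
}
```

With rows=5000 against a single shard, phase two in this model performs 5,000 id lookups on top of the phase-one query. The direct query avoids that second pass entirely.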
> > How can I make my distributed query as fast as the direct one?
> >
> > Thanks.
> >
>
