lucene-solr-user mailing list archives

From Joel Cohen <joel.co...@bluefly.com>
Subject Re: Lowering query time
Date Tue, 11 Feb 2014 14:40:51 GMT
I'd like to thank you for lending a hand on my query time problem with
SolrCloud. By switching to a single shard with replicas setup, I've reduced
my query time to 18 msec. My full ingestion of 300k+ documents went down
from 2 hours 50 minutes to 1 hour 40 minutes. There are some code changes
that are going in that should help a bit as well. Big thanks to everyone
that had suggestions.


On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:

> I suspect faceting is the issue here. The actual query you have shown
> seems to bring back a single document (or a single set of documents for
> a product):
> fq=id:(320403401)
>
> On the other hand, you are asking for 4 field facets:
> facet.field=q_virtualCategory_ss
> facet.field=q_brand_s
> facet.field=q_color_s
> facet.field=q_category_ss
> AND 2 range facets, both clustered/grouped:
> facet.range=daysSinceStart_i
> facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000)
>
> And for all facets you have asked to bring back ALL of the results:
> facet.limit=-1
>
> Plus, you are doing a complex sort:
> sort=popularity_i desc,popularity_i desc
>
> So, you are probably spending quite a bit of time counting (especially
> in a sharded setup) and then quite a bit more sending the response
> back.
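
A minimal sketch of how the parameters listed above could be trimmed before the request is sent. It caps an unbounded facet.limit, collapses the duplicated sort clause, and adds facet.mincount=1 so zero-count facet values are not returned. The helper name trim_params, the cap of 20, and q=*:* are assumptions for illustration (the original q was not shown), and whether the front end actually needs every facet value is something only the application owner can say.

```python
from urllib.parse import urlencode


def trim_params(params, facet_limit=20):
    """Return a copy of a Solr parameter list with bounded facets
    and a deduplicated sort. `params` is a list of (key, value) pairs."""
    trimmed = []
    for key, value in params:
        if key == "facet.limit" and int(value) < 0:
            # facet.limit=-1 means "return every facet value"; cap it instead.
            value = str(facet_limit)
        elif key == "sort":
            # Drop repeated sort clauses while keeping their order.
            seen = []
            for clause in value.split(","):
                clause = clause.strip()
                if clause not in seen:
                    seen.append(clause)
            value = ",".join(seen)
        trimmed.append((key, value))
    # Skip facet values that match no documents (assumed acceptable here).
    if not any(key == "facet.mincount" for key, _ in trimmed):
        trimmed.append(("facet.mincount", "1"))
    return trimmed


# Parameters taken from the thread, except q, which is a placeholder.
original = [
    ("q", "*:*"),
    ("fq", "id:(320403401)"),
    ("facet", "true"),
    ("facet.field", "q_brand_s"),
    ("facet.field", "q_color_s"),
    ("facet.limit", "-1"),
    ("sort", "popularity_i desc,popularity_i desc"),
]

trimmed = trim_params(original)
query_string = urlencode(trimmed)
print(query_string)
```

Comparing the size of the HTTP response before and after such a change is a quick way to see how much of the time goes into serializing facet values.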
>
> I would check the size of the result document (HTTP result) and see
> how large it is. Maybe you don't need all of the stuff that's coming
> back. I assume you are not actually querying Solr from the client's
> machine (that is, I hope it is inside your data centre, close to your
> web server); otherwise I would say to look at automatic content
> compression as well to minimize on-wire document size.
>
> Finally, if your documents have many stored fields (stored="true" in
> schema.xml) but you only return small subsets of them during search,
> you could look into using the enableLazyFieldLoading flag in
> solrconfig.xml.
>
> Regards,
>    Alex.
> P.s. As others said, you don't seem to have too many documents.
> Perhaps you want replication instead of sharding for improved
> performance.
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin
> <Alexey_Kozhemiakin@epam.com> wrote:
> > Btw, "timing" for distributed requests is broken at the moment; it
> > doesn't combine values from requests to shards. I'm working on a patch.
> >
> > https://issues.apache.org/jira/browse/SOLR-3644
> >
> > -----Original Message-----
> > From: Jack Krupansky [mailto:jack@basetechnology.com]
> > Sent: Tuesday, February 04, 2014 22:00
> > To: solr-user@lucene.apache.org
> > Subject: Re: Lowering query time
> >
> > Add the debug=true parameter to some test queries and look at the
> > "timing" section to see which search components are taking the time.
> > Traditionally, highlighting for large documents was a top culprit.
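
A rough sketch of reading that "timing" section once a debug=true response has been parsed from JSON. It assumes the usual wt=json layout, where debug.timing holds "prepare" and "process" stages and each component has its own "time" entry. The function name slowest_components and the sample numbers are invented for illustration. As noted above (SOLR-3644), timing for distributed requests does not combine shard values, so this is most trustworthy against a single core.

```python
def slowest_components(response, phase="process"):
    """Return (component, milliseconds) pairs for one timing stage,
    slowest first. `response` is a parsed Solr JSON response."""
    timing = response.get("debug", {}).get("timing", {})
    stage = timing.get(phase, {})
    components = [
        (name, entry["time"])
        for name, entry in stage.items()
        # The stage's own total is a plain number under "time"; skip it.
        if isinstance(entry, dict) and "time" in entry
    ]
    return sorted(components, key=lambda item: item[1], reverse=True)


# Hand-written sample shaped like a debug=true response (values are made up).
sample = {
    "debug": {
        "timing": {
            "time": 41.0,
            "prepare": {
                "time": 1.0,
                "query": {"time": 1.0},
                "facet": {"time": 0.0},
            },
            "process": {
                "time": 40.0,
                "query": {"time": 3.0},
                "facet": {"time": 35.0},
                "highlight": {"time": 2.0},
            },
        }
    }
}

for name, millis in slowest_components(sample):
    print(f"{name}: {millis} ms")
```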
> >
> > Are you returning a lot of data or field values? Sometimes reducing the
> > amount of data processed can help. Any multivalued fields with lots of
> > values?
> >
> > -- Jack Krupansky
> >
> > -----Original Message-----
> > From: Joel Cohen
> > Sent: Tuesday, February 4, 2014 1:43 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Lowering query time
> >
> > 1. We are faceting. I'm not a developer so I'm not quite sure how we're
> > doing it. How can I measure?
> > 2. I'm not sure how we'd force this kind of document partitioning. I can
> > see how my shards are partitioned by looking at the clusterstate.json from
> > Zookeeper, but I don't have a clue on how to get documents into specific
> > shards.
> >
> > Would I be better off with fewer shards given the small size of my
> > indexes?
> >
> >
> > On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley <yonik@heliosearch.com> wrote:
> >
> >> On Tue, Feb 4, 2014 at 12:12 PM, Joel Cohen <joel.cohen@bluefly.com>
> >> wrote:
> >> > I'm trying to get the query time down to ~15 msec. Anyone have any
> >> > tuning recommendations?
> >>
> >> I guess it depends on what the slowest part of the query currently is.
> >>  If you are faceting, it's often that.
> >> Also, it's often a big win if you can somehow partition documents such
> >> that requests can normally be serviced from a single shard.
> >>
> >> -Yonik
> >> http://heliosearch.org - native off-heap filters and fieldcache for
> >> solr
> >>
> >
> >
> >
> > --
> >
> > joel cohen, senior system engineer
> >
> > e joel.cohen@bluefly.com p 212.944.8000 x276 bluefly, inc. 42 w. 39th
> st. new york, ny 10018 www.bluefly.com <
> http://www.bluefly.com/?referer=autosig> | *fly since
> > 2013...*
> >
>



-- 

joel cohen, senior system engineer

e joel.cohen@bluefly.com p 212.944.8000 x276
bluefly, inc. 42 w. 39th st. new york, ny 10018
www.bluefly.com <http://www.bluefly.com/?referer=autosig> | *fly since
2013...*
