lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: optimize requests that fetch 1000 rows
Date Fri, 12 Feb 2016 15:10:10 GMT
Thanks for that critical clarification. Try...

1. A different response writer, to see if that impacts the wall-clock time.
2. Selectively removing fields from the fl field list, to see whether a
particular field is causing the problem.
3. Returning only the ID for each document: how fast/slow is that?

How many fields are in fl?
Any function queries in fl?
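A minimal Python sketch of the three comparisons above. It only builds the request URLs to time (for example with curl, as used elsewhere in this thread). The host, port, core name (`collection1`), the placeholder query `*:*`, the example field list and the assumption that the uniqueKey field is called `id` are all hypothetical, not taken from the thread. `wt` selects the response writer and `fl` the returned fields.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are assumptions,
# not taken from this thread.
BASE = "http://localhost:8983/solr/collection1/select"


def select_url(q, rows, fl, wt):
    """Build a /select URL with the given query, row count, fields and writer."""
    params = {"q": q, "rows": rows, "fl": fl, "wt": wt}
    return BASE + "?" + urlencode(params)


def diagnostic_variants(q, rows, full_fl):
    """Return (label, url) pairs for the three comparisons."""
    variants = []
    # 1. Same request, different response writers.
    for wt in ("javabin", "json", "xml"):
        variants.append((f"wt={wt}", select_url(q, rows, full_fl, wt)))
    # 2. Drop one field at a time from fl, keeping javabin like the application.
    fields = [f.strip() for f in full_fl.split(",") if f.strip()]
    for dropped in fields:
        kept = ",".join(f for f in fields if f != dropped)
        variants.append((f"without {dropped}", select_url(q, rows, kept, "javabin")))
    # 3. Return only the document ID (assumes the uniqueKey field is "id").
    variants.append(("id only", select_url(q, rows, "id", "javabin")))
    return variants


if __name__ == "__main__":
    # Placeholder query and field list, for illustration only.
    for label, url in diagnostic_variants("*:*", 1000, "id,title,body"):
        print(label, url)
```

Keeping `q` and `rows` fixed across all variants means any difference in wall-clock time comes from the writer or the returned fields alone.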


-- Jack Krupansky

On Fri, Feb 12, 2016 at 4:57 AM, Matteo Grolla <matteo.grolla@gmail.com>
wrote:

> Hi Jack,
>      tell me if I'm wrong, but qtime accounts for search time excluding the
> fetch of stored fields (I have a 90ms qtime but ~30s to obtain the
> results on the client, over a LAN, for a 300kB response). The debug
> output explains how much of qtime is used by each search component.
> For me 90ms is OK; I wouldn't spend time trying to make it 50ms. It's
> the ~30s to obtain the response that I'd like to tackle.
>
>
> 2016-02-12 5:42 GMT+01:00 Jack Krupansky <jack.krupansky@gmail.com>:
>
> > Again, first things first: use debugQuery=true and see which Solr search
> > components are consuming the bulk of qtime.
> >
> > -- Jack Krupansky
> >
> > On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla <matteo.grolla@gmail.com>
> > wrote:
> >
> > > It's virtual hardware; 200ms is spent on the client until the
> > > response is written to disk.
> > > qtime on Solr is ~90ms, not great but acceptable.
> > >
> > > Is it possible that the method FilenameUtils.splitOnTokens is really so
> > > heavy when requesting a lot of rows on slow hardware?
> > >
> > > 2016-02-11 17:17 GMT+01:00 Jack Krupansky <jack.krupansky@gmail.com>:
> > >
> > > > Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but
> > > > still relatively bad. Even 50ms for 10 rows would be considered barely
> > > > okay. But... again it depends on query complexity - simple queries
> > > > should be well under 50ms on decent modern hardware.
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Thu, Feb 11, 2016 at 10:36 AM, Matteo Grolla <matteo.grolla@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jack,
> > > > >       response time scales with rows, but the relationship doesn't
> > > > > seem linear: below 400 rows times are much faster.
> > > > > I see the query times in the Solr logs, and they are fast.
> > > > > The same query takes 8s with rows = 1000
> > > > > and 0.2s with rows = 10.
> > > > >
> > > > >
> > > > > 2016-02-11 16:22 GMT+01:00 Jack Krupansky <jack.krupansky@gmail.com>:
> > > > >
> > > > > > Are queries scaling linearly - does a query for 100 rows take
> > > > > > 1/10th the time (1 sec vs. 10 sec or 3 sec vs. 30 sec)?
> > > > > >
> > > > > > Does the app need/expect exactly 1,000 documents for the query, or
> > > > > > is that just what this particular query happened to return?
> > > > > >
> > > > > > What does the query look like? Is it complex, or does it use
> > > > > > wildcards or function queries, or is it very simple keywords? How
> > > > > > many operators?
> > > > > >
> > > > > > Have you used the debugQuery=true parameter to see which search
> > > > > > components are taking the time?
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla <matteo.grolla@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Yonik,
> > > > > > >      after the first query I find 1000 docs in the document cache.
> > > > > > > I'm using curl to send the request, requesting javabin format to
> > > > > > > mimic the application.
> > > > > > > GC activity is low.
> > > > > > > I managed to load the entire 50GB index into the filesystem cache;
> > > > > > > after that, queries don't cause disk activity anymore.
> > > > > > > Times improved: queries that took ~30s now take <10s, but I hoped
> > > > > > > for better.
> > > > > > > I'm going to use jvisualvm's sampler to analyze where the time is
> > > > > > > spent.
> > > > > > >
> > > > > > >
> > > > > > > 2016-02-11 15:25 GMT+01:00 Yonik Seeley <yseeley@gmail.com>:
> > > > > > >
> > > > > > > > On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla <matteo.grolla@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > Thanks Toke, yes, they are long times, and Solr qtime (to
> > > > > > > > > execute the query) is a fraction of a second.
> > > > > > > > > The response in javabin format is around 300k.
> > > > > > > >
> > > > > > > > OK, that tells us a lot.
> > > > > > > > And if you actually tested so that all the docs would be in the
> > > > > > > > cache (can you verify this by looking at the cache stats after you
> > > > > > > > re-execute?), then it seems like the slowness is down to any of:
> > > > > > > > a) serializing the response (it doesn't seem like a 300K response
> > > > > > > >    should take *that* long to serialize)
> > > > > > > > b) reading/processing the response (how fast the client can do
> > > > > > > >    something with each doc is also a factor...)
> > > > > > > > c) other (GC, network, etc.)
> > > > > > > >
> > > > > > > > You can try taking client processing out of the equation by
> > > > > > > > trying a curl request.
> > > > > > > >
> > > > > > > > -Yonik
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
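On the scaling question raised earlier in the thread, a minimal Python sketch that fits a fixed-cost-plus-per-row model, t = fixed + per_row * rows, through the two timings Matteo reported (rows = 10 in 0.2s, rows = 1000 in 8s). The model itself is an assumption chosen for illustration. Two points always fit it exactly, so it cannot confirm linearity on its own; a third timing (for example rows = 100, as Jack suggested, or one near the 400-row threshold Matteo mentions) compared against the prediction would.

```python
def fit_fixed_plus_per_row(p1, p2):
    """Fit t = fixed + per_row * rows through two (rows, seconds) points."""
    (r1, t1), (r2, t2) = p1, p2
    per_row = (t2 - t1) / (r2 - r1)
    fixed = t1 - per_row * r1
    return fixed, per_row


def predict(fixed, per_row, rows):
    """Predicted wall-clock seconds for a request returning `rows` docs."""
    return fixed + per_row * rows


# Timings reported in this thread: (rows, seconds).
fixed, per_row = fit_fixed_plus_per_row((10, 0.2), (1000, 8.0))
print(f"fixed ~ {fixed * 1000:.0f} ms, per row ~ {per_row * 1000:.2f} ms")
# A purely proportional model would predict 0.8s for rows=100 (1/10 of 8s);
# this fixed-plus-per-row model predicts about 0.91s.
print(f"predicted rows=100: {predict(fixed, per_row, 100):.2f} s")
```

On these two numbers the per-row term (roughly 7.9ms per returned document) dominates at rows = 1000, far more than the ~90ms qtime, which fits the thread's focus on fetching, serializing and transferring rows rather than on the search itself.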
