lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Advice to add additional non-related fields to a collection or create a subset of it?
Date Fri, 06 May 2016 15:17:08 GMT
Denormalizing the data is usually the first thing to try. That's
certainly the preferred option if it doesn't bloat the index
unacceptably.
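
To make "denormalizing" concrete, here is a minimal Python sketch of copying supplier fields into each product document before it is indexed. The field names (`supplier_id`, `name`, `country`) and the `supplier_` prefix are made up for illustration; the real schema will differ.

```python
def denormalize(product, suppliers_by_id, supplier_fields=("name", "country")):
    """Return a copy of `product` with selected supplier fields copied in.

    All field names are hypothetical; adapt them to the real schema.
    """
    doc = dict(product)  # leave the caller's dict untouched
    supplier = suppliers_by_id.get(product.get("supplier_id"))
    if supplier is not None:
        for field in supplier_fields:
            if field in supplier:
                doc["supplier_" + field] = supplier[field]
    return doc
```

The trade-off is that a change to a supplier record means reindexing every product that carries a copy of it.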

But my real question is: what have you done to figure out _why_
it's slow? Do you have a loop like this?

for (each found document)
   extract all the supplier IDs and query Solr for them

If so, that's a fundamental design decision that will be expensive.
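
If the page does query once per document, one alternative is to collect the distinct supplier IDs first and send a single filter query for all of them. A rough Python sketch, assuming the products carry a `supplier_id` field and the supplier collection is keyed on `id` (both names are guesses), using Solr's terms query parser:

```python
def build_supplier_fq(products, id_field="supplier_id", target_field="id"):
    """Build one terms filter query covering every distinct supplier ID.

    Returns None when there is nothing to look up, so the caller can
    skip the second request entirely. Field names are assumptions.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order
    ids = dict.fromkeys(
        doc[id_field] for doc in products if doc.get(id_field) is not None
    )
    if not ids:
        return None
    return "{!terms f=%s}%s" % (target_field, ",".join(str(i) for i in ids))
```

That turns N lookups into one request, whatever the number of products on the page.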

Have you examined the time each query takes to see if Solr is really
the bottleneck or whether it's "something else"? Mind you, I have no
clue what "something else" is here....
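
One way to check this is to compare the wall-clock time seen by the page with the `QTime` (in milliseconds) that Solr reports in the `responseHeader` of each response. A minimal Python sketch; `run_query` is a hypothetical callable that sends the request and returns the parsed JSON response as a dict:

```python
import time

def timed_query(run_query):
    """Run one query and split client-side time from Solr's reported QTime."""
    start = time.perf_counter()
    response = run_query()  # hypothetical: performs the HTTP call, returns a dict
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    qtime_ms = response.get("responseHeader", {}).get("QTime")
    # Whatever is not QTime is network, response writing, parsing, and so on.
    overhead_ms = None if qtime_ms is None else elapsed_ms - qtime_ms
    return {"elapsed_ms": elapsed_ms, "qtime_ms": qtime_ms, "overhead_ms": overhead_ms}
```

If `qtime_ms` is small while `elapsed_ms` is large, the time is going somewhere other than the search itself.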

Do you ever return lots of rows (i.e. thousands)?
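
If so, bounding `rows` and restricting `fl` to the fields the page actually displays is worth trying. A small Python sketch of building such request parameters; the query and field names are placeholders:

```python
from urllib.parse import urlencode

def select_params(query, rows=20, fields=("id",)):
    """Encode parameters for a Solr select request with a bounded result size."""
    return urlencode({
        "q": query,
        "rows": rows,             # cap the number of documents returned
        "fl": ",".join(fields),   # fetch only the fields that get displayed
        "wt": "json",
    })
```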

Solr serves queries very quickly, so I'd concentrate on identifying what
is slow before jumping to a solution....

Best,
Erick

On Wed, May 4, 2016 at 10:28 PM, Derek Poh <dpoh@globalsources.com> wrote:
> Hi
>
> We have a "product" collection and a "supplier" collection.
> The "product" collection contains product information and the "supplier"
> collection contains the products' supplier information.
> We have a subsidiary page that queries the "product" collection for its search.
> The displayed results include product and supplier information.
> This page will query the "product" collection to get the matching product
> records.
> From this query, a list of the matching products' supplier IDs is extracted
> and used in a filter query against the "supplier" collection to get the
> necessary supplier information.
>
> The loading of this page is very slow, and it leads to timeouts at times as well.
> Besides tweaking the code of the page, we are also looking at what
> tweaking can be done on the Solr side. Reducing the number of queries generated
> by this page is one of the options to try.
>
> The main "product" collection is also used by our site's main search page and
> other subsidiary pages as well, so the query load on it is substantial.
> It has about 6.5 million documents and an index size of 38-39 GB.
> It is set up as 1 shard with 5 replicas. Each replica is on its own server,
> for a total of 5 servers.
> There are other smaller collections with a similar 1-shard, 5-replica setup
> residing on these servers as well.
>
> I am thinking of either
> 1. Indexing supplier information into the "product" collection.
> 2. Creating another similar "product" collection for this page to use. This
> collection will have fewer product fields and will include the required
> supplier fields. But the number of documents in it will be the same as in the
> main "product" collection. The index size will be smaller, though.
>
> With either of the 2 options we do not need to query the "supplier" collection. So
> there is one less query and hopefully it will improve the performance of
> this page.
>
> What is your advice on choosing between the 2 options?
> Any other advice or options?
>
> Derek
