lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Advice to add additional non-related fields to a collection or create a subset of it?
Date Tue, 10 May 2016 04:10:54 GMT
Not quite sure where you are at with this. It sounds
like your slow loading is fixed and was a coding
issue on your part; that happens to us all.

bq: Is it advisable to have as few
queries to Solr per page as possible?

Of course it is advisable to execute as few Solr queries
as possible to display a page. Every one
costs you at least _some_ turnaround time. You can
mitigate this (assuming your Solr server isn't running
flat out) by issuing the subsequent queries in parallel
threads.
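A minimal Python sketch of issuing independent queries in parallel threads. The `fake_solr_query` function is a hypothetical stand-in for a real HTTP call to Solr (it only sleeps to mimic latency), and the collection names and parameters are illustrative. Only queries that don't depend on each other's results can run this way; in the product-then-supplier flow described in this thread, the supplier query needs the product results first.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_solr_query(collection, params):
    """Hypothetical stand-in for an HTTP request to Solr; sleeps to mimic latency."""
    time.sleep(0.2)
    return {"collection": collection, "params": params}


def run_in_parallel(requests, query_fn=fake_solr_query, max_workers=4):
    """Submit independent (collection, params) queries at once.

    Results are returned in the same order as `requests`, regardless of
    which query finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(query_fn, coll, params) for coll, params in requests]
        return [f.result() for f in futures]


if __name__ == "__main__":
    reqs = [
        ("product", {"q": "widget", "rows": 20}),
        ("product", {"q": "widget", "rows": 0, "facet": "true", "facet.field": "category"}),
        ("supplier", {"q": "*:*", "fq": "country:SG", "rows": 10}),
    ]
    start = time.perf_counter()
    results = run_in_parallel(reqs)
    # Roughly 0.2 s total instead of about 0.6 s for three sequential calls.
    print(f"{len(results)} responses in {time.perf_counter() - start:.2f}s")
```

With a real client, `query_fn` would perform the HTTP request. The thread pool only buys anything while Solr has spare capacity; on a server already running flat out, parallel requests just queue up.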

But to me it's not really a question of advisability; it's a
question of what your application needs to deliver. The
use-case drives everything. You can do some tricks, like displaying
partial pages and filling in the rest behind the scenes so it's
ready when your user clicks something.

bq: In my case, does denormalizing mean putting the
product and supplier information into one collection?
The supplier information is stored but not indexed in the collection.

It Depends(tm). If all you want to do is provide supplier
information when people do product searches then stored-only
is fine.

If you want to perform queries like "show me all the products
supplied by supplier X", then you need to index at least
some of the supplier fields (the supplier ID, say) too.
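A minimal Python sketch of the stored-versus-indexed distinction, assuming supplier fields are copied into each product document. The field names (`supplier_id`, `supplier_name`, `supplier_country`) are hypothetical. The definitions are shaped like a body for Solr's Schema API `add-field` command: display-only fields are `stored` but not `indexed`, while `supplier_id` is also indexed so a filter query on it can work. The `filter_params` helper is illustrative and simply refuses to filter on a stored-only field.

```python
import json

# Hypothetical supplier fields added to each product document.
SUPPLIER_FIELDS = [
    # Indexed as well as stored: needed for "all products supplied by supplier X".
    {"name": "supplier_id", "type": "string", "indexed": True, "stored": True},
    # Stored-only: returned with search results for display, but not searchable.
    {"name": "supplier_name", "type": "string", "indexed": False, "stored": True},
    {"name": "supplier_country", "type": "string", "indexed": False, "stored": True},
]


def schema_payload(fields):
    """Serialize field definitions in the shape of a Schema API "add-field" request body."""
    return json.dumps({"add-field": fields})


def filter_params(field, value, fields=SUPPLIER_FIELDS, rows=20):
    """Build query params that filter on `field`; only indexed fields can be filtered on."""
    indexed = {f["name"] for f in fields if f["indexed"]}
    if field not in indexed:
        raise ValueError(f"{field} is stored-only; index it before filtering on it")
    # Phrase-quote the value; a value containing a double quote would need escaping.
    return {"q": "*:*", "fq": f'{field}:"{value}"', "rows": rows}
```

For example, `filter_params("supplier_id", "S123")` yields an `fq` of `supplier_id:"S123"`, while `filter_params("supplier_name", "Acme")` raises `ValueError`, mirroring the point that stored-only fields can be displayed but not searched.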

Best,
Erick

On Sun, May 8, 2016 at 10:36 PM, Derek Poh <dpoh@globalsources.com> wrote:
> Hi Erick
>
> In my case, does denormalizing mean putting the product and supplier
> information into one collection?
> The supplier information is stored but not indexed in the collection.
>
> We have identified that it was a combination of a loop and bad source data
> that caused an endless loop under certain scenarios.
>
> Is it advisable to have as few queries to Solr per page as possible?
>
>
> On 5/6/2016 11:17 PM, Erick Erickson wrote:
>>
>> Denormalizing the data is usually the first thing to try. That's
>> certainly the preferred option if it doesn't bloat the index
>> unacceptably.
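Denormalizing here would mean copying the supplier fields the page needs into each product document at indexing time, so a single product query returns everything. A minimal Python sketch, assuming products carry a `supplier_id` and suppliers an `id`; the `supplier_` prefix and the field names are hypothetical. The trade-off: when a supplier's details change, every product document from that supplier has to be reindexed.

```python
def denormalize(products, suppliers, fields=("name", "country")):
    """Return product docs with selected supplier fields copied in as supplier_<field>.

    Products whose supplier is unknown are passed through unchanged.
    """
    by_id = {s["id"]: s for s in suppliers}
    docs = []
    for product in products:
        doc = dict(product)  # copy so the caller's dict is not mutated
        supplier = by_id.get(product.get("supplier_id"))
        if supplier is not None:
            for f in fields:
                if f in supplier:
                    doc[f"supplier_{f}"] = supplier[f]
        docs.append(doc)
    return docs


products = [
    {"id": "P1", "title": "Widget", "supplier_id": "S1"},
    {"id": "P2", "title": "Gadget", "supplier_id": "S9"},
]
suppliers = [{"id": "S1", "name": "Acme", "country": "SG"}]
docs = denormalize(products, suppliers)
# docs[0] gains supplier_name="Acme" and supplier_country="SG"; docs[1] is unchanged.
```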
>>
>> But my real question is what have you done to try to figure out _why_
>> it's slow? Do you have some loop
>> like
>> for (each found document)
>>     extract all the supplier IDs and query Solr for them
>>
>> ? That's a fundamental design decision that will be expensive.
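The loop described above issues one Solr round trip per document found (often called the N+1 query pattern). A minimal Python sketch contrasting it with a single batched filter query, assuming the supplier collection's unique key is `id`; the helper names are hypothetical. The batched version uses Solr's `terms` query parser (`{!terms f=...}`), which takes a comma-separated list of values; IDs that themselves contain commas would need a different separator.

```python
def per_document_queries(product_docs):
    """Anti-pattern: one supplier query per product document (N round trips)."""
    return [{"q": "*:*", "fq": f'id:"{d["supplier_id"]}"'} for d in product_docs]


def batched_supplier_query(product_docs):
    """One supplier query for the whole page via the terms query parser.

    Returns None when no product carries a supplier ID, so the caller can skip the request.
    """
    ids = sorted({d["supplier_id"] for d in product_docs if d.get("supplier_id")})
    if not ids:
        return None
    return {"q": "*:*", "fq": "{!terms f=id}" + ",".join(ids), "rows": len(ids)}


page = [
    {"id": "P1", "supplier_id": "S2"},
    {"id": "P2", "supplier_id": "S1"},
    {"id": "P3", "supplier_id": "S2"},
]
print(len(per_document_queries(page)))     # 3
print(batched_supplier_query(page)["fq"])  # {!terms f=id}S1,S2
```

Deduplicating the IDs also keeps the filter short when many products on a page share a supplier.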
>>
>> Have you examined the time each query takes to see if Solr is really
>> the bottleneck or whether it's "something else"? Mind you, I have no
>> clue what "something else" is here....
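One way to check whether Solr is the bottleneck is to compare the `QTime` that Solr reports in each response's `responseHeader` (milliseconds spent processing the request inside Solr) with the wall-clock time your code observes. A large gap points at the network, the response size, or your own code rather than the search itself. A minimal Python sketch; `query_fn` is a hypothetical callable returning the parsed JSON response, and `slow_fake` only simulates one.

```python
import time


def timed_query(query_fn, *args, **kwargs):
    """Run query_fn and compare Solr's reported QTime with client-side wall time."""
    start = time.perf_counter()
    resp = query_fn(*args, **kwargs)
    wall_ms = (time.perf_counter() - start) * 1000
    qtime_ms = resp.get("responseHeader", {}).get("QTime")
    outside_solr_ms = None if qtime_ms is None else wall_ms - qtime_ms
    return {"wall_ms": wall_ms, "qtime_ms": qtime_ms, "outside_solr_ms": outside_solr_ms}


def slow_fake(params):
    """Simulated response: Solr claims 5 ms, but the call takes about 100 ms end to end."""
    time.sleep(0.1)
    return {"responseHeader": {"status": 0, "QTime": 5}, "response": {"docs": []}}


if __name__ == "__main__":
    stats = timed_query(slow_fake, {"q": "widget"})
    print(f"wall={stats['wall_ms']:.0f}ms  QTime={stats['qtime_ms']}ms  "
          f"elsewhere={stats['outside_solr_ms']:.0f}ms")
```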
>>
>> Do you ever return lots of rows (i.e. thousands)?
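If a page does ask for thousands of rows, requesting fewer rows and only the fields the page displays (`fl`) usually helps; when code genuinely has to walk a large result set, Solr's `cursorMark` deep paging avoids one huge `rows=` request. A minimal Python sketch; the field names are hypothetical, `query_fn` stands in for the real HTTP call, and the sort includes the unique key (assumed to be `id`) because cursorMark requires a sort that includes the unique-key field as a tiebreaker.

```python
def page_params(q, fields, rows=100, cursor="*", sort="score desc,id asc"):
    """Params for one page: limited rows, only the needed fields, cursor-based paging."""
    return {"q": q, "fl": ",".join(fields), "rows": rows, "sort": sort, "cursorMark": cursor}


def iterate_all(query_fn, q, fields, rows=100):
    """Yield every matching doc, one page per request, until the cursor stops moving."""
    cursor = "*"
    while True:
        resp = query_fn(page_params(q, fields, rows, cursor))
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # Solr returns the same mark once results are exhausted
            break
        cursor = next_cursor
```

For a display page, `page_params` alone with a small `rows` is usually all that's needed; `iterate_all` is for batch jobs that must see every match.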
>>
>> Solr serves queries very quickly, so I'd concentrate on identifying what
>> is slow before jumping to a solution....
>>
>> Best,
>> Erick
>>
>> On Wed, May 4, 2016 at 10:28 PM, Derek Poh <dpoh@globalsources.com> wrote:
>>>
>>> Hi
>>>
>>> We have a "product" collection and a "supplier" collection.
>>> The "product" collection contains product information and the "supplier"
>>> collection contains the products' supplier information.
>>> We have a subsidiary page that queries the "product" collection for the
>>> search.
>>> The displayed results include product and supplier information.
>>> This page queries the "product" collection to get the matching product
>>> records.
>>> From this query, a list of the matching products' supplier IDs is
>>> extracted and used in a filter query against the "supplier" collection
>>> to get the necessary supplier information.
>>>
>>> This page loads very slowly and sometimes times out.
>>> Besides tweaking the page's code, we are also looking at what
>>> tweaking can be done on the Solr side. Reducing the number of queries
>>> generated by this page was one of the options to try.
>>>
>>> The main "product" collection is also used by our site's main search page
>>> and other subsidiary pages, so the query load on it is substantial.
>>> It has about 6.5 million documents and an index size of 38-39 GB.
>>> It is set up as 1 shard with 5 replicas, each replica on its own
>>> server, for a total of 5 servers.
>>> There are other smaller collections with a similar 1 shard, 5 replica
>>> setup residing on these servers as well.
>>>
>>> I am thinking of either:
>>> 1. Indexing supplier information into the "product" collection.
>>> 2. Creating another, similar "product" collection for this page to use.
>>> This collection will have fewer product fields and will include the
>>> required supplier fields. The number of documents in it will be the same
>>> as in the main "product" collection, but the index size will be smaller.
>>>
>>> With either option we do not need to query the "supplier" collection, so
>>> there is one less query, which will hopefully improve the performance of
>>> this page.
>>>
>>> Which of the two options would you advise?
>>> Any other advice or options?
>>>
>>> Derek
>>>
>>> ----------------------
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or
>>> privileged information. If you are not the intended recipient or have
>>> received this e-mail in error, please inform the sender immediately and
>>> delete this e-mail (including any attachments) from your computer, and
>>> you
>>> must not use, disclose to anyone else or copy this e-mail (including any
>>> attachments), whether in whole or in part.
>>> This e-mail and any reply to it may be monitored for security, legal,
>>> regulatory compliance and/or other appropriate reasons.
>>
>>
>