Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Reply-To: solr-user@lucene.apache.org
In-Reply-To: <573021C8.8080508@globalsources.com>
References: <572AD9FA.6040800@globalsources.com> <573021C8.8080508@globalsources.com>
From: Erick Erickson
Date: Mon, 9 May 2016 21:10:54 -0700
Subject: Re:
 Advice to add additional non-related fields to a collection or create a subset of it?
To: solr-user
Content-Type: text/plain; charset=UTF-8
archived-at: Tue, 10 May 2016 04:11:20 -0000

Not quite sure where you are at with this. It sounds like your slow
loading is fixed and was a coding issue on your part; that happens to
us all.

bq: Is it advisable to have as few queries to Solr in a page as possible?

Of course it is advisable to have as few Solr queries executed to
display a page as possible. Every one costs you at least _some_
turnaround time. You can mitigate this (assuming your Solr server isn't
running flat out) by issuing the subsequent queries in parallel threads.

But it's not really a question to me of advisability, it's a question
of what your application needs to deliver. The use-case drives all. You
can do some tricks like display partial pages and fill in the rest
behind the scenes to display when your user clicks something and the
like.

bq: In my case, by denormalizing, that means putting the product and
supplier information into one collection? The supplier information is
stored but not indexed in the collection.

It Depends(tm). If all you want to do is provide supplier information
when people do product searches then stored-only is fine. If you want
to perform queries like "show me all the products supplied by supplier
X", then you need to index at least some values too.

Best,
Erick

On Sun, May 8, 2016 at 10:36 PM, Derek Poh wrote:
> Hi Erick
>
> In my case, by denormalizing, that means putting the product and supplier
> information into one collection?
> The supplier information is stored but not indexed in the collection.
>
> We have identified it was a combination of a loop and bad source data that
> caused an endless loop under certain scenarios.
>
> Is it advisable to have as few queries to Solr in a page as possible?
>
>
> On 5/6/2016 11:17 PM, Erick Erickson wrote:
>>
>> Denormalizing the data is usually the first thing to try.
>> That's certainly the preferred option if it doesn't bloat the index
>> unacceptably.
>>
>> But my real question is what have you done to try to figure out _why_
>> it's slow? Do you have some loop like
>>
>> for (each found document) {
>>     extract all the supplier IDs and query Solr for them
>> }
>>
>> ? That's a fundamental design decision that will be expensive.
>>
>> Have you examined the time each query takes to see if Solr is really
>> the bottleneck or whether it's "something else"? Mind you, I have no
>> clue what "something else" is here....
>>
>> Do you ever return lots of rows (i.e. thousands)?
>>
>> Solr serves queries very quickly, so I'd concentrate on identifying what
>> is slow before jumping to a solution....
>>
>> Best,
>> Erick
>>
>> On Wed, May 4, 2016 at 10:28 PM, Derek Poh wrote:
>>>
>>> Hi
>>>
>>> We have a "product" collection and a "supplier" collection.
>>> The "product" collection contains product information and the "supplier"
>>> collection contains the products' supplier information.
>>> We have a subsidiary page that queries the "product" collection for the
>>> search.
>>> The displayed result includes product and supplier information.
>>> This page queries the "product" collection to get the matching product
>>> records.
>>> From this query, a list of the matching products' supplier IDs is
>>> extracted and used in a filter query against the "supplier" collection
>>> to get the necessary supplier information.
>>>
>>> The loading of this page is very slow; it leads to timeouts at times as
>>> well.
>>> Besides tweaking the code of the page, we are also looking at what
>>> tweaking can be done on the Solr side. Reducing the number of queries
>>> generated by this page is one of the options to try.
>>>
>>> The main "product" collection is also used by our site's main search page
>>> and other subsidiary pages. So the query load on it is substantial.
>>> It has about 6.5 million documents and an index size of 38-39 GB.
>>> It is set up as 1 shard with 5 replicas. Each replica is on its own
>>> server, for a total of 5 servers.
>>> There are other smaller collections with a similar 1 shard, 5 replica
>>> setup residing on these servers as well.
>>>
>>> I am thinking of either
>>> 1. Index supplier information into the "product" collection.
>>> 2. Create another similar "product" collection for this page to use. This
>>> collection will have fewer product fields and will include the required
>>> supplier fields. But the number of documents in it will be the same as
>>> in the main "product" collection. The index size will be smaller though.
>>>
>>> With either of the 2 options we do not need to query the "supplier"
>>> collection. So there is one less query and hopefully it will improve the
>>> performance of this page.
>>>
>>> What is the advice between the 2 options?
>>> Any other advice or options?
>>>
>>> Derek
>>>
>>> ----------------------
>>> CONFIDENTIALITY NOTICE
>>> This e-mail (including any attachments) may contain confidential and/or
>>> privileged information. If you are not the intended recipient or have
>>> received this e-mail in error, please inform the sender immediately and
>>> delete this e-mail (including any attachments) from your computer, and
>>> you must not use, disclose to anyone else or copy this e-mail (including
>>> any attachments), whether in whole or in part.
>>> This e-mail and any reply to it may be monitored for security, legal,
>>> regulatory compliance and/or other appropriate reasons.
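
A minimal Python sketch of two ideas from the thread: collecting the supplier IDs into one filter query instead of querying per document, and running independent queries on parallel threads as Erick suggests. The field name `supplier_id`, both function names, and the `fetch` callable are assumptions made for illustration; `fetch` stands in for whatever client call the page already uses to send one query to Solr.

```python
from concurrent.futures import ThreadPoolExecutor


def supplier_filter_query(supplier_ids, field="supplier_id"):
    """Build one filter query string covering every supplier ID.

    `supplier_id` is a hypothetical field name; substitute the one in
    your schema. Duplicates are dropped and first-seen order is kept.
    """
    unique_ids = list(dict.fromkeys(supplier_ids))
    if not unique_ids:
        raise ValueError("at least one supplier ID is required")
    joined = " OR ".join(str(supplier_id) for supplier_id in unique_ids)
    return f"{field}:({joined})"


def run_in_parallel(queries, fetch, max_workers=4):
    """Run independent queries on worker threads.

    `fetch` is a placeholder for a function that sends one query to
    Solr and returns its response. Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, queries))


if __name__ == "__main__":
    # Stand-in for a real Solr call so the sketch runs without a server.
    def fake_fetch(query):
        return {"query": query, "docs": []}

    fq = supplier_filter_query([17, 42, 17, 99])
    print(fq)
    print(run_in_parallel([fq, "category:widgets"], fake_fetch))
```

Parallel threads only help for queries that do not depend on each other; the supplier lookup still has to wait for the product query that yields the IDs.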