From: Mikhail Khludnev
Date: Thu, 20 Jun 2019 23:05:49 +0300
Subject: Re: Large Data set relationships handling
To: solr-user

On Thu, Jun 20, 2019 at 5:47 PM
Lucky Sharma wrote:

> Hi all,
> I need help with one use case:
> Suppose you have 2 sets of data, A and B, which are linked to each
> other. For example, each entity of set X can have 1-to-many
> relationships to set B, and as a result, I need the sorted/faceted
> values from Set B.
> For example, entity x(i) from Set A can have a relation with all the
> values in Set B, and another entity x(j) from Set A can have
> [y(i)... y(j)] values from Set B.
>
> * Both data sets are too large.
>
> One idea was to just have the data of Set B, put an fq for all the
> values which Set X can have, and then do sorting and faceting on them.
> But since the data size is +1000, it will never be a good approach.
>

1. This is what "lucene join" does underneath. It's enabled by score=none; see
   https://lucene.apache.org/solr/guide/7_2/other-parsers.html#OtherParsers-JoinQueryParser
2. This requires proper sharding: linked data should reside on the same shard; otherwise, no way.
3. Note: when you say fq with all values, that can hopefully be achieved with the {!terms} qp, which is way more powerful than bare {!lucene}'s bq.
4. The set notation above confuses me a little; it might seem many-to-many indeed.

>
> Another idea is to create a parent-child data relationship as 2
> different collections and then perform a join over them.
>

Query-time join can't handle two sharded collections, although there are some plugins and patches claiming so. Index-time join, aka block join or {!parent}, requires docs to be collocated.

>
> Please review and suggest if there could be any other way of
> solving this problem.
>
> --
> Warm Regards,
>
> Lucky Sharma
>

--
Sincerely yours
Mikhail Khludnev
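P.S. To make points 1 and 3 concrete, here is a rough Python sketch that only builds the request parameters for the two approaches; it does not talk to Solr. The field names (`b_ids`, `id`, `category`, `price`) and the document id `x42` are made up for illustration, and it assumes each Set A document lists the ids of its linked Set B documents in a multi-valued `b_ids` field stored on the same shard.

```python
from urllib.parse import urlencode


def terms_filter(field, values):
    """Build a {!terms} filter matching any of `values` in `field`.

    Values are comma-separated (the parser's default separator), so this
    sketch assumes the values themselves contain no commas.
    """
    return "{!terms f=%s}%s" % (field, ",".join(values))


def join_filter(from_field, to_field, inner_query):
    """Build a query-time join filter.

    score=none asks Solr to use the Lucene join implementation, as
    mentioned in point 1 above.
    """
    return "{!join from=%s to=%s score=none}%s" % (from_field, to_field, inner_query)


def select_params(q, filters, sort=None, facet_field=None):
    """URL-encode parameters for a /select request (hypothetical schema)."""
    params = [("q", q)] + [("fq", fq) for fq in filters]
    if sort:
        params.append(("sort", sort))
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return urlencode(params)


# Option A: pass the linked Set B ids explicitly (hypothetical ids).
fq_terms = terms_filter("id", ["y1", "y2", "y3"])

# Option B: let the join resolve Set B docs linked from one Set A doc.
fq_join = join_filter("b_ids", "id", "id:x42")

query_string = select_params("*:*", [fq_join], sort="price asc", facet_field="category")
```

Either filter goes into fq, and sorting and faceting then apply to the matching Set B documents only. The join variant avoids shipping a long id list with every request, but it still depends on the collocation requirement from point 2.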