From: Muhammad Gelbana
Date: Wed, 12 Apr 2017 12:39:01 +0200
Subject: Re: Is it possible to delegate data joins and filtering to the datasource ?
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
References: <0EBD2A1D-2057-4702-BBED-A862FAC0D1AF@mapr.com> <0977CE5C-28DE-4986-8ED4-23DD1BEC4B03@mapr.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=94eb2c1b54ce1d332a054cf5d1e1

--94eb2c1b54ce1d332a054cf5d1e1
Content-Type: text/plain; charset=UTF-8

I have done it. Thanks a lot Weijie and all of you for your time.

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Apr 6, 2017 at 3:15 PM, weijie tong wrote:

> some tips:
> 1. You need to know the RexInputRef index relationship between the
> JoinRel and its inputs.
>
> join (1,2,3,4,5)
>
> left input (1,2,3)    right input (1,2)
>
> 1,2,3 ===> left input (1,2,3)
>
> 4,5 ===> right input (1,2)
>
> 2. You capture the index mapping when you iterate over the JoinRelNode
> in your defined Rule (CartesianProductJoinRule), and store this index
> mapping data in your defined BGroupScan (the naming convention of my
> last example).
> The mapping struct may be: destination index -------------> (source
> ScanRel : source index).
> For the example data in tip 1, the struct will be:
> 1 ==> (left scan1 : 1)
> 2 ==> (left scan1 : 2)
> 3 ==> (left scan1 : 3)
> 4 ==> (right scan2 : 1)
> 5 ==> (right scan2 : 2)
>
> 3. You define another Rule (matching the Project RelNode) which depends on
> the index mapping data from the last step. In this rule you take the final
> output project's indexes and look up their mapped indexes in the mapping
> struct; then you can find the final output column names and the related
> tables.
>
>
> On Tue, Apr 4, 2017 at 1:51 AM, Muhammad Gelbana
> wrote:
>
> > I've succeeded, theoretically, in what I wanted to do because I had to
> > send the selected columns manually to my datasource. Would someone please
> > tell me how I can identify the selected columns in the join ? I searched
> > a lot without success.
> >
> > *---------------------*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Sat, Apr 1, 2017 at 1:43 AM, Muhammad Gelbana
> > wrote:
> >
> > > So I intend to use this constructor for the new *RelNode*:
> > > *org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
> > > RelTraitSet, RelOptTable, GroupScan, RelDataType, List)*
> > >
> > > How can I provide its parameters ?
> > >
> > > 1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()* ?
> > >
> > > 2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()* ?
> > >
> > > 3. *RelOptTable*: I assume I can use this factory method
> > > (*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
> > > RelDataType, Table, Path)*). Any hints on how I can provide these
> > > parameters too ? Should I just go ahead and manually create a new
> > > instance of each parameter ?
> > >
> > > 4. *GroupScan*: I understand I have to create a new implementation
> > > class for this one, so no questions here so far.
> > >
> > > 5. *RelDataType*: This one is confusing, because I understand that for
> > > *DrillJoinRel.transformTo(newRel)* to work, I have to provide a
> > > *newRel* instance that has a *RelDataType* instance with the same
> > > number of fields and compatible types (i.e. this is mandated by
> > > *org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
> > > RelNode, Object)*). Why couldn't I provide a *RelDataType* with
> > > a different set of fields ? How can I resolve this ?
> > >
> > > 6. *List*: I assume I can call this method and pass my
> > > column names to it, one by one (i.e.
> > > *org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*).
> > >
> > > Thanks.
> > >
> > > *---------------------*
> > > *Muhammad Gelbana*
> > > http://www.linkedin.com/in/mgelbana
> > >
> > > On Fri, Mar 31, 2017 at 1:59 PM, weijie tong
> > > wrote:
> > >
> > >> Your code seems right; what remains is to implement the
> > >> 'call.transformTo()' call. As for the remaining details, I may not be
> > >> able to express them precisely; as @Paul Rogers mentioned, the plugin
> > >> details are a little trivial.
> > >>
> > >> 1. drillScanRel.getGroupScan().
> > >> 2. You need to extend AbstractGroupScan and let it hold some
> > >> information about your storage. Call this GroupScan AGroupScan; it
> > >> corresponds to one joined scan RelNode. Then you can define another
> > >> GroupScan called BGroupScan which extends AGroupScan. The BGroupScan
> > >> acts as an aggregate container which holds the two joined AGroupScans.
> > >> 3. The new DrillScanRel has the same RowType as the JoinRel. The
> > >> requirements for, and examples of, transforming between two different
> > >> RelNodes can be found in other code. This DrillScanRel's GroupScan is
> > >> the BGroupScan. This new DrillScanRel is the one to pass to
> > >> `call.transformTo(xxxx)`.
> > >>
> > >> Maybe the picture below will help you understand my idea:
> > >>
> > >>                                                         ---Scan (AGroupScan)
> > >> suppose the initial RelNode tree is: Project ----Join --|
> > >>                            |                            ---Scan (AGroupScan)
> > >>                            |
> > >>                           \|/
> > >> after applying this rule, the final tree is: Project-----Scan (BGroupScan (List(AGroupScan, AGroupScan)))
> > >>
> > >>
> > >> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <m.gelbana@gmail.com>
> > >> wrote:
> > >>
> > >> > *This is my rule class*
> > >> >
> > >> > public class CartesianProductJoinRule extends RelOptRule {
> > >> >
> > >> >     public static final CartesianProductJoinRule INSTANCE =
> > >> >         new CartesianProductJoinRule(DrillJoinRel.class);
> > >> >
> > >> >     public CartesianProductJoinRule(Class clazz) {
> > >> >         super(operand(clazz, operand(RelNode.class, any()), operand(RelNode.class, any())),
> > >> >             "CartesianProductJoin");
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public boolean matches(RelOptRuleCall call) {
> > >> >         DrillJoinRel drillJoin = call.rel(0);
> > >> >         return drillJoin.getJoinType() == JoinRelType.INNER
> > >> >             && drillJoin.getCondition().isAlwaysTrue();
> > >> >     }
> > >> >
> > >> >     @Override
> > >> >     public void onMatch(RelOptRuleCall call) {
> > >> >         DrillJoinRel join = call.rel(0);
> > >> >         RelNode firstRel = call.rel(1);
> > >> >         RelNode secondRel = call.rel(2);
> > >> >         HepRelVertex right = (HepRelVertex) join.getRight();
> > >> >         HepRelVertex left = (HepRelVertex) join.getLeft();
> > >> >
> > >> >         List<RelDataTypeField> firstFields = firstRel.getRowType().getFieldList();
> > >> >         List<RelDataTypeField> secondFields = secondRel.getRowType().getFieldList();
> > >> >
> > >> >         RelNode firstTable = ((HepRelVertex) firstRel.getInput(0)).getCurrentRel();
> > >> >         RelNode secondTable = ((HepRelVertex) secondRel.getInput(0)).getCurrentRel();
> > >> >
> > >> >         //call.transformTo(???);
> > >> >     }
> > >> > }
> > >> >
> > >> > *To register the rule*, I overrode the *getOptimizerRules* method in my
> > >> > storage plugin class
> > >> >
> > >> > public Set getOptimizerRules(OptimizerRulesContext optimizerContext, PlannerPhase phase) {
> > >> >     switch (phase) {
> > >> >     case LOGICAL_PRUNE_AND_JOIN:
> > >> >     case LOGICAL_PRUNE:
> > >> >     case LOGICAL:
> > >> >         return getLogicalOptimizerRules(optimizerContext);
> > >> >     case PHYSICAL:
> > >> >         return getPhysicalOptimizerRules(optimizerContext);
> > >> >     case PARTITION_PRUNING:
> > >> >     case JOIN_PLANNING:
> > >> >         // the new join rule is registered here
> > >> >         return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);
> > >> >     default:
> > >> >         return ImmutableSet.of();
> > >> >     }
> > >> > }
> > >> >
> > >> > The rule is firing as expected but I'm lost when it comes to the
> > >> > conversion. Earlier, you said "the new equivalent ScanRel is to have the
> > >> > joined ScanRel nodes' GroupScans", so
> > >> >
> > >> >    1. How can I obtain the left and right tables' group scans ?
> > >> >    2. What exactly do you mean by joining them ? Is there a utility
> > >> >    method to do so ? Or should I manually create a new single group scan
> > >> >    and add the information I need there ? Looking into other *GroupScan*
> > >> >    implementations, I found that they have references to some runtime
> > >> >    objects such as the storage plugin and the storage plugin
> > >> >    configuration. At this stage, I don't know how to obtain those !
> > >> >    3. Precisely, what kind of object should I use to represent a
> > >> >    *RelNode* that represents the whole join ? I understand that I need
> > >> >    to use an object that implements the *RelNode* interface.
> > >> >    Then I should add the created *GroupScan* to that *RelNode* instance
> > >> >    and call *call.transformTo(newRelNode)*, correct ?
> > >> >
> > >> >
> > >> > *---------------------*
> > >> > *Muhammad Gelbana*
> > >> > http://www.linkedin.com/in/mgelbana
> > >> >
> > >> > On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <tongweijie178@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > I mean the rule you write could be placed in
> > >> > > PlannerPhase.JOIN_PLANNING, which uses the HepPlanner. This phase
> > >> > > handles the logical RelNodes.
> > >> > > Hope this helps.
> > >> > > On Thu, Mar 30, 2017 at 12:07 AM, Muhammad Gelbana wrote:
> > >> > >
> > >> > > > Thanks a lot Weijie, I believe I'm very close now. I hope you don't
> > >> > > > mind a few more questions please:
> > >> > > >
> > >> > > >    1. The new rule you are mentioning is a physical rule ? So I
> > >> > > >    should implement the Prel interface ?
> > >> > > >    2. By "traversing the join to find the ScanRel"
> > >> > > >       - This sounds like I have to "search" for something. Shouldn't
> > >> > > >       I just work on transforming the left (i.e. DrillJoinRel's
> > >> > > >       getLeft() method) and right (i.e. DrillJoinRel's getRight()
> > >> > > >       method) join objects ?
> > >> > > >       - The "left" and "right" elements of the DrillJoinRel object
> > >> > > >       are of type RelSubset, not *ScanRel*, and I can't find a type
> > >> > > >       called *ScanRel*. I suppose you meant *ScanPrel*, especially
> > >> > > >       because it implements the *Prel* interface that provides the
> > >> > > >       *getPhysicalOperator* method.
> > >> > > >    3. What if multiple physical or logical rules match for a single
> > >> > > >    node, what decides which rule will be applied and which will be
> > >> > > >    rejected ? Is it the
> > >> > > >    *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What if
> > >> > > >    more than one rule produces the same cost ?
> > >> > > >
> > >> > > > I'll go ahead and see what I can do for now, in the hope that you
> > >> > > > may offer more guidance. THANKS A LOT.
> > >> > > >
> > >> > > > *---------------------*
> > >> > > > *Muhammad Gelbana*
> > >> > > > http://www.linkedin.com/in/mgelbana
> > >> > > >
> > >> > > > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <tongweijie178@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > To avoid misunderstanding: the new equivalent ScanRel is to have
> > >> > > > > the joined ScanRel nodes' GroupScans, as the GroupScans
> > >> > > > > indirectly hold the underlying storage information.
> > >> > > > >
> > >> > > > > On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <tongweijie178@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > My suggestion is that you define a rule which matches the
> > >> > > > > > DrillJoinRel RelNode; then, in the onMatch method, you traverse
> > >> > > > > > the join's children to find the ScanRel nodes. You define a new
> > >> > > > > > ScanRel which includes the ScanRel nodes you found in the last
> > >> > > > > > step. Then transform the JoinRel to this equivalent new ScanRel.
> > >> > > > > > Finally, the plan tree will not have the JoinRel but the
> > >> > > > > > ScanRel. You can put your join plan rule in
> > >> > > > > > PlannerPhase.JOIN_PLANNING.

--94eb2c1b54ce1d332a054cf5d1e1--
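
The index mapping from the tips at the top of this thread (join output index to source scan and source index) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `JoinIndexMap`, `Source`, `forJoin` and `resolve` are hypothetical names, not Drill or Calcite classes; the scans are represented by string labels rather than real GroupScan objects; and indexes are zero-based, as Calcite's RexInputRef indexes are, whereas the example in the thread counts from 1. It also assumes the join's output row is the left input's fields followed by the right input's fields, as the tip describes for the inner join discussed here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps each output field index of a join to the input scan it came from. */
final class JoinIndexMap {

    /** One mapping entry: a label for the source scan and the field index inside that scan. */
    static final class Source {
        final String scan;
        final int index;

        Source(String scan, int index) {
            this.scan = scan;
            this.index = index;
        }

        @Override
        public String toString() {
            return scan + " : " + index;
        }
    }

    // Position in this list is the join output index (zero-based).
    private final List<Source> sources = new ArrayList<>();

    /**
     * Builds the mapping by walking the left input's fields first and the
     * right input's fields second (assumed field order of the join output).
     */
    static JoinIndexMap forJoin(String leftScan, int leftFieldCount,
                                String rightScan, int rightFieldCount) {
        JoinIndexMap map = new JoinIndexMap();
        for (int i = 0; i < leftFieldCount; i++) {
            map.sources.add(new Source(leftScan, i));
        }
        for (int i = 0; i < rightFieldCount; i++) {
            map.sources.add(new Source(rightScan, i));
        }
        return map;
    }

    /** Resolves a zero-based join output index to its source scan and source index. */
    Source resolve(int joinIndex) {
        return sources.get(joinIndex);
    }

    int size() {
        return sources.size();
    }

    public static void main(String[] args) {
        // Same shape as the example in the thread: 3 left fields, 2 right fields.
        JoinIndexMap map = JoinIndexMap.forJoin("left scan1", 3, "right scan2", 2);
        for (int i = 0; i < map.size(); i++) {
            System.out.println(i + " ==> (" + map.resolve(i) + ")");
        }
    }
}
```

In a real rule the two field counts would come from the inputs' row types (for example `getRowType().getFieldCount()`), and a rule matching the Project could call `resolve` for each projected index to find the source table and column.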