Date: Mon, 4 May 2015 17:24:12 +1000
Subject: Re: Accessing QueryContext inside a StoragePluginOptimizerRule
From: Adam Gilmore
To: dev@drill.apache.org

Just wanted to let you know I've uploaded the patch:

https://issues.apache.org/jira/browse/DRILL-1950

Be great to get some feedback or start a review board so I can see what
needs to be done to get it merged in!

On Fri, May 1, 2015 at 12:13 PM, Jacques Nadeau wrote:

> Yes. Once per query is fine. Theoretically we could fire the rule many
> times inside a single query and I was saying we preferably cache in the
> context of a single query.
>
> On Thu, Apr 30, 2015 at 5:57 PM, Adam Gilmore wrote:
>
> > We'll need to look it up at least once per query, though, right? Because
> > session variables can change query to query. Would you suggest I cached
> > the actual setting(s) as soon as the rule is instantiated? The rule is
> > only called once or twice per query, I think.
> >
> > On Fri, May 1, 2015 at 10:54 AM, Jacques Nadeau wrote:
> >
> > > Creating it at rule creation time is good. The key is we want to cache
> > > it so we don't have to look it up every single time the rule fires. I
> > > think your point about specific versus general and PlannerSettings
> > > makes sense.
> > >
> > > On Thu, Apr 30, 2015 at 5:17 PM, Adam Gilmore wrote:
> > >
> > > > Moreover, I'm a tad wary about using PlannerSettings as they seem very
> > > > generic (i.e. not specific to a particular storage/format plugin, for
> > > > example). I've used "store.parquet.enable_pushdown_filter" as the
> > > > option, as it's very specific to the Parquet format plugin. I imagine
> > > > I can still use getOptions() to get that option from PlannerSettings,
> > > > but it feels a bit more like we should be using QueryContext like
> > > > PruneScanRule uses.
> > > >
> > > > What do you think?
> > > >
> > > > On Fri, May 1, 2015 at 10:13 AM, Adam Gilmore wrote:
> > > >
> > > > > I actually patched the StorageEnginePlugin and FormatPlugin to pass
> > > > > QueryContext right through the chain of getting optimizer rules (as
> > > > > is the case for getting basic rules, etc.). This seemed to me like
> > > > > the most logical approach and aligned with something like your
> > > > > PruneScanRule.
> > > > >
> > > > > Do you think I should revert back and use the example you gave?
> > > > >
> > > > > P.S. The pushdown filter has improved performance significantly for
> > > > > certain types of queries. It mostly comes back to how well the data
> > > > > is ordered and how often it can exclude row groups. Ideally, we
> > > > > should be excluding individual pages as well, but from what I can
> > > > > see even the reader from the Parquet library does not yet do this.
> > > > >
> > > > > On Thu, Apr 30, 2015 at 4:14 PM, Jacques Nadeau <jacques@apache.org>
> > > > > wrote:
> > > > >
> > > > >> PlannerSettings is the primary way we expose settings that need to
> > > > >> be interrogated during planning (e.g. in an optimizer rule).
> > > > >> You can get ahold of this by doing:
> > > > >>
> > > > >> PlannerSettings settings =
> > > > >>     PrelUtil.getPlannerSettings(call.getPlanner());
> > > > >>
> > > > >> PlannerSettings then has access to session settings.
> > > > >>
> > > > >> You can see an example of this at [1]
> > > > >>
> > > > >> I'm excited to see the impact of this. Look forward to seeing the
> > > > >> patch!
> > > > >>
> > > > >> [1]
> > > > >> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/NestedLoopJoinPrule.java#L82
> > > > >>
> > > > >> On Wed, Apr 29, 2015 at 11:06 PM, Adam Gilmore <dragoncurve@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi guys,
> > > > >> >
> > > > >> > I'm trying to work out how I could access the QueryContext inside
> > > > >> > a StoragePluginOptimizerRule.
> > > > >> >
> > > > >> > I've basically implemented the Parquet pushdown filtering, but I
> > > > >> > really need to access the session settings (for whether or not
> > > > >> > we're using the new Parquet reader so we can completely pushdown a
> > > > >> > filter and also for providing a setting to enable/disable pushdown
> > > > >> > filters).
> > > > >> >
> > > > >> > I've looked in a fair bit of detail, but there doesn't seem to be a
> > > > >> > way to access this.
> > > > >> >
> > > > >> > If this is not the way I should be implementing it, can anyone make
> > > > >> > some suggestions? I really want to do full pushdown when using the
> > > > >> > new Parquet reader, but I have no easy way to detect it's going to
> > > > >> > be used.
> > > > >> >
> > > > >> > P.S. In cases where we "fall back" to the new Parquet reader, I
> > > > >> > don't do a full pushdown (as detecting that is done further down
> > > > >> > than the planner).
> > > > >> > This could be fixed in the future, but for now I'm happy for just
> > > > >> > when that setting is true.
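
For readers following the thread, the caching idea discussed above (read the
option once when the rule is created for a query, rather than every time the
rule fires) might look roughly like the sketch below. It is only an
illustration under stated assumptions: OptionSource, CountingOptions and
PushdownRule are hypothetical stand-ins written for this example, not Drill
classes, and the only name taken from the thread is the option string
"store.parquet.enable_pushdown_filter".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class PushdownOptionCacheSketch {

    /** Hypothetical stand-in for a per-query option source; not a Drill class. */
    interface OptionSource {
        boolean getBoolean(String name);
    }

    /** Hypothetical option store that counts lookups so caching is observable. */
    static final class CountingOptions implements OptionSource {
        private final Map<String, Boolean> values = new HashMap<>();
        final AtomicInteger lookups = new AtomicInteger();

        CountingOptions set(String name, boolean value) {
            values.put(name, value);
            return this;
        }

        @Override
        public boolean getBoolean(String name) {
            lookups.incrementAndGet();
            return values.getOrDefault(name, false);
        }
    }

    /** Hypothetical rule: created once per query, reads the option in its constructor. */
    static final class PushdownRule {
        static final String OPTION = "store.parquet.enable_pushdown_filter";
        private final boolean pushdownEnabled;

        PushdownRule(OptionSource options) {
            // Single lookup at rule creation time; the result is cached for the query.
            this.pushdownEnabled = options.getBoolean(OPTION);
        }

        /** May be called many times per query; never touches the option source. */
        boolean matches() {
            return pushdownEnabled;
        }
    }

    public static void main(String[] args) {
        CountingOptions options = new CountingOptions().set(PushdownRule.OPTION, true);
        PushdownRule rule = new PushdownRule(options);
        for (int i = 0; i < 5; i++) {
            rule.matches();
        }
        System.out.println("lookups=" + options.lookups.get());
    }
}
```

Because a new rule instance is built for each query in this sketch, a session
setting changed between queries would still be picked up, which is the
"once per query" behaviour agreed on in the thread.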