From dev-return-145930-archive-asf-public=cust-asf.ponee.io@hive.apache.org Mon Jan 29 19:26:30 2018 Return-Path: X-Original-To: archive-asf-public@eu.ponee.io Delivered-To: archive-asf-public@eu.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by mx-eu-01.ponee.io (Postfix) with ESMTP id 0FC19180654 for ; Mon, 29 Jan 2018 19:26:30 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id EAB1F160C31; Mon, 29 Jan 2018 18:26:29 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id E41B4160C2F for ; Mon, 29 Jan 2018 19:26:28 +0100 (CET) Received: (qmail 40560 invoked by uid 500); 29 Jan 2018 18:26:27 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 40549 invoked by uid 99); 29 Jan 2018 18:26:27 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 29 Jan 2018 18:26:27 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 99C82180426; Mon, 29 Jan 2018 18:26:26 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.061 X-Spam-Level: X-Spam-Status: No, score=-0.061 tagged_above=-999 required=6.31 tests=[HEADER_FROM_DIFFERENT_DOMAINS=0.249, HTML_MESSAGE=2, RCVD_IN_DNSWL_MED=-2.3, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id eALJhxlqfp_E; Mon, 29 Jan 2018 18:26:20 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 55E615F341; Mon, 29 Jan 2018 18:26:20 +0000 (UTC) Received: from reviews.apache.org (unknown [10.41.0.12]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id E9831E0115; Mon, 29 Jan 2018 18:26:19 +0000 (UTC) Received: from reviews-vm2.apache.org (localhost [IPv6:::1]) by reviews.apache.org (ASF Mail Server at reviews-vm2.apache.org) with ESMTP id C039DC40098; Mon, 29 Jan 2018 18:26:19 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============4092297446377585216==" MIME-Version: 1.0 Subject: Re: Review Request 65304: HIVE-18513 Query results caching From: =?utf-8?q?Jes=C3=BAs_Camacho_Rodr=C3=ADguez?= To: Ashutosh Chauhan , =?utf-8?q?Jes=C3=BAs_Camacho_Rodr=C3=ADguez?= , Gopal V Cc: hive , Jason Dere Date: Mon, 29 Jan 2018 18:26:19 -0000 Message-ID: <20180129182619.29696.8722@reviews-vm2.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: =?utf-8?q?Jes=C3=BAs_Camacho_Rodr=C3=ADguez_=3Cnoreply=40reviews=2E?=@reviews.apache.org, =?utf-8?q?apache=2Eorg=3E?=@reviews.apache.org X-ReviewGroup: hive X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/65304/ X-Sender: =?utf-8?q?Jes=C3=BAs_Camacho_Rodr=C3=ADguez_=3Cnoreply=40reviews?= =?utf-8?b?LmFwYWNoZS5vcmc+?= References: <20180125073448.814.70999@reviews-vm2.apache.org> In-Reply-To: <20180125073448.814.70999@reviews-vm2.apache.org> X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/results_cache_temptable.q X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/results_cache_capacity.q.out X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/results_cache_temptable.q.out X-ReviewBoard-Diff-For: ql/src/java/org/apache/hadoop/hive/ql/cache/results/CacheUsage.java X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/results_cache_2.q X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/llap/results_cache_1.q.out X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/results_cache_lifetime.q.out X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/results_cache_1.q X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/results_cache_2.q.out X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/results_cache_capacity.q X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/results_cache_with_masking.q X-ReviewBoard-Diff-For: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java X-ReviewBoard-Diff-For: ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/results_cache_with_masking.q.out X-ReviewBoard-Diff-For: ql/src/test/queries/clientpositive/results_cache_lifetime.q X-ReviewBoard-Diff-For: ql/src/test/results/clientpositive/results_cache_1.q.out Reply-To: =?utf-8?q?Jes=C3=BAs_Camacho_Rodr=C3=ADguez_=3Cjcamachorodriguez?=@reviews.apache.org, =?utf-8?q?=40hortonworks=2Ecom=3E?=@reviews.apache.org X-ReviewRequest-Repository: hive-git --===============4092297446377585216== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/65304/#review196440 ----------------------------------------------------------- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Lines 3699 (patched) Is this the permission for each results directory? Does this mean that results cannot be shared by different users? Why does this need to be configurable? (I would assume this is not something that you let the user decide). common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Lines 3710 (patched) What is _entry_ referring to? Does it mean the SQL query string size? We should extend the description to make it clear. ql/src/java/org/apache/hadoop/hive/ql/Driver.java Lines 1829 (patched) Currently what happens when we get this exception? It seems to me that mechanism in HIVE-17626 works when execution itself fails, but in this case, whether the cache entry is still valid or not can be inferred statically at planning time, hence I am not sure whether it should be handled the same way? It seems we will have certain overhead that might not be necessary. Can't we check the validity of the entry when we are replacing the plan by the scan on the cached results, e.g., in SemanticAnalyzer? ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java Lines 174 (patched) What happens if execution fails? Will the results still be cleaned properly? ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java Lines 262 (patched) Code might have been simpler using Guava cache. This is just a note, maybe something that can be considered for a follow-up. ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java Lines 72 (patched) Leaf can be other node too, e.g., _DruidQuery_. Now reusing results is time-based, but for those type of tables, we do not have guarantees (as with other external tables). Thus, we should probably return true iff scan is an instance of HiveTableScan, false otherwise. ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java Lines 77 (patched) _getProjects()_ ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java Lines 87 (patched) It should be _getCondition()_. ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java Lines 96 (patched) I think all the information should be in _getCondition()_. ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java Lines 124 (patched) - There are some operators missing, e.g., semijoin. - I would create specific visit methods for all operators, then a fall-back visit method on RelNode that returns false if it does not recognize the operator that we are visiting (as a way to prevent any incorrect assesment, e.g., if new operators are added in the future). - Jesús Camacho Rodríguez On Jan. 25, 2018, 7:34 a.m., Jason Dere wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/65304/ > ----------------------------------------------------------- > > (Updated Jan. 25, 2018, 7:34 a.m.) > > > Review request for hive, Ashutosh Chauhan, Gopal V, and Jesús Camacho Rodríguez. > > > Bugs: HIVE-18513 > https://issues.apache.org/jira/browse/HIVE-18513 > > > Repository: hive-git > > > Description > ------- > > - For queries that result in MR/Tez/Spark jobs on the cluster, save the temporary query results to a cache directory where they can be re-used. > - Add QueryResultsCache to manage cached results. Currently cache invalidation is time-based, update-based cache invalidation needs to be added later. > - Driver/SemanticAnalyzer/Calcite planner changes to lookup queries in the cache and use in place of the query plan. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 0c2cf05 > common/src/java/org/apache/hadoop/hive/ql/log/PerfLogger.java 2767bca > data/conf/hive-site.xml 01f83d1 > data/conf/llap/hive-site.xml cdda875 > data/conf/perf-reg/spark/hive-site.xml 497a61f > data/conf/perf-reg/tez/hive-site.xml 012369f > data/conf/rlist/hive-site.xml 9de00e5 > data/conf/spark/local/hive-site.xml fd0e6a0 > data/conf/spark/standalone/hive-site.xml 1e5bd65 > data/conf/spark/yarn-client/hive-site.xml a9a788b > data/conf/tez/hive-site.xml 4519678 > itests/hive-blobstore/src/test/resources/hive-site.xml 038db0d > itests/src/test/resources/testconfiguration.properties 1017249 > itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 3f377f9 > ql/src/java/org/apache/hadoop/hive/ql/Driver.java 74595b0 > ql/src/java/org/apache/hadoop/hive/ql/cache/results/CacheUsage.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2e1fd37 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java f0dd167 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 372cfad > ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 85a1f34 > ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java ae2ec3d > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 28e1041 > ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 7243dc7 > ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 8af19d8 > ql/src/test/queries/clientpositive/results_cache_1.q PRE-CREATION > ql/src/test/queries/clientpositive/results_cache_2.q PRE-CREATION > ql/src/test/queries/clientpositive/results_cache_capacity.q PRE-CREATION > ql/src/test/queries/clientpositive/results_cache_lifetime.q PRE-CREATION > ql/src/test/queries/clientpositive/results_cache_temptable.q PRE-CREATION > ql/src/test/queries/clientpositive/results_cache_with_masking.q PRE-CREATION > ql/src/test/results/clientpositive/llap/results_cache_1.q.out PRE-CREATION > ql/src/test/results/clientpositive/results_cache_1.q.out PRE-CREATION > ql/src/test/results/clientpositive/results_cache_2.q.out PRE-CREATION > ql/src/test/results/clientpositive/results_cache_capacity.q.out PRE-CREATION > ql/src/test/results/clientpositive/results_cache_lifetime.q.out PRE-CREATION > ql/src/test/results/clientpositive/results_cache_temptable.q.out PRE-CREATION > ql/src/test/results/clientpositive/results_cache_with_masking.q.out PRE-CREATION > service/src/java/org/apache/hive/service/server/HiveServer2.java 2a528cd > > > Diff: https://reviews.apache.org/r/65304/diff/3/ > > > Testing > ------- > > qfile tests added. > > > Thanks, > > Jason Dere > > --===============4092297446377585216==--