Date: Fri, 28 Mar 2014 22:42:30 +0000 (UTC)
From: "Hari Sankar Sivarama Subramaniyan (JIRA)"
To: hive-dev@hadoop.apache.org
Subject: [jira] [Updated] (HIVE-6642) Query fails to vectorize when a non string partition column is part of the query expression

     [ https://issues.apache.org/jira/browse/HIVE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6642:
----------------------------------------------------
    Attachment:     (was: input_part7.q.out)

> Query fails to vectorize when a non string partition column is part of the query expression
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6642
>                 URL: https://issues.apache.org/jira/browse/HIVE-6642
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6642-2.patch, HIVE-6642-3.patch, HIVE-6642-4.patch, HIVE-6642.1.patch, load_dyn_part8.q.out, louter_join_ppr.q.out, merge3.q.out, metadataonly1.q.out, outer_join_ppr.q.out, pcr.q.out, ppd_vc.q.out, ppr_allchildsarenull.q.out, push_or.q.out, rand_partitionpruner2.q.out, rand_partitionpruner3.q.out, router_join_ppr.q.out, sample1.q.out, sample10.q.out, sample8.q.out, smb_mapjoin_11.q.out, sort_merge_join_desc_5.q.out, stats12.q.out, stats13.q.out, transform_ppr1.q.out, transform_ppr2.q.out, union_ppr.q.out
>
>
> drop table if exists alltypesorc_part;
> CREATE TABLE alltypesorc_part (
>     ctinyint tinyint,
>     csmallint smallint,
>     cint int,
>     cbigint bigint,
>     cfloat float,
>     cdouble double,
>     cstring1 string,
>     cstring2 string,
>     ctimestamp1 timestamp,
>     ctimestamp2 timestamp,
>     cboolean1 boolean,
>     cboolean2 boolean) partitioned by (ds int) STORED AS ORC;
> insert overwrite table alltypesorc_part partition (ds=2011) select * from alltypesorc limit 100;
> insert overwrite table alltypesorc_part partition (ds=2012) select * from alltypesorc limit 200;
> explain select *
> from (select ds from alltypesorc_part) t1,
>      alltypesorc t2
> where t1.ds = t2.cint
> order by t2.ctimestamp1
> limit 100;
>
> The above query fails to vectorize because the partition column ds returned by (select ds from alltypesorc_part) t1 is treated as a string column, while the join equality compares it with t2.cint, which is an int column.
> The correct output when vectorization is turned on should be:
>
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-2 depends on stages: Stage-5
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-5
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t1:alltypesorc_part
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t1:alltypesorc_part
>           TableScan
>             alias: alltypesorc_part
>             Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE Column stats: COMPLETE
>             Select Operator
>               expressions: ds (type: int)
>               outputColumnNames: _col0
>               Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE Column stats: COMPLETE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {_col0}
>                   1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>                 keys:
>                   0 _col0 (type: int)
>                   1 cint (type: int)
>
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             Statistics: Num rows: 3536 Data size: 1131711 Basic stats: COMPLETE Column stats: NONE
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {_col0}
>                 1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>               keys:
>                 0 _col0 (type: int)
>                 1 cint (type: int)
>               outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>               Statistics: Num rows: 3889 Data size: 1244882 Basic stats: COMPLETE Column stats: NONE
>               Filter Operator
>                 predicate: (_col0 = _col3) (type: boolean)
>                 Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: int), _col1 (type: tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 (type: boolean)
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>                   Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: _col9 (type: timestamp)
>                     sort order: +
>                     Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: _col0 (type: int), _col1 (type: tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 (type: boolean)
>       Local Work:
>         Map Reduce Local Work
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Extract
>           Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>           Limit
>             Number of rows: 100
>             Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: 100
>
> whereas with the current code, vectorization does not take place; the Vectorizer logs the following exception:
>
> 14/03/12 14:43:19 DEBUG vector.VectorizationContext: No vector udf found for GenericUDFOPEqual, descriptor: Argument Count = 2, mode = FILTER, Argument Types = {STRING,LONG}, Input Expression Types = {COLUMN,COLUMN}
> 14/03/12 14:43:19 DEBUG physical.Vectorizer: Failed to vectorize
> org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFOPEqual, is not supported
>         at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:854)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:300)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:682)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateFilterOperator(Vectorizer.java:606)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateOperator(Vectorizer.java:537)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$ValidationNodeProcessor.process(Vectorizer.java:367)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:314)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:283)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:270)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
>         at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:519)
>         at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:100)
>         at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:216)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9286)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>         at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:398)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:294)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:948)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:996)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:884)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:874)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
>         at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
>         at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:125)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
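The DEBUG line in the log shows the vectorizer looking for a vectorized GenericUDFOPEqual filter by a descriptor (argument count, mode, argument types, input expression types) and finding none for {STRING,LONG}. The following is a minimal Java sketch of that kind of descriptor-keyed lookup. It assumes, as the issue title and the log suggest, that the partition column ds (declared int) ends up typed as STRING while t2.cint is typed as LONG. The class VectorUdfLookupSketch, the ArgType enum, the type mapping, and the registry are hypothetical simplifications for illustration, not Hive's actual VectorizationContext API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of picking a vectorized filter expression by
// (UDF name, argument types). Illustrative only; NOT Hive's real API.
class VectorUdfLookupSketch {

    // Simplified argument categories, echoing "Argument Types" in the DEBUG log.
    enum ArgType { LONG, DOUBLE, STRING }

    // Assumed mapping from a declared Hive column type to a vector argument type.
    static ArgType fromHiveType(String hiveType) {
        switch (hiveType) {
            case "tinyint": case "smallint": case "int": case "bigint": case "boolean":
                return ArgType.LONG;
            case "float": case "double":
                return ArgType.DOUBLE;
            case "string":
                return ArgType.STRING;
            default:
                throw new IllegalArgumentException("type not modelled: " + hiveType);
        }
    }

    // Registry of vectorized FILTER expressions keyed by UDF name plus argument types.
    // In this sketch only same-category comparisons are registered.
    private static final Map<String, String> FILTER_REGISTRY = new HashMap<>();
    static {
        for (ArgType t : ArgType.values()) {
            FILTER_REGISTRY.put(key("GenericUDFOPEqual", t, t), "equal filter on " + t + " columns");
        }
    }

    static String key(String udf, ArgType left, ArgType right) {
        return udf + "(" + left + "," + right + ")";
    }

    // Returns the matching vector expression, or empty when none is registered
    // (the situation the log reports as "No vector udf found").
    static Optional<String> lookupFilter(String udf, ArgType left, ArgType right) {
        return Optional.ofNullable(FILTER_REGISTRY.get(key(udf, left, right)));
    }

    // Assumed buggy behaviour: partition columns are always typed STRING.
    // Assumed fixed behaviour: a partition column keeps its declared type.
    static ArgType partitionColumnType(String declaredType, boolean alwaysString) {
        return alwaysString ? ArgType.STRING : fromHiveType(declaredType);
    }

    public static void main(String[] args) {
        ArgType cint = fromHiveType("int");                    // t2.cint
        ArgType dsBuggy = partitionColumnType("int", true);    // ds typed as STRING
        ArgType dsFixed = partitionColumnType("int", false);   // ds typed as LONG

        // {STRING,LONG}: no vector expression found; prints false
        System.out.println(lookupFilter("GenericUDFOPEqual", dsBuggy, cint).isPresent());
        // {LONG,LONG}: a vector expression is found; prints true
        System.out.println(lookupFilter("GenericUDFOPEqual", dsFixed, cint).isPresent());
    }
}
```

Under these assumptions, typing ds by its declared type turns the lookup key into {LONG,LONG}, which matches a registered comparison. This only models the symptom; the actual change is in the attached HIVE-6642 patches.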