hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: It seems that result of Hive on Spark is mistake And result of Hive and Hive on Spark are not the same
Date Tue, 22 Dec 2015 18:22:26 GMT
It seems that the plan isn't quite right, possibly due to union all
optimization in Spark. Could you create a JIRA for this?

CC Chengxiang as he might have some insight.

Thanks,
Xuefu

On Tue, Dec 22, 2015 at 3:39 AM, Jone Zhang <joyoungzhang@gmail.com> wrote:

> Hive 1.2.1 on Spark1.4.1
>
> 2015-12-22 19:31 GMT+08:00 Jone Zhang <joyoungzhang@gmail.com>:
>
>> *select  * from staff;*
>> 1 jone 22 1
>> 2 lucy 21 1
>> 3 hmm 22 2
>> 4 james 24 3
>> 5 xiaoliu 23 3
>>
>> *select id,date_ from trade union all select id,"test" from trade ;*
>> 1 201510210908
>> 2 201509080234
>> 2 201509080235
>> 1 test
>> 2 test
>> 2 test
>>
>> *set hive.execution.engine=spark;*
>> *set spark.master=local;*
>> *select /*+mapjoin(t)*/ * from staff s join *
>> *(select id,date_ from trade union all select id,"test" from trade ) t on
>> s.id <http://s.id>=t.id <http://t.id>;*
>> 1 jone 22 1 1 201510210908
>> 2 lucy 21 1 2 201509080234
>> 2 lucy 21 1 2 201509080235
>>
>> *set hive.execution.engine=mr;*
>> *select /*+mapjoin(t)*/ * from staff s join *
>> *(select id,date_ from trade union all select id,"test" from trade ) t on
>> s.id <http://s.id>=t.id <http://t.id>;*
>> FAILED: SemanticException [Error 10227]: Not all clauses are supported
>> with mapjoin hint. Please remove mapjoin hint.
>>
>> *I have two questions*
>> *1.Why result of hive on spark not include the following record?*
>> 1 jone 22 1 1 test
>> 2 lucy 21 1 2 test
>> 2 lucy 21 1 2 test
>>
>> *2.Why there are two different ways of dealing same query?*
>>
>>
>> *explain 1:*
>> *set hive.execution.engine=spark;*
>> *set spark.master=local;*
>> *explain *
>> *select id,date_ from trade union all select id,"test" from trade;*
>> OK
>> STAGE DEPENDENCIES:
>>   Stage-1 is a root stage
>>   Stage-0 depends on stages: Stage-1
>>
>> STAGE PLANS:
>>   Stage: Stage-1
>>     Spark
>>       DagName:
>> jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
>>       Vertices:
>>         Map 1
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: trade
>>                   Statistics: Num rows: 6 Data size: 48 Basic stats:
>> COMPLETE Column stats: NONE
>>                   Select Operator
>>                     expressions: id (type: int), date_ (type: string)
>>                     outputColumnNames: _col0, _col1
>>                     Statistics: Num rows: 6 Data size: 48 Basic stats:
>> COMPLETE Column stats: NONE
>>                     File Output Operator
>>                       compressed: false
>>                       Statistics: Num rows: 12 Data size: 96 Basic stats:
>> COMPLETE Column stats: NONE
>>                       table:
>>                           input format:
>> org.apache.hadoop.mapred.TextInputFormat
>>                           output format:
>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>                           serde:
>> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>         Map 2
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: trade
>>                   Statistics: Num rows: 6 Data size: 48 Basic stats:
>> COMPLETE Column stats: NONE
>>                   Select Operator
>>                     expressions: id (type: int), 'test' (type: string)
>>                     outputColumnNames: _col0, _col1
>>                     Statistics: Num rows: 6 Data size: 48 Basic stats:
>> COMPLETE Column stats: NONE
>>                     File Output Operator
>>                       compressed: false
>>                       Statistics: Num rows: 12 Data size: 96 Basic stats:
>> COMPLETE Column stats: NONE
>>                       table:
>>                           input format:
>> org.apache.hadoop.mapred.TextInputFormat
>>                           output format:
>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>                           serde:
>> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>
>>   Stage: Stage-0
>>     Fetch Operator
>>       limit: -1
>>       Processor Tree:
>>         ListSink
>>
>>
>> *explain 2:*
>> *set hive.execution.engine=spark;*
>> *set spark.master=local;*
>> *explain *
>> *select /*+mapjoin(t)*/ * from staff s join *
>> *(select id,date_ from trade union all select id,"20999999999" from trade
>> ) t on s.id <http://s.id>=t.id <http://t.id>;*
>> OK
>> STAGE DEPENDENCIES:
>>   Stage-2 is a root stage
>>   Stage-1 depends on stages: Stage-2
>>   Stage-0 depends on stages: Stage-1
>>
>> STAGE PLANS:
>>   Stage: Stage-2
>>     Spark
>>       DagName:
>> jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
>>       Vertices:
>>         Map 1
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: trade
>>                   Statistics: Num rows: 6 Data size: 48 Basic stats:
>> COMPLETE Column stats: NONE
>>                   Filter Operator
>>                     predicate: id is not null (type: boolean)
>>                     Statistics: Num rows: 3 Data size: 24 Basic stats:
>> COMPLETE Column stats: NONE
>>                     Select Operator
>>                       expressions: id (type: int), date_ (type: string)
>>                       outputColumnNames: _col0, _col1
>>                       Statistics: Num rows: 3 Data size: 24 Basic stats:
>> COMPLETE Column stats: NONE
>>                       Spark HashTable Sink Operator
>>                         keys:
>>                           0 id (type: int)
>>                           1 _col0 (type: int)
>>             Local Work:
>>               Map Reduce Local Work
>>
>>   Stage: Stage-1
>>     Spark
>>       DagName:
>> jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2
>>       Vertices:
>>         Map 2
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: s
>>                   Statistics: Num rows: 1 Data size: 66 Basic stats:
>> COMPLETE Column stats: NONE
>>                   Filter Operator
>>                     predicate: id is not null (type: boolean)
>>                     Statistics: Num rows: 1 Data size: 66 Basic stats:
>> COMPLETE Column stats: NONE
>>                     Map Join Operator
>>                       condition map:
>>                            Inner Join 0 to 1
>>                       keys:
>>                         0 id (type: int)
>>                         1 _col0 (type: int)
>>                       outputColumnNames: _col0, _col1, _col2, _col3,
>> _col7, _col8
>>                       input vertices:
>>                         1 Map 1
>>                       Statistics: Num rows: 6 Data size: 52 Basic stats:
>> COMPLETE Column stats: NONE
>>                       Select Operator
>>                         expressions: _col0 (type: int), _col1 (type:
>> string), _col2 (type: int), _col3 (type: int), _col7 (type: int), _col8
>> (type: string)
>>                         outputColumnNames: _col0, _col1, _col2, _col3,
>> _col4, _col5
>>                         Statistics: Num rows: 6 Data size: 52 Basic
>> stats: COMPLETE Column stats: NONE
>>                         File Output Operator
>>                           compressed: false
>>                           Statistics: Num rows: 6 Data size: 52 Basic
>> stats: COMPLETE Column stats: NONE
>>                           table:
>>                               input format:
>> org.apache.hadoop.mapred.TextInputFormat
>>                               output format:
>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>                               serde:
>> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>             Local Work:
>>               Map Reduce Local Work
>>
>>   Stage: Stage-0
>>     Fetch Operator
>>       limit: -1
>>       Processor Tree:
>>         ListSink
>>
>>
>>
>> *I can't find any information about union "test" in explain 2.*
>>
>> *Some properties on hive-site.xml is *
>> <property>
>> <name>hive.ignore.mapjoin.hint</name>
>> <value>false</value>
>> </property>
>> <property>
>> <name>hive.auto.convert.join</name>
>> <value>true</value>
>> </property>
>> <property>
>> <name>hive.auto.convert.join.noconditionaltask</name>
>> <value>true</value>
>>
>>
>>
>>
>> *Thanks.*
>> *Best wishes.*
>>
>
>

Mime
View raw message