hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: Two results are inconsistent when i use Hive on Spark
Date Wed, 27 Jan 2016 14:39:45 GMT
Hi Jone,

Did you meant you get different results from time to time? If so, could you
run "explain query" multiple times to see if there is any difference. Also,
could you try without map-join hint?

Without dataset, it's hard to reproduce the problem. Thus, it's great if
you can provide DML, and the dataset.

Thanks,
Xuefu

On Tue, Jan 26, 2016 at 11:25 PM, Jone Zhang <joyoungzhang@gmail.com> wrote:

> *Some properties on hive-site.xml is*
>
> <property>
> <name>hive.ignore.mapjoin.hint</name>
> <value>false</value>
> </property>
> <property>
> <name>hive.auto.convert.join</name>
> <value>true</value>
> </property>
> <property>
> <name>hive.auto.convert.join.noconditionaltask</name>
> <value>true</value>
>
>
> *If more information is required,please let us know.*
>
> *Thanks.*
>
> 2016-01-27 15:20 GMT+08:00 Jone Zhang <joyoungzhang@gmail.com>:
>
>> *I have run a query many times, there will be two results without
>> regular.*
>> *One is 36834699 and other is 18464706.*
>>
>> *The query is *
>> set spark.yarn.queue=soft.high;
>> set hive.execution.engine=spark;
>> select /*+mapjoin(t3,t4,t5)*/
>>   count(1)
>> from
>>   (
>>   select
>>     coalesce(t11.qua,t12.qua,t13.qua) qua,
>>     coalesce(t11.scene,t12.lanmu_id,t13.lanmu_id) scene,
>>     coalesce(t11.app_id,t12.appid,t13.app_id) app_id,
>>     expos_pv,
>>     expos_uv,
>>     dload_pv,
>>     dload_uv,
>>     dload_cnt,
>>     dload_user,
>>     evil_dload_cnt,
>>     evil_dload_user,
>>     update_dcnt,
>>     update_duser,
>>     hand_suc_incnt,
>>     hand_suc_inuser,
>>     day_hand_suc_incnt,
>>     day_hand_suc_inuser
>>   from
>>     (select * from t_ed_soft_assist_useraction_stat where ds=20160126)t11
>>     full outer join
>>     (select * from t_md_soft_lanmu_app_dload_detail where ds=20160126)t12
>>     on t11.qua=t12.qua and t11.app_id=t12.appid and t11.scene=t12.lanmu_id
>>     full outer join
>>     (select * from t_md_soft_client_install_lanmu  where ds=20160126)t13
>>     on t11.qua=t13.qua and t11.app_id=t13.app_id and
>> t11.scene=t13.lanmu_id
>>   )t1
>>   left outer join t_rd_qua t3 on t3.ds=20160126 and t1.qua=t3.qua
>>   left outer join t_rd_soft_appnew_last t4 on t4.ds=20160126 and
>> t1.app_id=t4.app_id
>>   left outer join t_rd_soft_page_conf t5 on t5.ds=20160126 and
>> t1.scene=t5.pageid and t3.client_type_id=t5.ismtt;
>>
>>
>> *Explain query is*
>> STAGE DEPENDENCIES:
>>   Stage-2 is a root stage
>>   Stage-1 depends on stages: Stage-2
>>   Stage-0 depends on stages: Stage-1
>>
>> STAGE PLANS:
>>   Stage: Stage-2
>>     Spark
>>       DagName: mqq_20160127151826_e8197f40-18d7-430c-9fc8-993facb74534:2
>>       Vertices:
>>         Map 6
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: t3
>>                   Statistics: Num rows: 1051 Data size: 113569 Basic
>> stats: COMPLETE Column stats: NONE
>>                   Spark HashTable Sink Operator
>>                     keys:
>>                       0 _col0 (type: string)
>>                       1 qua (type: string)
>>             Local Work:
>>               Map Reduce Local Work
>>         Map 7
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: t4
>>                   Statistics: Num rows: 2542751 Data size: 220433659
>> Basic stats: COMPLETE Column stats: NONE
>>                   Spark HashTable Sink Operator
>>                     keys:
>>                       0 UDFToDouble(_col2) (type: double)
>>                       1 UDFToDouble(app_id) (type: double)
>>             Local Work:
>>               Map Reduce Local Work
>>         Map 8
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: t5
>>                   Statistics: Num rows: 143 Data size: 28605 Basic stats:
>> COMPLETE Column stats: NONE
>>                   Spark HashTable Sink Operator
>>                     keys:
>>                       0 _col1 (type: string), UDFToDouble(_col20) (type:
>> double)
>>                       1 pageid (type: string), UDFToDouble(ismtt) (type:
>> double)
>>             Local Work:
>>               Map Reduce Local Work
>>
>>   Stage: Stage-1
>>     Spark
>>       Edges:
>>         Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 5), Map 4
>> (PARTITION-LEVEL SORT, 5), Map 5 (PARTITION-LEVEL SORT, 5)
>>         Reducer 3 <- Reducer 2 (GROUP, 1)
>>       DagName: mqq_20160127151826_e8197f40-18d7-430c-9fc8-993facb74534:1
>>       Vertices:
>>         Map 1
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: t_ed_soft_assist_useraction_stat
>>                   Statistics: Num rows: 16368107 Data size: 651461220
>> Basic stats: COMPLETE Column stats: NONE
>>                   Select Operator
>>                     expressions: qua (type: string), scene (type:
>> string), app_id (type: string)
>>                     outputColumnNames: _col0, _col1, _col2
>>                     Statistics: Num rows: 16368107 Data size: 651461220
>> Basic stats: COMPLETE Column stats: NONE
>>                     Reduce Output Operator
>>                       key expressions: _col0 (type: string),
>> UDFToDouble(_col2) (type: double), _col1 (type: string)
>>                       sort order: +++
>>                       Map-reduce partition columns: _col0 (type: string),
>> UDFToDouble(_col2) (type: double), _col1 (type: string)
>>                       Statistics: Num rows: 16368107 Data size: 651461220
>> Basic stats: COMPLETE Column stats: NONE
>>                       value expressions: _col2 (type: string)
>>         Map 4
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: t_md_soft_lanmu_app_dload_detail
>>                   Statistics: Num rows: 2503976 Data size: 203324640
>> Basic stats: COMPLETE Column stats: NONE
>>                   Select Operator
>>                     expressions: qua (type: string), appid (type:
>> bigint), lanmu_id (type: string)
>>                     outputColumnNames: _col2, _col3, _col4
>>                     Statistics: Num rows: 2503976 Data size: 203324640
>> Basic stats: COMPLETE Column stats: NONE
>>                     Reduce Output Operator
>>                       key expressions: _col2 (type: string),
>> UDFToDouble(_col3) (type: double), _col4 (type: string)
>>                       sort order: +++
>>                       Map-reduce partition columns: _col2 (type: string),
>> UDFToDouble(_col3) (type: double), _col4 (type: string)
>>                       Statistics: Num rows: 2503976 Data size: 203324640
>> Basic stats: COMPLETE Column stats: NONE
>>                       value expressions: _col3 (type: bigint)
>>         Map 5
>>             Map Operator Tree:
>>                 TableScan
>>                   alias: t_md_soft_client_install_lanmu
>>                   Statistics: Num rows: 2143605 Data size: 178502520
>> Basic stats: COMPLETE Column stats: NONE
>>                   Select Operator
>>                     expressions: lanmu_id (type: string), qua (type:
>> string), app_id (type: int)
>>                     outputColumnNames: _col13, _col2, _col3
>>                     Statistics: Num rows: 2143605 Data size: 178502520
>> Basic stats: COMPLETE Column stats: NONE
>>                     Reduce Output Operator
>>                       key expressions: _col2 (type: string),
>> UDFToDouble(_col3) (type: double), _col13 (type: string)
>>                       sort order: +++
>>                       Map-reduce partition columns: _col2 (type: string),
>> UDFToDouble(_col3) (type: double), _col13 (type: string)
>>                       Statistics: Num rows: 2143605 Data size: 178502520
>> Basic stats: COMPLETE Column stats: NONE
>>                       value expressions: _col3 (type: int)
>>         Reducer 2
>>             Local Work:
>>               Map Reduce Local Work
>>             Reduce Operator Tree:
>>               Join Operator
>>                 condition map:
>>                      Outer Join 0 to 1
>>                      Outer Join 0 to 2
>>                 keys:
>>                   0 _col0 (type: string), UDFToDouble(_col2) (type:
>> double), _col1 (type: string)
>>                   1 _col2 (type: string), UDFToDouble(_col3) (type:
>> double), _col4 (type: string)
>>                   2 _col2 (type: string), UDFToDouble(_col3) (type:
>> double), _col13 (type: string)
>>                 outputColumnNames: _col0, _col1, _col2, _col10, _col11,
>> _col12, _col26, _col27, _col37
>>                 Statistics: Num rows: 36009836 Data size: 1433214715
>> Basic stats: COMPLETE Column stats: NONE
>>                 Select Operator
>>                   expressions: COALESCE(_col0,_col10,_col26) (type:
>> string), COALESCE(_col1,_col12,_col37) (type: string),
>> COALESCE(_col2,_col11,_col27) (type: string)
>>                   outputColumnNames: _col0, _col1, _col2
>>                   Statistics: Num rows: 36009836 Data size: 1433214715
>> Basic stats: COMPLETE Column stats: NONE
>>                   Map Join Operator
>>                     condition map:
>>                          Left Outer Join0 to 1
>>                     keys:
>>                       0 _col0 (type: string)
>>                       1 qua (type: string)
>>                     outputColumnNames: _col1, _col2, _col20
>>                     input vertices:
>>                       1 Map 6
>>                     Statistics: Num rows: 39610820 Data size: 1576536220
>> Basic stats: COMPLETE Column stats: NONE
>>                     Map Join Operator
>>                       condition map:
>>                            Left Outer Join0 to 1
>>                       keys:
>>                         0 UDFToDouble(_col2) (type: double)
>>                         1 UDFToDouble(app_id) (type: double)
>>                       outputColumnNames: _col1, _col20
>>                       input vertices:
>>                         1 Map 7
>>                       Statistics: Num rows: 43571902 Data size:
>> 1734189879 Basic stats: COMPLETE Column stats: NONE
>>                       Map Join Operator
>>                         condition map:
>>                              Left Outer Join0 to 1
>>                         keys:
>>                           0 _col1 (type: string), UDFToDouble(_col20)
>> (type: double)
>>                           1 pageid (type: string), UDFToDouble(ismtt)
>> (type: double)
>>                         input vertices:
>>                           1 Map 8
>>                         Statistics: Num rows: 47929093 Data size:
>> 1907608908 Basic stats: COMPLETE Column stats: NONE
>>                         Group By Operator
>>                           aggregations: count(1)
>>                           mode: hash
>>                           outputColumnNames: _col0
>>                           Statistics: Num rows: 1 Data size: 8 Basic
>> stats: COMPLETE Column stats: NONE
>>                           Reduce Output Operator
>>                             sort order:
>>                             Statistics: Num rows: 1 Data size: 8 Basic
>> stats: COMPLETE Column stats: NONE
>>                             value expressions: _col0 (type: bigint)
>>         Reducer 3
>>             Reduce Operator Tree:
>>               Group By Operator
>>                 aggregations: count(VALUE._col0)
>>                 mode: mergepartial
>>                 outputColumnNames: _col0
>>                 Statistics: Num rows: 1 Data size: 8 Basic stats:
>> COMPLETE Column stats: NONE
>>                 File Output Operator
>>                   compressed: false
>>                   Statistics: Num rows: 1 Data size: 8 Basic stats:
>> COMPLETE Column stats: NONE
>>                   table:
>>                       input format:
>> org.apache.hadoop.mapred.TextInputFormat
>>                       output format:
>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>                       serde:
>> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>
>>   Stage: Stage-0
>>     Fetch Operator
>>       limit: -1
>>       Processor Tree:
>>         ListSink
>>
>> Time taken: 3.723 seconds, Fetched: 173 row(s)
>>
>>
>>
>> *Besh wishes.*
>> *Thank for any help.*
>>
>
>

Mime
View raw message