hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aihua Xu <...@cloudera.com>
Subject Fwd: Exception with the window function
Date Tue, 13 Jan 2015 17:36:53 GMT
Any one has insights on the following issue? Where would be the best place to make the changes?
I have tried several approaches but it breaks some other queries. 

Basically, the inner query produces some additional columns (from window function) not consumed
by the outer query. The query plan says we won’t generate those columns since they are not
used (which seems correct), while during the execution, we DO output the additional column
to the temp file. That causes the issue when we read the temp file in later.

Best regards,
Aihua Xu
Software Engineer, CCE
axu@cloudera.com

Begin forwarded message:

> From: Aihua Xu <axu@cloudera.com>
> Subject: Exception with the window function
> Date: January 5, 2015 at 6:51:01 AM PST
> To: user@hive.apache.org
> 
> Hi folks,
> 
> I’m working on the HIVE-9228 (https://issues.apache.org/jira/browse/HIVE-9228) with
the following window query throwing ArrayOutOfBoundExcption. 
> 
> select st_fips_cd, zip_cd_5, hh_surr_key
> from
> (
> select st_fips_cd, zip_cd_5, hh_surr_key,
> count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by st_fips_cd,
zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
> row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3
> from hh_agg
> where analytic_row_number2 = 1
> ) t;
> 
> 
> Here is the explain extend output for the query.  At the File Sink Operator of stage
1 below, seems like it should only output 4 columns while the temp table in fact output one
additional column (the value of CNT_ADVTG_LEN_RSDNC_CD_1). I’m investigating toward such
mismatch, but anyone can confirm and provide additional info that will be helpful.
> 
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-2 depends on stages: Stage-1
>   Stage-0 is a root stage
> 
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: hh_agg
>             Statistics: Num rows: 33208 Data size: 10361206 Basic stats: COMPLETE Column
stats: NONE
>             GatherStats: false
>             Filter Operator
>               isSamplingPred: false
>               predicate: (analytic_row_number2 = 1) (type: boolean)
>               Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>               Reduce Output Operator
>                 key expressions: st_fips_cd (type: string), zip_cd_5 (type: string),
st_fips_cd (type: string), zip_cd_5 (type: string)
>                 sort order: ++++
>                 Map-reduce partition columns: st_fips_cd (type: string), zip_cd_5 (type:
string)
>                 Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE
Column stats: NONE
>                 tag: -1
>                 value expressions: st_fips_cd (type: string), zip_cd_5 (type: string),
hh_surr_key (type: bigint), advtg_len_rsdnc_cd (type: string)
>       Path -> Alias:
>         file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
>       Path -> Partition:
>         file:/Users/axu/Documents/localDB/23982_debug 
>           Partition
>             base file name: 23982_debug
>             input format: org.apache.hadoop.mapred.TextInputFormat
>             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>             properties:
>               COLUMN_STATS_ACCURATE true
>               EXTERNAL TRUE
>               bucket_count -1
>               columns st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
>               columns.comments 
>               columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
>               field.delim ,
>               file.inputformat org.apache.hadoop.mapred.TextInputFormat
>               file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               location file:/Users/axu/Documents/localDB/23982_debug
>               name default.hh_agg
>               numFiles 0
>               numRows 0
>               rawDataSize 0
>               serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5, i64
hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd,
string advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd,
string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age,
i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk,
i64 cnt_advtg_marital_stat_cd_1, i64 cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk,
i64 cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 cnt_nbr_tradeline_3,
i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7,
i64 cnt_nbr_tradeline_8, i64 cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd,
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double zip_mdn_prprty_mkt_val, double
zip_avg_home_eqty_amt, double zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band,
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd,
double zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd,
i32 analytic_row_number2}
>               serialization.format ,
>               serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               totalSize 0
>               transient_lastDdlTime 1419367415
>             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>           
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               properties:
>                 COLUMN_STATS_ACCURATE true
>                 EXTERNAL TRUE
>                 bucket_count -1
>                 columns st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
>                 columns.comments 
>                 columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
>                 field.delim ,
>                 file.inputformat org.apache.hadoop.mapred.TextInputFormat
>                 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                 location file:/Users/axu/Documents/localDB/23982_debug
>                 name default.hh_agg
>                 numFiles 0
>                 numRows 0
>                 rawDataSize 0
>                 serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5,
i64 hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd,
string advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd,
string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age,
i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk,
i64 cnt_advtg_marital_stat_cd_1, i64 cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk,
i64 cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 cnt_nbr_tradeline_3,
i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7,
i64 cnt_nbr_tradeline_8, i64 cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd,
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double zip_mdn_prprty_mkt_val, double
zip_avg_home_eqty_amt, double zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band,
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd,
double zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd,
i32 analytic_row_number2}
>                 serialization.format ,
>                 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                 totalSize 0
>                 transient_lastDdlTime 1419367415
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: default.hh_agg
>             name: default.hh_agg
>       Truncated Path -> Alias:
>         file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
>       Needs Tagging: false
>       Reduce Operator Tree:
>         Extract
>           Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>           PTF Operator
>             Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>             File Output Operator
>               compressed: false
>               GlobalTableId: 0
>               directory: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
>               NumFilesPerFileSink: 1
>               table:
>                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                   properties:
>                     columns _col0,_col1,_col2,_col6
>                     columns.types string,string,bigint,string
>                     escape.delim \
>                     serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>                   serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>               TotalFiles: 1
>               GatherStats: false
>               MultiFileSpray: false
> 
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             GatherStats: false
>             Reduce Output Operator
>               key expressions: _col0 (type: string), _col1 (type: string), _col2 (type:
bigint)
>               sort order: +++
>               Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
>               Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>               tag: -1
>               value expressions: _col0 (type: string), _col1 (type: string), _col2 (type:
bigint)
>       Path -> Alias:
>         file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
>       Path -> Partition:
>         file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003

>           Partition
>             base file name: -mr-10003
>             input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>             output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>             properties:
>               columns _col0,_col1,_col2,_col6
>               columns.types string,string,bigint,string
>               escape.delim \
>               serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>             serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>           
>               input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>               properties:
>                 columns _col0,_col1,_col2,_col6
>                 columns.types string,string,bigint,string
>                 escape.delim \
>                 serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>               serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>       Truncated Path -> Alias:
>         file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
>       Needs Tagging: false
>       Reduce Operator Tree:
>         Extract
>           Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>           PTF Operator
>             Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>             Select Operator
>               expressions: _col1 (type: string), _col2 (type: string), _col3 (type: bigint)
>               outputColumnNames: _col0, _col1, _col2
>               Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
>               File Output Operator
>                 compressed: false
>                 GlobalTableId: 0
>                 directory: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001
>                 NumFilesPerFileSink: 1
>                 Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE
Column stats: NONE
>                 Stats Publishing Key Prefix: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001/
>                 table:
>                     input format: org.apache.hadoop.mapred.TextInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                     properties:
>                       columns _col0,_col1,_col2
>                       columns.types string:string:bigint
>                       escape.delim \
>                       hive.serialization.extend.nesting.levels true
>                       serialization.format 1
>                       serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                 TotalFiles: 1
>                 GatherStats: false
>                 MultiFileSpray: false
> 
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1


Mime
View raw message