hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aihua Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9228) Problem with subquery using windowing functions
Date Mon, 05 Jan 2015 14:49:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264639#comment-14264639
] 

Aihua Xu commented on HIVE-9228:
--------------------------------

[~ashutoshc] Are you working on this area, any idea?

The following window query throws ArrayOutOfBoundExcption. 

select st_fips_cd, zip_cd_5, hh_surr_key
from
(
select st_fips_cd, zip_cd_5, hh_surr_key,
count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by st_fips_cd, zip_cd_5)
as CNT_ADVTG_LEN_RSDNC_CD_1,
row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3
from hh_agg
where analytic_row_number2 = 1
) t;


Here is the explain extend output for the query.  At the File Sink Operator of stage 1 below,
seems like it should only output 4 columns while the temp table in fact output one additional
column (the value of CNT_ADVTG_LEN_RSDNC_CD_1). I’m investigating toward such mismatch,
but anyone can confirm and provide additional info that will be helpful.

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: hh_agg
            Statistics: Num rows: 33208 Data size: 10361206 Basic stats: COMPLETE Column stats:
NONE
            GatherStats: false
            Filter Operator
              isSamplingPred: false
              predicate: (analytic_row_number2 = 1) (type: boolean)
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
              Reduce Output Operator
                key expressions: st_fips_cd (type: string), zip_cd_5 (type: string), st_fips_cd
(type: string), zip_cd_5 (type: string)
                sort order: ++++
                Map-reduce partition columns: st_fips_cd (type: string), zip_cd_5 (type: string)
                Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
                tag: -1
                value expressions: st_fips_cd (type: string), zip_cd_5 (type: string), hh_surr_key
(type: bigint), advtg_len_rsdnc_cd (type: string)
      Path -> Alias:
        file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
      Path -> Partition:
        file:/Users/axu/Documents/localDB/23982_debug 
          Partition
            base file name: 23982_debug
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            properties:
              COLUMN_STATS_ACCURATE true
              EXTERNAL TRUE
              bucket_count -1
              columns st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
              columns.comments 
              columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              location file:/Users/axu/Documents/localDB/23982_debug
              name default.hh_agg
              numFiles 0
              numRows 0
              rawDataSize 0
              serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5, i64 hh_surr_key,
i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd, string
advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd,
string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age,
i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk,
i64 cnt_advtg_marital_stat_cd_1, i64 cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk,
i64 cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 cnt_nbr_tradeline_3,
i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7,
i64 cnt_nbr_tradeline_8, i64 cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd,
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double zip_mdn_prprty_mkt_val, double
zip_avg_home_eqty_amt, double zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band,
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd,
double zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd,
i32 analytic_row_number2}
              serialization.format ,
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 0
              transient_lastDdlTime 1419367415
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                COLUMN_STATS_ACCURATE true
                EXTERNAL TRUE
                bucket_count -1
                columns st_fips_cd,zip_cd_5,hh_surr_key,nbr_hh_in_zip,nbr_nr_adults_in_hh,hh_pop,advtg_len_rsdnc_cd,advtg_home_ownr_cd,dsf_season_cd,advtg_hh_edu_cd,advtg_hh_occupn_cd,advtg_child_presnc_cd,advtg_hh_age_cd,zip_avg_age,zip_mdn_age,mail_rspns_buy_cd,cnt_gend_cd_1,cnt_gend_cd_2,cnt_gend_cd_3,cnt_gend_cd_unk,cnt_advtg_marital_stat_cd_1,cnt_advtg_marital_stat_cd_2,cnt_advtg_marital_stat_cd_unk,cnt_nbr_tradeline_0,cnt_nbr_tradeline_1,cnt_nbr_tradeline_2,cnt_nbr_tradeline_3,cnt_nbr_tradeline_4,cnt_nbr_tradeline_5,cnt_nbr_tradeline_6,cnt_nbr_tradeline_7,cnt_nbr_tradeline_8,cnt_nbr_tradeline_9,cnt_nbr_tradeline_unk,advtg_dwell_type_cd,prprty_mkt_val_cd,zip_avg_prprty_mkt_val,zip_mdn_prprty_mkt_val,zip_avg_home_eqty_amt,zip_mdn_home_eqty_amt,trgt_inc_cd,zip_avg_trgt_inc_narrow_band,zip_mdn_trgt_inc_narrow_band,zip_avg_inc_prodc_asset_cd,zip_mdn_inc_prodc_asset_cd,zip_avg_net_wrth_cd,zip_mdn_net_wrth_cd,rylty_trgt_mktg_val_scr_cd,analytic_row_number2
                columns.comments 
                columns.types string:string:bigint:bigint:bigint:tinyint:string:string:string:string:string:string:string:double:double:int:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:double:double:double:double:string:double:double:double:double:double:double:string:int
                field.delim ,
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                location file:/Users/axu/Documents/localDB/23982_debug
                name default.hh_agg
                numFiles 0
                numRows 0
                rawDataSize 0
                serialization.ddl struct hh_agg { string st_fips_cd, string zip_cd_5, i64
hh_surr_key, i64 nbr_hh_in_zip, i64 nbr_nr_adults_in_hh, byte hh_pop, string advtg_len_rsdnc_cd,
string advtg_home_ownr_cd, string dsf_season_cd, string advtg_hh_edu_cd, string advtg_hh_occupn_cd,
string advtg_child_presnc_cd, string advtg_hh_age_cd, double zip_avg_age, double zip_mdn_age,
i32 mail_rspns_buy_cd, i64 cnt_gend_cd_1, i64 cnt_gend_cd_2, i64 cnt_gend_cd_3, i64 cnt_gend_cd_unk,
i64 cnt_advtg_marital_stat_cd_1, i64 cnt_advtg_marital_stat_cd_2, i64 cnt_advtg_marital_stat_cd_unk,
i64 cnt_nbr_tradeline_0, i64 cnt_nbr_tradeline_1, i64 cnt_nbr_tradeline_2, i64 cnt_nbr_tradeline_3,
i64 cnt_nbr_tradeline_4, i64 cnt_nbr_tradeline_5, i64 cnt_nbr_tradeline_6, i64 cnt_nbr_tradeline_7,
i64 cnt_nbr_tradeline_8, i64 cnt_nbr_tradeline_9, i64 cnt_nbr_tradeline_unk, string advtg_dwell_type_cd,
string prprty_mkt_val_cd, double zip_avg_prprty_mkt_val, double zip_mdn_prprty_mkt_val, double
zip_avg_home_eqty_amt, double zip_mdn_home_eqty_amt, string trgt_inc_cd, double zip_avg_trgt_inc_narrow_band,
double zip_mdn_trgt_inc_narrow_band, double zip_avg_inc_prodc_asset_cd, double zip_mdn_inc_prodc_asset_cd,
double zip_avg_net_wrth_cd, double zip_mdn_net_wrth_cd, string rylty_trgt_mktg_val_scr_cd,
i32 analytic_row_number2}
                serialization.format ,
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                totalSize 0
                transient_lastDdlTime 1419367415
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.hh_agg
            name: default.hh_agg
      Truncated Path -> Alias:
        file:/Users/axu/Documents/localDB/23982_debug [t:hh_agg]
      Needs Tagging: false
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats:
NONE
          PTF Operator
            Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats:
NONE
            File Output Operator
              compressed: false
              GlobalTableId: 0
              directory: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
              NumFilesPerFileSink: 1
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  properties:
                    columns _col0,_col1,_col2,_col6
                    columns.types string,string,bigint,string
                    escape.delim \
                    serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
              TotalFiles: 1
              GatherStats: false
              MultiFileSpray: false

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            GatherStats: false
            Reduce Output Operator
              key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: bigint)
              sort order: +++
              Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
              tag: -1
              value expressions: _col0 (type: string), _col1 (type: string), _col2 (type:
bigint)
      Path -> Alias:
        file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
      Path -> Partition:
        file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003

          Partition
            base file name: -mr-10003
            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
            properties:
              columns _col0,_col1,_col2,_col6
              columns.types string,string,bigint,string
              escape.delim \
              serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
            serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
          
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
              properties:
                columns _col0,_col1,_col2,_col6
                columns.types string,string,bigint,string
                escape.delim \
                serialization.lib org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
              serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
      Truncated Path -> Alias:
        file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003
[file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10003]
      Needs Tagging: false
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats:
NONE
          PTF Operator
            Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column stats:
NONE
            Select Operator
              expressions: _col1 (type: string), _col2 (type: string), _col3 (type: bigint)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
              File Output Operator
                compressed: false
                GlobalTableId: 0
                directory: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001
                NumFilesPerFileSink: 1
                Statistics: Num rows: 16604 Data size: 5180603 Basic stats: COMPLETE Column
stats: NONE
                Stats Publishing Key Prefix: file:/var/folders/00/_7st2p5x5bg1hvpxlgm455180000gp/T/axu/hive_2015-01-01_13-25-01_841_2717056365737730087-1/-mr-10000/.hive-staging_hive_2015-01-01_13-25-01_841_2717056365737730087-1/-ext-10001/
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                    properties:
                      columns _col0,_col1,_col2
                      columns.types string:string:bigint
                      escape.delim \
                      hive.serialization.extend.nesting.levels true
                      serialization.format 1
                      serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                TotalFiles: 1
                GatherStats: false
                MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1

> Problem with subquery using windowing functions
> -----------------------------------------------
>
>                 Key: HIVE-9228
>                 URL: https://issues.apache.org/jira/browse/HIVE-9228
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.1
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The following query with window functions failed. The internal query works fine.
> select st_fips_cd, zip_cd_5, hh_surr_key
> from
> (
> select st_fips_cd, zip_cd_5, hh_surr_key,
> count( case when advtg_len_rsdnc_cd = '1' then 1 end ) over (partition by st_fips_cd,
zip_cd_5) as CNT_ADVTG_LEN_RSDNC_CD_1,
> row_number() over (partition by st_fips_cd, zip_cd_5 order by hh_surr_key asc) as analytic_row_number3
> from hh_agg
> where analytic_row_number2 = 1
> ) t;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message