hive-dev mailing list archives

From "Chao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8207) Add .q tests for multi-table insertion [Spark Branch]
Date Tue, 23 Sep 2014 17:36:34 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao updated HIVE-8207:
-----------------------
    Description: 
Now that multi-table insertion is committed to the Spark branch, we should enable the related qtests.

Here is a list of qfiles that should be activated (some of them may already be activated).
The list may not be comprehensive.

{noformat}
add_part_multiple.q
auto_smb_mapjoin_14.q
bucket5.q
column_access_stats.q
date_udf.q
groupby10.q
groupby11.q
groupby3_map_multi_distinct.q
groupby3_map.q
groupby3_map_skew.q
groupby3_noskew_multi_distinct.q
groupby3_noskew.q
groupby7_map_multi_single_reducer.q
groupby7_map.q
groupby7_map_skew.q
groupby7_noskew_multi_single_reducer.q
groupby7_noskew.q
groupby7.q
groupby8_map.q
groupby8_map_skew.q
groupby8_noskew.q
groupby8.q
groupby9.q
groupby_complex_types_multi_single_reducer.q
groupby_complex_types.q
groupby_cube1.q
groupby_map_ppr_multi_distinct.q
groupby_map_ppr.q
groupby_multi_insert_common_distinct.q
groupby_multi_single_reducer2.q
groupby_multi_single_reducer3.q
groupby_multi_single_reducer.q
groupby_position.q
groupby_ppr.q
groupby_rollup1.q
groupby_sort_1_23.q
groupby_sort_1.q
groupby_sort_skew_1_23.q
infer_bucket_sort_multi_insert.q
innerjoin.q
input12_hadoop20.q
input12.q
input13.q
input14.q
input17.q
input18.q
input1_limit.q
input_part2.q
insert_into3.q
join_nullsafe.q
load_dyn_part8.q
metadata_only_queries_with_filters.q
multigroupby_singlemr.q
multi_insert_gby2.q
multi_insert_gby3.q
multi_insert_gby.q
multi_insert_lateral_view.q
multi_insert_move_tasks_share_dependencies.q
multi_insert.q
parallel.q
partition_date2.q
pcr.q
ppd_multi_insert.q
ppd_transform.q
smb_mapjoin_11.q
smb_mapjoin_12.q
smb_mapjoin_13.q
smb_mapjoin_15.q
smb_mapjoin_16.q
stats4.q
subquery_multiinsert.q
table_access_keys_stats.q
tez_dml.q
udaf_percentile_approx_20.q
udaf_percentile_approx_23.q
union17.q
union18.q
union19.q
{noformat}                                                                              

Some tests cannot be enabled right now, for various reasons:

1. ForwardOperator issue, including:
{noformat}
groupby7_noskew_multi_single_reducer.q
groupby8_map.q
groupby8_map_skew.q
groupby8_noskew.q
groupby8.q
groupby9.q
groupby10.q
groupby_complex_types_multi_single_reducer.q
groupby_multi_insert_common_distinct.q 
union17.q
{noformat}

*Reason*: Currently, if the node to break in the operator tree is a ForwardOperator, we simply
do nothing. However, we may have the following case:

{noformat}
    ......  FOR -> RS_0 -> RS_1
                          \-> RS_2
{noformat}

Here, {{RS_0}} leads to both {{RS_1}} and {{RS_2}}, and because of the issues in HIVE-7731
and HIVE-8118, both downstream branches will get duplicated results.
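
To make the fan-out case concrete, here is a minimal Python sketch of a toy operator graph. It is only an illustration: the dictionary layout and the {{branch_points}} helper are hypothetical and are not Hive classes or APIs. It finds the nodes that feed more than one child, which is where the duplication described above would show up.

```python
from typing import Dict, List

# Toy operator graph mirroring the diagram: FOR -> RS_0 -> {RS_1, RS_2}.
# Node names come from the diagram; the dict layout is an assumption.
OPERATOR_GRAPH: Dict[str, List[str]] = {
    "FOR": ["RS_0"],
    "RS_0": ["RS_1", "RS_2"],
    "RS_1": [],
    "RS_2": [],
}


def branch_points(graph: Dict[str, List[str]]) -> List[str]:
    """Return the nodes that have more than one child, in insertion order."""
    return [node for node, children in graph.items() if len(children) > 1]


if __name__ == "__main__":
    # Lists the nodes whose output is consumed by several downstream nodes.
    print(branch_points(OPERATOR_GRAPH))
```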

2. Stats issue, including:
{noformat}
bucket5.q
infer_bucket_sort_multi_insert.q
stats4.q
smb_mapjoin_13.q
smb_mapjoin_15.q
{noformat}

*Reason*: These tests fail with a diff error because {{numRows}} and {{rawDataSize}} are -1,
whereas positive values are expected. I don't think this is related to multi-insertion.
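
A quick way to spot this symptom is to check for negative statistics. The sketch below assumes the table parameters have already been collected into a plain mapping of strings; the {{unset_stats}} helper and the sample values are hypothetical, not Hive's API or real test output.

```python
from typing import Iterable, List, Mapping


def unset_stats(params: Mapping[str, str],
                keys: Iterable[str] = ("numRows", "rawDataSize")) -> List[str]:
    """Return the stat names that are missing or negative (for example -1)."""
    return [key for key in keys if int(params.get(key, "-1")) < 0]


if __name__ == "__main__":
    # Made-up parameter values, used only to exercise the helper.
    sample = {"numRows": "-1", "rawDataSize": "-1", "numFiles": "1"}
    print(unset_stats(sample))
```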

3. Join/SMB Join issue, including:
{noformat}
auto_smb_mapjoin_14.q
auto_sortmerge_join_13.q
smb_mapjoin_11.q
smb_mapjoin_12.q
smb_mapjoin_13.q
smb_mapjoin_15.q
smb_mapjoin_16.q
{noformat}

*Reason*: These tests failed either with an exception or with a diff. I think this is because
SMB Join (HIVE-8202) isn't supported right now.

4. Results don't match, including:
{noformat}
groupby3_map_skew.q
groupby_map_ppr_multi_distinct.q
groupby_map_ppr.q
partition_date2.q
udaf_percentile_approx_23.q
{noformat}

*Reason*: The results of these tests differ from MR's. For instance, the test for groupby3_map_skew.q
failed with this diff:

{noformat}
< 130091.0      260.182 256.10355987055016      98.0    0.0     142.92680950752379      143.06995106518903      20428.07288     20469.0109
---
> 130091.0      260.182 256.10355987055016      98.0    0.0     142.9268095075238       143.06995106518906      20428.07288     20469.0109
{noformat}
I don't know why this happens, but I think these differences may not be related to multi-insertion.
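
One possible explanation, which this report does not confirm, is floating-point rounding: the mismatched columns differ only in their final digits, and floating-point addition is not associative, so an engine that combines partial aggregates in a different order can produce slightly different sums. The Python sketch below illustrates the effect and a tolerance-based comparison; the {{close_enough}} helper and its {{rel_tol}} value are assumptions chosen for illustration.

```python
import math

# Floating-point addition is not associative: the grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False

# The two mismatched columns, copied from the diff above.
expected = [142.92680950752379, 143.06995106518903]
actual = [142.9268095075238, 143.06995106518906]


def close_enough(a, b, rel_tol=1e-12):
    """Compare two result columns element-wise with a relative tolerance."""
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


print(close_enough(expected, actual))  # True
```

An exact textual diff flags these rows, while a comparison with a small relative tolerance treats them as equal.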


  was:
Now that multi-table insertion is committed to branch, we should enable those related qtests.

Here is a list of qfiles that should be activated (some of them may already be activated).
The list may not be comprehensive.

{noformat}
add_part_multiple.q
auto_smb_mapjoin_14.q
bucket5.q
column_access_stats.q
date_udf.q
groupby10.q
groupby11.q
groupby3_map_multi_distinct.q
groupby3_map.q
groupby3_map_skew.q
groupby3_noskew_multi_distinct.q
groupby3_noskew.q
groupby7_map_multi_single_reducer.q
groupby7_map.q
groupby7_map_skew.q
groupby7_noskew_multi_single_reducer.q
groupby7_noskew.q
groupby7.q
groupby8_map.q
groupby8_map_skew.q
groupby8_noskew.q
groupby8.q
groupby9.q
groupby_complex_types_multi_single_reducer.q
groupby_complex_types.q
groupby_cube1.q
groupby_map_ppr_multi_distinct.q
groupby_map_ppr.q
groupby_multi_insert_common_distinct.q
groupby_multi_single_reducer2.q
groupby_multi_single_reducer3.q
groupby_multi_single_reducer.q
groupby_position.q
groupby_ppr.q
groupby_rollup1.q
groupby_sort_1_23.q
groupby_sort_1.q
groupby_sort_skew_1_23.q
infer_bucket_sort_multi_insert.q
innerjoin.q
input12_hadoop20.q
input12.q
input13.q
input14.q
input17.q
input18.q
input1_limit.q
input_part2.q
insert_into3.q
join_nullsafe.q
load_dyn_part8.q
metadata_only_queries_with_filters.q
multigroupby_singlemr.q
multi_insert_gby2.q
multi_insert_gby3.q
multi_insert_gby.q
multi_insert_lateral_view.q
multi_insert_move_tasks_share_dependencies.q
multi_insert.q
parallel.q
partition_date2.q
pcr.q
ppd_multi_insert.q
ppd_transform.q
smb_mapjoin_11.q
smb_mapjoin_12.q
smb_mapjoin_13.q
smb_mapjoin_15.q
smb_mapjoin_16.q
stats4.q
subquery_multiinsert.q
table_access_keys_stats.q
tez_dml.q
udaf_percentile_approx_20.q
udaf_percentile_approx_23.q
union17.q
union18.q
union19.q
{noformat}                                                                              

There are some tests that cannot be enabled right now, due to various reasons:

1. ForwardOperator Issue, including
{noformat}
groupby7_noskew_multi_single_reducer.q
groupby8_map.q
groupby8_map_skew.q
groupby8_noskew.q
groupby8.q
groupby9.q
groupby10.q
groupby_complex_types_multi_single_reducer.q
groupby_multi_insert_common_distinct.q 
union17.q
{noformat}

*Reason*: Currently, if the node to break in the operator tree is a ForwardOperator, we simply
do nothing. However, we may have the following case:

{noformat}
    ......  FOR -> RS_0 -> RS_1
                              \-> RS_2
{noformat}

Here, {{RS_0}} leads to both {{RS_1}} and {{RS_2}}, and because of the issue in HIVE-7731
and HIVE-8118, both downstream branches will get duplicated results.

2. Stats issue, including:
{noformat}
bucket5.q
infer_bucket_sort_multi_insert.q
stats4.q
smb_mapjoin_13.q
smb_mapjoin_15.q
{noformat}

*Reason*: In these tests, I get diff error because {{numRows}} and {{rawDataSize}} are -1,
but they are expected to be some positive value. I don't think this is related to multi-insertion.

3. Join/SMB Join Issue, including
{noformat}
auto_smb_mapjoin_14.q
auto_sortmerge_join_13.q
smb_mapjoin_11.q
smb_mapjoin_12.q
smb_mapjoin_13.q
smb_mapjoin_15.q
smb_mapjoin_16.q
{noformat}

*Reason*: These tests either failed with exception or failed with diff. I think it's because
SMB Join (HIVE-8202) isn't supported right now.

4. Result doesn't match, including
{noformat}
groupby3_map_skew.q
groupby_map_ppr_multi_distinct.q
groupby_map_ppr.q
partition_date2.q
udaf_percentile_approx_23.q
{noformat}

*Reason*: The results from these tests are different from MR's. For instance, test for groupby3_map_skew.q
failed because:

{noformat}
< 130091.0      260.182 256.10355987055016      98.0    0.0     142.92680950752379      143.06995106518903      20428.07288     20469.0109
---
> 130091.0      260.182 256.10355987055016      98.0    0.0     142.9268095075238       143.06995106518906      20428.07288     20469.0109
{noformat}
I don't know why this happens, but I think these differences may not be related to multi-insertion.



> Add .q tests for multi-table insertion [Spark Branch]
> -----------------------------------------------------
>
>                 Key: HIVE-8207
>                 URL: https://issues.apache.org/jira/browse/HIVE-8207
>             Project: Hive
>          Issue Type: Test
>          Components: Spark
>            Reporter: Chao
>            Assignee: Chao
>         Attachments: HIVE-8207.1-spark.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
