hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Ball (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15581) Unable to use advanced aggregation with multiple inserts clause
Date Mon, 16 Jan 2017 01:31:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823343#comment-15823343
] 

James Ball commented on HIVE-15581:
-----------------------------------

[~cartershanklin]
I tried Tez and the query was executed correctly, so it may well be a problem specific to
MapReduce.
Thank you for the suggestion.

Results w/ Tez:
{noformat}
	table3.column1	table3.column2	table3.column3	table3.partition1
1	NULL	NULL	3	value1
2	NULL	value1	1	value1
3	NULL	value2	1	value1
4	NULL	value3	1	value1
5	value1	NULL	1	value1
6	value1	value1	1	value1
7	value2	NULL	1	value1
8	value2	value2	1	value1
9	value3	NULL	1	value1
10	value3	value3	1	value1
11	NULL	NULL	3	value2
12	NULL	value1	1	value2
13	NULL	value2	1	value2
14	NULL	value3	1	value2
15	value1	NULL	1	value2
16	value1	value1	1	value2
17	value2	NULL	1	value2
18	value2	value2	1	value2
19	value3	NULL	1	value2
20	value3	value3	1	value2
{noformat}

> Unable to use advanced aggregation with multiple inserts clause
> ---------------------------------------------------------------
>
>                 Key: HIVE-15581
>                 URL: https://issues.apache.org/jira/browse/HIVE-15581
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: James Ball
>              Labels: newbie
>
> ■Use Cases
> - Use multiple insert clauses within a single query to insert multiple static (user-defined)
partitions into a single table.
> - Use advanced aggregation (cube) features within each insert clause to include subtotals
of columns for each partition
> ■Expected Behaviour
> - Subtotals are inserted for all combinations of the set of columns
> ■Observed Behaviour
> - No subtotals are not inserted for any combination of the set of columns
> ■Sample Queries
> {code:sql}
> // Create test tables
> create table if not exists
> 	table1
> 	(
> 		column1 string,
> 		column2 string,
> 		column3 int
> 		)
> 	stored as orc
> 	tblproperties
> 	(
> 		"orc.compress" = "SNAPPY"
> 		);
> create table if not exists
> 	table2
> 	(
> 		column1 string,
> 		column2 string,
> 		column3 int
> 		)
> 	partitioned by
> 	(
> 		partition1 string
> 		)
> 	stored as orc
> 	tblproperties
> 	(
> 		"orc.compress" = "SNAPPY"
> 		);
> create table if not exists
> 	table3
> 	(
> 		column1 string,
> 		column2 string,
> 		column3 int
> 		)
> 	partitioned by
> 	(
> 		partition1 string
> 		)
> 	stored as orc
> 	tblproperties
> 	(
> 		"orc.compress" = "SNAPPY"
> 		);
> {code}
> {code:sql}
> // Insert test values
> insert overwrite table
> 	table1
> 	values
> 		('value1', 'value1', 1),
> 		('value2', 'value2', 1),
> 		('value3', 'value3', 1);
> {code}
> {code:sql}
> // Single insert clause with multiple inserts syntax
> // Subtotals are inserted into target table
> from
> 	table1
> insert overwrite table
> 	table2
> 	partition
> 	(
> 		partition1 = 'value1'
> 		)
> 	select
> 		column1,
> 		column2,
> 		sum(column3) as column3
> 	group by
> 		column1,
> 		column2
> 	with cube;
> {code}
> {code:sql}
> // Multiple insert clauses with multiple inserts syntax
> // Subtotals are not inserted into target table
> from
> 	table1
> insert overwrite table
> 	table3
> 	partition
> 	(
> 		partition1 = 'value1'
> 		)
> 	select
> 		column1,
> 		column2,
> 		sum(column3) as column3
> 	group by
> 		column1,
> 		column2
> 	with cube
> insert overwrite table
> 	table3
> 	partition
> 	(
> 		partition1 = 'value2'
> 		)
> 	select
> 		column1,
> 		column2,
> 		sum(column3) as column3
> 	group by
> 		column1,
> 		column2
> 	with cube;
> {code}
> ■Executions Plans
> - Single insert clause with multiple inserts syntax
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
>   Stage-2 depends on stages: Stage-0
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: table1
>             Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats:
NONE
>             Select Operator
>               expressions: column1 (type: string), column2 (type: string), column3 (type:
int)
>               outputColumnNames: column1, column2, column3
>               Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats:
NONE
>               Group By Operator
>                 aggregations: sum(column3)
>                 keys: column1 (type: string), column2 (type: string), '0' (type: string)
>                 mode: hash
>                 outputColumnNames: _col0, _col1, _col2, _col3
>                 Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column
stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string), _col1 (type: string), _col2
(type: string)
>                   sort order: +++
>                   Map-reduce partition columns: _col0 (type: string), _col1 (type: string),
_col2 (type: string)
>                   Statistics: Num rows: 12 Data size: 2208 Basic stats: COMPLETE Column
stats: NONE
>                   value expressions: _col3 (type: bigint)
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: sum(VALUE._col0)
>           keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type:
string)
>           mode: mergepartial
>           outputColumnNames: _col0, _col1, _col3
>           Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats:
NONE
>           pruneGroupingSetId: true
>           Select Operator
>             expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col3)
(type: int)
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats:
NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 6 Data size: 1104 Basic stats: COMPLETE Column stats:
NONE
>               table:
>                   input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                   serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                   name: zzz_james_ball.table2
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             partition1 value1
>           replace: true
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: zzz_james_ball.table2
>   Stage: Stage-2
>     Stats-Aggr Operator
> {noformat}
> - Single insert clause with multiple inserts syntax
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
>   Stage-1 depends on stages: Stage-2
>   Stage-4 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: table1
>             Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats:
NONE
>             Select Operator
>               expressions: column1 (type: string), column2 (type: string), column3 (type:
int)
>               outputColumnNames: column1, column2, column3
>               Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats:
NONE
>               Reduce Output Operator
>                 key expressions: column1 (type: string), column2 (type: string)
>                 sort order: ++
>                 Map-reduce partition columns: column1 (type: string), column2 (type:
string)
>                 Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats:
NONE
>                 value expressions: column3 (type: int)
>       Reduce Operator Tree:
>         Forward
>           Statistics: Num rows: 3 Data size: 552 Basic stats: COMPLETE Column stats:
NONE
>           Group By Operator
>             aggregations: sum(VALUE._col0)
>             keys: KEY._col0 (type: string), KEY._col1 (type: string)
>             mode: complete
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats:
NONE
>             Select Operator
>               expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2)
(type: int)
>               outputColumnNames: _col0, _col1, _col2
>               Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats:
NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats:
NONE
>                 table:
>                     input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                     serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                     name: zzz_james_ball.table3
>           Group By Operator
>             aggregations: sum(VALUE._col0)
>             keys: KEY._col0 (type: string), KEY._col1 (type: string)
>             mode: complete
>             outputColumnNames: _col0, _col1, _col2
>             Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats:
NONE
>             Select Operator
>               expressions: _col0 (type: string), _col1 (type: string), UDFToInteger(_col2)
(type: int)
>               outputColumnNames: _col0, _col1, _col2
>               Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats:
NONE
>               File Output Operator
>                 compressed: false
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats:
NONE
>                 table:
>                     input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>                     output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>                     serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>                     name: zzz_james_ball.table3
>   Stage: Stage-0
>     Move Operator
>       tables:
>           partition:
>             partition1 value1
>           replace: true
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: zzz_james_ball.table3
>   Stage: Stage-3
>     Stats-Aggr Operator
>   Stage: Stage-1
>     Move Operator
>       tables:
>           partition:
>             partition1 value2
>           replace: true
>           table:
>               input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
>               output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
>               serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
>               name: zzz_james_ball.table3
>   Stage: Stage-4
>     Stats-Aggr Operator
> {noformat}
> ■Notes
> - This problem occurs with all advanced aggregation features (cube, grouping sets, rollup)
> - This problem occurs whether hive.map.aggr is set to true or false
> - Dynamic partitions are not used because the partition values are set manually within
the where conditions of each insert clause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message