hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harish Butani (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4392) Illogical InvalidObjectException throwed when use mulit aggregate functions with star columns
Date Thu, 02 May 2013 04:42:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647284#comment-13647284
] 

Harish Butani commented on HIVE-4392:
-------------------------------------

I agree [~navis] your call.

But couple of related observations, what are your thoughts on this.
If you do this query:
{noformat}
select *, max(p_size), min(p_size) from part
{noformat}
- The output has 4 columns. It will always be twice the number of aggregation expressions.

- If you do this
{noformat}
select *, max(p_size), min(p_size) from part group by p_name
{noformat}
- you get 5 columns.
- This is because the * applies to the schema of the GBy Op.
- In the case of the PTFs you don't get duplicates because the agg. expressions are marked
as VHC.
- But if you try either of these in Postgres
{noformat}
select *, max(p_size) from part
select *, max(p_size) from part group by p_name
{noformat}
- they fail; giving an error to the effect the part columns are not in Group By.
- So the '*' is resolved against the schema before the GBy 
- I think the behavior in Hive makes more sense; at some point should we change it so that
we don't get duplicate output columns for aggregations?
                
> Illogical InvalidObjectException throwed when use mulit aggregate functions with star
columns 
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4392
>                 URL: https://issues.apache.org/jira/browse/HIVE-4392
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Apache Hadoop 0.20.1
> Apache Hive Trunk
>            Reporter: caofangkun
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4392.D10431.1.patch, HIVE-4392.D10431.2.patch, HIVE-4392.D10431.3.patch,
HIVE-4392.D10431.4.patch
>
>
> For Example:
> hive (default)> create table liza_1 as 
>               > select *, sum(key), sum(value) 
>               > from new_src;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201304191025_0003, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0003
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill job_201304191025_0003
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
> 2013-04-22 11:09:28,017 Stage-1 map = 0%,  reduce = 0%
> 2013-04-22 11:09:34,054 Stage-1 map = 0%,  reduce = 100%
> 2013-04-22 11:09:37,074 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201304191025_0003
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object
name)
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> MapReduce Jobs Launched: 
> Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 12 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> hive (default)> create table liza_1 as         
>               > select *, sum(key), sum(value) 
>               > from new_src                   
>               > group by key, value;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201304191025_0004, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0004
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill job_201304191025_0004
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
> 2013-04-22 11:11:58,945 Stage-1 map = 0%,  reduce = 0%
> 2013-04-22 11:12:01,964 Stage-1 map = 0%,  reduce = 100%
> 2013-04-22 11:12:04,982 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201304191025_0004
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> FAILED: Error in metadata: InvalidObjectException(message:liza_1 is not a valid object
name)
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> MapReduce Jobs Launched: 
> Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> But the following tow Queries  work:
> hive (default)> create table liza_1 as select * from new_src;
> Total MapReduce jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201304191025_0006, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0006
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill job_201304191025_0006
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
> 2013-04-22 11:15:00,681 Stage-1 map = 0%,  reduce = 0%
> 2013-04-22 11:15:03,697 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201304191025_0006
> Stage-4 is selected by condition resolver.
> Stage-3 is filtered out by condition resolver.
> Stage-5 is filtered out by condition resolver.
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive-scratchdir/hive_2013-04-22_11-14-54_632_6709035018023861094/-ext-10001
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> Table default.liza_1 stats: [num_partitions: 0, num_files: 0, num_rows: 0, total_size:
0, raw_data_size: 0]
> MapReduce Jobs Launched: 
> Job 0:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 9.576 seconds
> hive (default)> create table liza_1 as
>               > select sum (key), sum(value) 
>               > from new_test;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201304191025_0008, Tracking URL = http://hd17-vm5:51030/jobdetails.jsp?jobid=job_201304191025_0008
> Kill Command = /home/zongren/hadoop-current/bin/../bin/hadoop job  -kill job_201304191025_0008
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 1
> 2013-04-22 11:22:52,200 Stage-1 map = 0%,  reduce = 0%
> 2013-04-22 11:22:55,216 Stage-1 map = 0%,  reduce = 100%
> 2013-04-22 11:22:58,234 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201304191025_0008
> Moving data to: hdfs://hd17-vm5:9101/user/zongren/hive/liza_1
> Table default.liza_1 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size:
6, raw_data_size: 0]
> MapReduce Jobs Launched: 
> Job 0: Reduce: 1   HDFS Read: 0 HDFS Write: 6 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 11.115 seconds

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message