hive-dev mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7817) distinct/group by don't work on partition columns
Date Thu, 21 Aug 2014 00:31:35 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-7817:
---------------------------------

    Description: 
Suppose you have a table like this:
{code:sql}
CREATE TABLE page_view(
    viewTime INT,
    userid BIGINT,
    page_url STRING,
    referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) INTO 4 BUCKETS;
{code}

Then the following queries:
{code:sql}
select distinct dt from page_view;
select distinct dt, country from page_view;
select dt, country from page_view group by dt, country;
{code}

all fail with:

{noformat}
Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-20 17:26:13,018 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_local165359429_0013 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
{noformat}

but the same grouping with an aggregate added:
{code:sql}
select dt, country, count(*) from page_view group by dt, country;
{code}

works fine.
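A possible interim workaround, sketched under assumptions and not verified against the affected 0.14.0 build. The first option reads the partition values from the metastore with SHOW PARTITIONS instead of running a MapReduce job. The second wraps the aggregate query that is reported to work and drops the count in an outer query. The alias names t and cnt are illustrative only. The third option assumes, without any confirmation from this report, that the failing queries are the ones Hive's metadata-only optimization (hive.optimize.metadataonly) tries to answer from partition metadata alone, while count(*) has to read rows. Disabling it for the session is a guess to test, not a known fix.

{code:sql}
-- Option 1: list partitions from the metastore; each output row is a
-- partition spec of the form dt=<value>/country=<value>.
SHOW PARTITIONS page_view;

-- Option 2: reuse the aggregate query that works and project away the
-- count. t and cnt are hypothetical names chosen for this sketch.
SELECT dt, country
FROM (SELECT dt, country, count(*) AS cnt
      FROM page_view
      GROUP BY dt, country) t;

-- Option 3 (assumption): turn off the metadata-only optimization for
-- this session only, then retry one of the failing queries.
SET hive.optimize.metadataonly=false;
SELECT DISTINCT dt FROM page_view;
{code}

Column pruning may reduce the second query to the same plan as the failing GROUP BY, in which case it would fail the same way; the first option does not depend on the query planner at all.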

  was:
suppose you have a table like this:
{code:sql}
CREATE TABLE page_view(
       viewTime INT,
       userid BIGINT,
        page_url STRING,
        referrer_url STRING,
        ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) INTO 4 BUCKETS
{code:sql}

Then 
{code:sql}
select distinct dt from page_view;
select distinct dt, country from page_view;
select dt, country from page_view group by dt, country;
{code:sql}

all fail with

{noformat}
Query ID = ekoifman_20140820172626_b03ba819-c111-433f-a3fc-453c7d5a3e86
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-20 17:26:13,018 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_local165359429_0013 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
{noformat}

but 
{code:sql}
select dt, country, count(*) from page_view group by dt, country;
{code:sql}

works fine.


> distinct/group by don't work on partition columns
> -------------------------------------------------
>
>                 Key: HIVE-7817
>                 URL: https://issues.apache.org/jira/browse/HIVE-7817
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>            Reporter: Eugene Koifman
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
