hive-user mailing list archives

From Ryan Huebsch <ryan-h...@huebsch.org>
Subject Re: Dynamic partition set to null
Date Sun, 13 Feb 2011 16:19:42 GMT
You are likely encountering a bug w/ Amazon's S3 code:
https://forums.aws.amazon.com/thread.jspa?threadID=56358&tstart=25

Try inserting into a non-S3-backed table to see if this is indeed your
problem.
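
For example, you could clone the table definition onto HDFS and run the
same dynamic-partition insert against it. A rough sketch (the table name
and location below are placeholders, adjust to your setup):

    -- same schema as the pvs table; name and HDFS path are placeholders
    create external table pvs_hdfs (
      time INT,
      server STRING,
      thread_id STRING
    )
    partitioned by (
      dt string
    )
    row format delimited fields terminated by '\t'
    stored as textfile
    location '/tmp/pvs_hdfs/';

    INSERT OVERWRITE TABLE pvs_hdfs PARTITION (dt)
    SELECT s.time, s.server, s.thread_id, s.dt
      FROM (
        FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
          raw_pvs.thread_id, raw_pvs.dt
        where dt > '2011_01_00' and dt < '2011_01_02'
        limit 100
      ) s;

If that insert succeeds and "show partitions pvs_hdfs" lists real dt
values, the S3 bug is almost certainly what you are hitting.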

Based on the Amazon forums, a fix is expected this week:
https://forums.aws.amazon.com/thread.jspa?threadID=60149&tstart=0
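
One more thing worth ruling out in the meantime: make sure dynamic
partitioning is enabled in your session, since Hive refuses
dynamic-partition inserts otherwise. These are standard Hive settings,
unrelated to the S3 bug:

    -- required for any dynamic-partition insert
    set hive.exec.dynamic.partition=true;
    -- nonstrict allows every partition column (dt here) to be dynamic
    set hive.exec.dynamic.partition.mode=nonstrict;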

Ryan

On 02/12/2011 11:08 PM, khassounah@mediumware.net wrote:
> Hello,
>
> I have the following table definition (simplified to help in debugging):
>
>      create external table pvs (
>        time INT,
>        server STRING,
>        thread_id STRING
>      )
>      partitioned by (
>        dt string
>      )
>      row format delimited fields terminated by '\t'
>      stored as textfile
>      location 's3://dev-elastic/logz/';
>
> I have another table, raw_pvs, from which I want to import data into
> pvs using the following statement:
>
>      INSERT OVERWRITE TABLE pvs PARTITION (dt)
>      SELECT s.time, s.server, s.thread_id, s.dt
>        FROM (
>          FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
>            raw_pvs.thread_id, raw_pvs.dt
>          where dt > '2011_01_00' and dt < '2011_01_02'
>          limit 100
>        ) s;
>
> I keep getting the following error:
>
>      Total MapReduce jobs = 1
>      Launching Job 1 out of 1
>      Number of reduce tasks determined at compile time: 1
>      In order to change the average load for a reducer (in bytes):
>        set hive.exec.reducers.bytes.per.reducer=<number>
>      In order to limit the maximum number of reducers:
>        set hive.exec.reducers.max=<number>
>      In order to set a constant number of reducers:
>        set mapred.reduce.tasks=<number>
>      Starting Job = job_201102111900_0003, Tracking URL =
>        http://ip-10-204-190-203.ec2.internal:9100/jobdetails.jsp?jobid=job_201102111900_0003
>      Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job
>        -Dmapred.job.tracker=ip-10-204-190-203.ec2.internal:9001 -kill
>        job_201102111900_0003
>      2011-02-12 01:11:07,649 Stage-1 map = 0%,  reduce = 0%
>      2011-02-12 01:11:09,733 Stage-1 map = 20%,  reduce = 0%
>      2011-02-12 01:11:12,785 Stage-1 map = 100%,  reduce = 0%
>      2011-02-12 01:11:18,868 Stage-1 map = 100%,  reduce = 100%
>      Ended Job = job_201102111900_0003
>      Loading data to table pvs partition (dt=null)
>      Failed with exception dt not found in table's partition spec: {dt=null}
>      FAILED: Execution Error, return code 1 from
>        org.apache.hadoop.hive.ql.exec.MoveTask
>
> When I run the subquery directly (FROM raw_pvs SELECT raw_pvs.time,
> raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt where dt>'2011_01_00'
> and dt<'2011_01_02' limit 100), I get 100 rows with no nulls in them,
> so this doesn't seem like a data issue.
>
> Does anyone know what I'm doing wrong? I've been stuck on this for a few days!
>
> thank you
> Khaled
>

