hive-user mailing list archives

From Charles Menguy
Subject Issue uploading data to S3 with Hive
Date Fri, 28 Sep 2012 21:00:35 GMT
Hi everyone,

I'm using S3 regularly as a means of data storage and transfer, and I have
some Hive jobs that run on data in HDFS but write their output to S3.
I'm doing this with an "insert overwrite directory" statement.

This works fine 99% of the time, but in some cases the job fails during
the upload even though the query itself is fine. This seems to have
something to do with temporary storage, but I'm not sure what exactly:

set mapred.output.compress=true;
set hive.exec.compress.output=true;
insert overwrite directory 's3n://myaccesskey:mysecretkey@mybucket
group by

Hive history file=/tmp/keystone/hive_job_log_201209281712_1248082813.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
Starting Job = job_201209181846_3513, Tracking URL =
Kill Command = /usr/lib/hadoop/bin/hadoop job
 -Dmapred.job.tracker=hdfs://myjobtracker -kill job_201209181846_3513
2012-09-28 17:12:21,714 Stage-1 map = 0%,  reduce = 0%
2012-09-28 17:12:25,785 Stage-1 map = 100%,  reduce = 0%
2012-09-28 17:12:34,026 Stage-1 map = 100%,  reduce = 17%
2012-09-28 17:12:37,087 Stage-1 map = 100%,  reduce = 76%
2012-09-28 17:12:40,153 Stage-1 map = 100%,  reduce = 98%
2012-09-28 17:12:42,317 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201209181846_3513
Job Commit failed with exception
does not exist in S3)'

This happens very rarely, but when it does, the job just fails without
retrying, and nothing is uploaded to S3. If I rerun the exact same query
afterwards, it usually works fine.
It also doesn't seem to be related to the amount of data being uploaded:
I've seen it happen on very small queries like the one above, and sometimes
on ones with a large amount of data.
This also doesn't seem to depend on whether gzip compression is used; I've
seen it happen with and without compression.
From what I can see, this seems to be specific to S3, but I'm not sure
why, as it seems pretty random.
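
Since rerunning the exact same query usually succeeds, one stopgap (not a fix for the underlying cause) is to wrap the Hive invocation in a retry loop with backoff. This is a minimal Python sketch: `run_with_retries` and `hive_runner` are hypothetical helpers, and it assumes the `hive` CLI is on the PATH, the query is saved in a file, and the CLI exits non-zero when the commit fails. Because the statement is an `insert overwrite directory`, a rerun replaces the target directory rather than appending to it, which is what makes a blind retry reasonable here.

```python
import subprocess
import time
from typing import Callable


def run_with_retries(run: Callable[[], int], attempts: int = 3,
                     base_delay: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `run` until it returns exit code 0 or `attempts` runs are used up.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    (exponential backoff) and returns the exit code of the last run.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    code = run()
    for attempt in range(1, attempts):
        if code == 0:
            break
        # Back off before retrying; no sleep happens after the final attempt.
        sleep(base_delay * 2 ** (attempt - 1))
        code = run()
    return code


def hive_runner(query_file: str) -> Callable[[], int]:
    """Hypothetical runner: executes the HiveQL in `query_file` via `hive -f`."""
    def run() -> int:
        # Assumes the hive CLI is on PATH and signals failure via its exit code.
        return subprocess.run(["hive", "-f", query_file]).returncode
    return run
```

Usage would look like `run_with_retries(hive_runner("export_to_s3.hql"), attempts=3)`, where the file name is only a placeholder. The `sleep` parameter is injectable so the backoff can be tested without actually waiting.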

If I look in the jobtracker, the job looks fine and is marked as
successful, so this happens after the job has completed, and I don't see
any error in the logs other than the one above.
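
Because the jobtracker marks the job as successful, the Hive CLI output is the only place this failure shows up. A wrapper could scan that output for the `Job Commit failed with exception` line quoted above and retry only in that case, leaving genuine query errors alone. This is a minimal Python sketch, assuming the wrapper captures the CLI's output; `HiveOutcome`, `classify_hive_output` and `ended_job_id` are hypothetical names, and the matched strings are copied from the log above rather than from any documented Hive output format.

```python
import re
from enum import Enum
from typing import Optional


class HiveOutcome(Enum):
    OK = "ok"
    COMMIT_FAILED = "commit_failed"   # MapReduce job ended, but the final commit failed
    OTHER_FAILURE = "other_failure"   # anything else, e.g. a query or runtime error


# Both patterns are taken from the console output quoted in this message.
_ENDED_JOB = re.compile(r"^Ended Job = (job_\S+)", re.MULTILINE)
_COMMIT_FAILED = "Job Commit failed with exception"


def classify_hive_output(output: str, exit_code: int) -> HiveOutcome:
    # Check for the commit failure first, regardless of the exit code.
    if _COMMIT_FAILED in output:
        return HiveOutcome.COMMIT_FAILED
    return HiveOutcome.OK if exit_code == 0 else HiveOutcome.OTHER_FAILURE


def ended_job_id(output: str) -> Optional[str]:
    """Return the job id from the 'Ended Job = ...' line, if present."""
    match = _ENDED_JOB.search(output)
    return match.group(1) if match else None
```

Hive's CLI typically prints these progress lines to stderr, so something like `r = subprocess.run(["hive", "-f", path], capture_output=True, text=True)` followed by `classify_hive_output(r.stdout + r.stderr, r.returncode)` should see them. The job id is handy for looking the run up in the jobtracker afterwards.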

Is there anything I could do to avoid this rare problem?

