hive-user mailing list archives

From Stephen Sprague <sprag...@gmail.com>
Subject Re: s3a and hive
Date Tue, 15 Nov 2016 20:12:48 GMT
Just for the record...

This config, "hive.exec.stagingdir", determines the ".hive-staging"
sub-directory. When it defaults to the table path and the table path is in
S3, that's where I get the exception:

Failed with exception java.io.IOException: rename for src path:
s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/.hive-staging_hive_2016-11-15_04-57-52_085_7825126612479617470-1/-ext-10000/000000_0
to dest path:s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/000000_0
returned false

I got the tip from this Stack Overflow post:

http://stackoverflow.com/questions/39547001/why-hive-staging-file-is-missing-in-aws-emr


That said, setting "hive.metastore.warehouse.dir" to an S3 location is
something totally different and, per Elliot's comments, could be a risky
adventure - and it is unrelated to my error.

Anyway, I reset that back to HDFS, was inserting into an external table
located in S3, and *still* got the error above, much to my consternation.
However, by playing with "hive.exec.stagingdir" (and reading that Stack
Overflow post) I was able to overcome the error.
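
For anyone who lands here the same way, the knob is just a session setting
(or a hive-site.xml entry), so experimenting looks roughly like the sketch
below. The value shown is only an example, not the one true fix - exactly
where the staging directory should live will depend on your setup - but the
point is to move it off its default location under the S3 table path:

    -- illustrative sketch only: relocate the staging directory away from
    -- its default spot under the table path (this value is just an example)
    SET hive.exec.stagingdir=/tmp/hive/.hive-staging;

    -- then re-run the failing insert, e.g.:
    insert overwrite table omniture.hit_data_aws partition (date_key=20161113)
    select * from staging.hit_data_aws_ext_20161113 limit 1;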

YMMV.

Cheers,
Stephen.


On Tue, Nov 15, 2016 at 7:53 AM, Stephen Sprague <spragues@gmail.com> wrote:

> Thanks Elliot. I think you might be onto something there. :)
>
> Making that tiny little switch sure seemed attractive, but judging from the
> JIRAs out there the ramifications of that setting are far more involved
> and nuanced than I thought.
>
> Alright, you make some convincing arguments there. Looks like the smart
> money is on HDFS for the time being.
>
> Thanks for replying! Good stuff.
>
> Cheers,
> Stephen.
>
> On Tue, Nov 15, 2016 at 7:29 AM, Elliot West <teabot@gmail.com> wrote:
>
>> My gut feeling is that this is not something you should do (except for
>> fun!). I'm fairly confident that somewhere in Hive, MR, or Tez, you'll hit
>> some code that requires consistent, atomic move/copy/list/overwrite
>> semantics from the warehouse filesystem. This is not something that the
>> vanilla S3AFileSystem can provide. Even if you get to the point where
>> everything appears functionally sound, I expect you'll encounter unusual
>> and inconsistent behavior if you use this in the long term.
>>
>> Solutions to Hive on S3 include:
>>
>>    - Use S3Guard (not yet available):
>>      https://issues.apache.org/jira/browse/HADOOP-13345
>>    - Use Hive on EMR with Amazon's S3 filesystem implementation, EMRFS.
>>      Note that this confusingly requires and overloads the 's3://' scheme
>>      (see the sketch just below this list).
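>>
>> To make that concrete (the bucket, database, and table names below are
>> made up, purely for illustration): with EMRFS an S3-backed table is
>> addressed with the 's3://' scheme, so it might look something like this
>> sketch, whereas the vanilla S3AFileSystem uses 's3a://':
>>
>>     -- illustrative only: an external table whose data lives in S3 via EMRFS
>>     -- (bucket, database, and table names are hypothetical)
>>     CREATE EXTERNAL TABLE example_db.hit_data (id BIGINT, payload STRING)
>>     PARTITIONED BY (date_key INT)
>>     STORED AS ORC
>>     LOCATION 's3://example-bucket/warehouse/example_db/hit_data';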
>>
>> Hope this helps, and please report back with any findings as we are doing
>> quite a bit of Hive in AWS too.
>>
>> Cheers - Elliot.
>>
>> On 15 November 2016 at 15:19, Stephen Sprague <spragues@gmail.com> wrote:
>>
>>> No, permissions are good. I believe the case to be that s3a does not
>>> have a "move" and/or "rename" semantic, but I can't be the first one to
>>> encounter this. Somebody out there has to have gone down this path
>>> before me, surely.
>>>
>>> Searching the cyber, I find this:
>>>
>>>    https://issues.apache.org/jira/browse/HIVE-14270
>>>
>>> which is part of even more S3 work (see the related JIRAs that that
>>> JIRA comes under, especially the Hadoop uber-JIRA).
>>>
>>>
>>> So after digging through those JIRAs, lemme ask:
>>>
>>> Has anyone set hive.metastore.warehouse.dir to an s3a location with
>>> success?
>>>
>>> Seems to me Hive 2.2.0 and perhaps Hadoop 2.7 or 2.8 are the only
>>> chances of success, but I'm happy to be told I'm wrong.
>>>
>>> thanks,
>>> Stephen.
>>>
>>>
>>>
>>> On Mon, Nov 14, 2016 at 10:25 PM, Jörn Franke <jornfranke@gmail.com>
>>> wrote:
>>>
>>>> Is it a permission issue on the folder?
>>>>
>>>> On 15 Nov 2016, at 06:28, Stephen Sprague <spragues@gmail.com> wrote:
>>>>
>>>> So I figured I'd try setting hive.metastore.warehouse.dir=s3a://bucket/hive
>>>> and see what would happen.
>>>>
>>>> Running this query:
>>>>
>>>>     insert overwrite table omniture.hit_data_aws partition (date_key=20161113)
>>>>     select * from staging.hit_data_aws_ext_20161113 limit 1;
>>>>
>>>> yields this error:
>>>>
>>>>    Failed with exception java.io.IOException: rename for src path:
>>>> s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/.hive-staging_hive_2016-11-15_04-57-52_085_7825126612479617470-1/-ext-10000/000000_0
>>>> to dest path:s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/000000_0
>>>> returned false
>>>>    FAILED: Execution Error, return code 1 from
>>>> org.apache.hadoop.hive.ql.exec.MoveTask. java.io.IOException: rename for
>>>> src path: s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/.hive-staging_hive_2016-11-15_04-57-52_085_7825126612479617470-1/-ext-10000/000000_0
>>>> to dest path:s3a://trulia-dwr-cluster-dev/hive/omniture.db/hit_data_aws/date_key=20161113/000000_0
>>>> returned false
>>>>
>>>>
>>>> Is there any workaround? I'm running Hive 2.1.0 and Hadoop version
>>>> 2.6.0-cdh5.7.1.
>>>>
>>>>
>>>> thanks,
>>>> Stephen.
>>>>
>>>>
>>>
>>
>
