hive-user mailing list archives

From Elliot West <tea...@gmail.com>
Subject Re: Hive ExIm from on-premise HDP to Amazon EMR
Date Thu, 07 Jan 2016 16:53:59 GMT
More information: This works if I move the export into EMR's HDFS and then
import from there to a new location in HDFS. It does not work across
FileSystems:

   - Import from S3 → EMR HDFS (fails in a similar manner to S3 → S3)
   - Import from EMR HDFS → S3 (complains that an HDFS FileSystem was
   expected as the destination; presumably the same FileSystem instance is
   used for both the source and the destination).
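
For reference, the route that does work can be sketched like so (the
staging path is illustrative, and this assumes the EMR cluster is
configured with credentials for the exports bucket):

// on EMR: first pull the export from S3 into the cluster's own HDFS
// (hadoop distcp is an alternative to fs -cp for larger exports)
hadoop fs -cp 's3n://exports-bucket/my_table' /staging/my_table

// then import HDFS → HDFS, which behaves as expected
IMPORT FROM '/staging/my_table'
LOCATION '/warehouse/my_table';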



On 7 January 2016 at 12:17, Elliot West <teabot@gmail.com> wrote:

> Hello,
>
> Following on from my earlier post concerning syncing Hive data from an
> on-premise cluster to the cloud, I've been experimenting with the
> IMPORT/EXPORT functionality to move data from an on-premise HDP cluster to
> Amazon EMR. I started out with some simple EXPORTs/IMPORTs, as these can
> form the core operations on which replication is founded. This worked fine
> with some on-premise clusters running HDP-2.2.4.
>
>
> // on cluster 1
>
> EXPORT TABLE my_table PARTITION (year_month='2015-12')
> TO '/exports/my_table'
> FOR REPLICATION ('1');
>
> // Copy from cluster1:/exports/my_table to cluster2:/staging/my_table
>
> // on cluster 2
>
> IMPORT FROM '/staging/my_table'
> LOCATION '/warehouse/my_table';
>
> // Table created, partition created, data relocated to
> /warehouse/my_table/year_month=2015-12
>
>
> I next tried similar with HDP-2.2.4 → EMR (4.2.0) like so:
>
> // On premise HDP2.2.4
> SET hiveconf:hive.exim.uri.scheme.whitelist=hdfs,pfile,s3n;
>
> EXPORT TABLE my_table PARTITION (year_month='2015-12')
> TO 's3n://API_KEY:SECRET_KEY@exports-bucket/my_table';
>
> // on EMR
> SET hiveconf:hive.exim.uri.scheme.whitelist=hdfs,pfile,s3n;
>
> IMPORT FROM 's3n://exports-bucket/my_table'
> LOCATION 's3n://hive-warehouse-bucket/my_table';
>
>
> The IMPORT behaviour I see is bizarre:
>
>    1. Creates the folder 's3n://hive-warehouse-bucket/my_table'
>    2. Copies the part file from
>    's3n://exports-bucket/my_table/year_month=2015-12' to
>    's3n://exports-bucket/my_table' (i.e. to the parent)
>    3. Fails with: "ERROR exec.Task: Failed with exception checkPaths:
>    s3n://exports-bucket/my_table has nested
>    directorys3n://exports-bucket/my_table/year_month=2015-12"
>
> It is as if it is attempting to set the final partition location to
> 's3n://exports-bucket/my_table' and not
> 's3n://hive-warehouse-bucket/my_table/year_month=2015-12' as happens with
> HDP → HDP.
>
> I've tried variations (specifying the partition on import, excluding the
> location), all with the same result. Any thoughts or assistance would be
> appreciated.
>
> Thanks - Elliot.
>
>
>
