hive-user mailing list archives

From 董亚军 <ric.d...@liulishuo.com>
Subject Re: the `use database` command will change the scheme of target table?
Date Wed, 20 Jan 2016 09:27:26 GMT
hi all,

I read the Temporary Folders section of the wiki:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-HiveMetastoreConfigurationVariables


My target table's filesystem is HDFS, but Hive writes the temporary data to
S3 after I `use` an S3-backed database.
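To make the failure mode concrete, here is a small illustrative model (this is
not Hive source code, just a sketch of the behavior reported in this thread on
0.13.1): the scratch location appears to be resolved against the *current*
database's filesystem, so after `use prd` the intermediate data lands on
`s3a://`, and the final rename into an HDFS-backed table becomes a
cross-filesystem move, which cannot succeed.

```python
# Illustrative model of the observed Hive 0.13.1 behavior (not Hive's code):
# the scratch dir takes its scheme/authority from the current database's
# location, and a rename only works within a single filesystem.
from urllib.parse import urlparse

def scratch_dir(current_db_location: str,
                scratch_path: str = "/tmp/hive-ubuntu") -> str:
    """Resolve the scratch dir against the current database's filesystem."""
    u = urlparse(current_db_location)
    return f"{u.scheme}://{u.netloc}{scratch_path}"

def can_rename(src: str, dst: str) -> bool:
    """A rename succeeds only within one filesystem (same scheme + authority)."""
    s, d = urlparse(src), urlparse(dst)
    return (s.scheme, s.netloc) == (d.scheme, d.netloc)

# Scenario 1: `use default` (HDFS) -> scratch on HDFS, move into HDFS table OK.
tmp1 = scratch_dir("hdfs://hadoop-0/warehouse")
print(can_rename(tmp1, "hdfs://hadoop-0/warehouse/temp.db/t1"))   # True

# Scenario 2: `use prd` (S3) -> scratch on s3a, move into HDFS table fails.
tmp2 = scratch_dir("s3a://warehouse-tmp/prd.db")
print(can_rename(tmp2, "hdfs://hadoop-0/warehouse/temp.db/t1"))   # False
```

This matches the error in the quoted message below: the source is
`s3a://warehouse-tmp/...` while the destination is `hdfs://hadoop-0/...`.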

Hive uses temporary folders both on the machine running the Hive client and
the default HDFS instance. These folders are used to store per-query
temporary/intermediate data sets and are normally cleaned up by the hive
client when the query is finished. However, in cases of abnormal hive
client termination, some data may be left behind. The configuration details
are as follows:

   - On the HDFS cluster this is set to */tmp/hive-<username>* by default
   and is controlled by the configuration variable *hive.exec.scratchdir*
   - On the client machine, this is hardcoded to */tmp/<username>*

Note that when *writing data to a table/partition, Hive will first write to
a temporary location on the target table's filesystem* (using
hive.exec.scratchdir as the temporary location) and then move the data to
the target table. This applies in all cases - whether tables are stored in
HDFS (normal case) or in file systems like S3 or even NFS.
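Grounded in scenario 1 of the quoted thread below, the sequence that does work
on 0.13.1 is to switch to a database on the target table's filesystem before
the CTAS, so the intermediate data and the final table share one filesystem:

```sql
-- Per scenario 1 below, this works on Hive 0.13.1: the current database
-- (default) lives on HDFS, so the intermediate data is written to HDFS and
-- the final move into temp.t1 stays within a single filesystem.
use default;                -- HDFS-backed database
create table temp.t1        -- the temp database points to HDFS
as select c1 from prd.t2;   -- the prd database and t2 point to S3
```

Note that, as reported below, overriding hive.exec.scratchdir alone did not
help on this version, since the scheme was still replaced with the current
database's filesystem (s3a).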

On Wed, Jan 20, 2016 at 12:06 PM, 董亚军 <ric.dong@liulishuo.com> wrote:

> thanks Marcin,
>
> t1 is created within the temp database, right? which points to HDFS. So the
> output directory of the m/r job should be in HDFS?
>
> my problem is why the output directory was hosted on the s3 filesystem after
> I *use prd* database.
>
>
>
> On Wed, Jan 20, 2016 at 11:52 AM, Marcin Tustin <mtustin@handybook.com>
> wrote:
>
>> That is the expected behaviour. Managed tables are created within the
>> directory of their host database.
>>
>>
>> On Tuesday, 19 January 2016, 董亚军 <ric.dong@liulishuo.com> wrote:
>>
>>> hi list,
>>>
>>> we use HDFS and S3 as Hive filesystems at the same time.  Here is an
>>> issue:
>>>
>>>
>>> *scenario* 1:
>>>
>>> hive command:
>>>
>>> use default;
>>>
>>> create table temp.t1       // the temp database points to HDFS
>>> as
>>> select c1 from prd.t2;     // the prd database and the table t2 both
>>> point to S3
>>>
>>> it works well.
>>>
>>>
>>> *scenario* 2:
>>>
>>> hive command:
>>>
>>> *use prd; *
>>>
>>> create table temp.t1       // the temp database points to HDFS
>>> as
>>> select c1 from prd.t2;     // the prd database and the table t2 both
>>> point to S3
>>>
>>> the exception occurred with:
>>>
>>> Failed with exception Unable to move source
>>> s3a://warehouse-tmp/tmp/hive-ubuntu/hive_2016-01-20_xxxxxx/-ext-10001 to
>>> destination hdfs://hadoop-0/warehouse/temp.db/t1/
>>>
>>> and then, I tried to change the scratch space via the configuration key
>>> hive.exec.scratchdir, setting the value to hdfs://hadoop-0/*tmp-foo*/...
>>> , but it also failed with:
>>>
>>> Unable to move source s3a://warehouse-tmp*/tmp-foo* ... to
>>>
>>> it seems the *use database* command changes the scheme of the path for
>>> the target table?
>>>
>>> hive version: 0.13.1
>>>
>>>
>>> thanks.
>>>
>>
