hive-issues mailing list archives

From "yangfang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16666) Set hive.exec.stagingdir a relative directory or a sub directory of distination data directory will cause Hive to delete the intermediate query results
Date Mon, 22 May 2017 02:49:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019097#comment-16019097 ]

yangfang commented on HIVE-16666:
---------------------------------

[~aihuaxu], [~pvary], thanks for your advice.
 In my opinion, the staging directory is just a temporary directory. Users may not care where
it is; they only care about the final result. From a user's point of view any staging
directory name should be allowed, so throwing an exception may be a little rough.
 Even if we add validation of the configuration, it cannot cover every case. For example, suppose
/tmp/hive/.hive-staging is accepted as a valid directory because it is an empty directory that
no one has used. Later, someone may create a table like this:
 create table test(a int, b string) location '/tmp'
Now the staging directory is a subdirectory of the table's data directory, so the intermediate
query results will still be deleted during execution.
 Looking forward to your comments.
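The argument is that "safe" is a property of the pair (staging directory, table location), and a table location can be created after the configuration was validated. A minimal Java sketch of the containment test, assuming HDFS paths can be treated as POSIX-style paths via `java.nio.file.Path`; the class `StagingDirCheck` and its method are hypothetical, not Hive APIs:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hive: checks whether a staging directory
// lies inside a table's data directory.
class StagingDirCheck {

    // Path.startsWith compares whole name elements, so "/tmpx/..." is not
    // treated as being under "/tmp".
    static boolean isUnderTableLocation(String tableLocation, String stagingDir) {
        Path table = Paths.get(tableLocation).normalize();
        Path staging = Paths.get(stagingDir).normalize();
        return staging.startsWith(table);
    }

    public static void main(String[] args) {
        String staging = "/tmp/hive/.hive-staging";
        // Passes a check made against an existing table at /hive/test2 ...
        System.out.println(isUnderTableLocation("/hive/test2", staging)); // false
        // ... but not after: create table test(a int, b string) location '/tmp'
        System.out.println(isUnderTableLocation("/tmp", staging)); // true
    }
}
```

Because the answer changes as tables are created, a check like this would have to run per query against the actual destination, not once when the property is set.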

> Set hive.exec.stagingdir a relative directory or a sub directory of distination data
directory will cause Hive to delete the intermediate query results
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16666
>                 URL: https://issues.apache.org/jira/browse/HIVE-16666
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 3.0.0
>            Reporter: yangfang
>            Assignee: yangfang
>            Priority: Critical
>         Attachments: HIVE-16666.1.patch
>
>
> Set hive.exec.stagingdir to a relative path such as ./*, for example set hive.exec.stagingdir=./opq8.
> Then execute a query like this:
> insert overwrite table test2 select * from test3;
> You will get an error like this:
> hive> set hive.exec.stagingdir=./opq8;
> hive> insert overwrite table test2 select * from test3;
> Query ID = mr_20170515134831_28ee392d-0d5a-4e47-b80c-dfcd31691b02
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1494818119523_0008, Tracking URL = http://zdh77:8088/proxy/application_1494818119523_0008/
> Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job  -kill job_1494818119523_0008
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
> 2017-05-15 13:48:51,487 Stage-1 map = 0%,  reduce = 0%
> Ended Job = job_1494818119523_0008
> Stage-3 is selected by condition resolver.
> Stage-2 is filtered out by condition resolver.
> Stage-4 is filtered out by condition resolver.
> Moving data to directory hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000
> Loading data to table default.test2
> Moved: 'hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1' to trash at: hdfs://nameservice/user/mr/.Trash/Current
> Failed with exception Unable to move source hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000 to destination hdfs://nameservice/hive/test2
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://nameservice/hive/test2/opqt8_hive_2017-05-15_13-48-31_558_6151032330134038151-1/-ext-10000 to destination hdfs://nameservice/hive/test2
> MapReduce Jobs Launched: 
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> hive>
> hive.exec.stagingdir=./opq8 is a path relative to the destination write directory /hive/test2,
 so Hive creates a temporary directory /hive/test2/opq8_hive* for the intermediate query results.
 Later, in the move stage, Hive deletes or trashes every subdirectory under /hive/test2
whose name does not begin with "_" or "." in order to move data into this directory, and the
staging directory is removed along with the results it holds. You can see this logic in
org.apache.hadoop.hive.ql.metadata.trashFilesUnderDir.
> My proposed fix: if the staging directory is a subdirectory of the destination write
directory, add a "." in front of its name. The temporary directory then becomes /hive/test2/.opq8_hive*,
and because .opq8_hive* starts with ".", Hive will not delete it:
> hive> set hive.exec.stagingdir=./opq8;
> hive>  insert overwrite table test2 select * from test3;
> Query ID = mr_20170515143940_ae48a65e-42be-4f50-b974-b713ca902867
> Total jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1494818119523_0012, Tracking URL = http://zdh77:8088/proxy/application_1494818119523_0012/
> Kill Command = /opt/ZDH/parcels/lib/hadoop/bin/hadoop job  -kill job_1494818119523_0012
> Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
> 2017-05-15 14:40:04,547 Stage-1 map = 0%,  reduce = 0%
> Ended Job = job_1494818119523_0012
> Stage-3 is selected by condition resolver.
> Stage-2 is filtered out by condition resolver.
> Stage-4 is filtered out by condition resolver.
> Moving data to directory hdfs://nameservice/hive/test2/.opqt8_hive_2017-05-15_14-39-40_751_1221840798987515724-1/-ext-10000
> Loading data to table default.test2
> MapReduce Jobs Launched: 
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 26.751 seconds
> hive> 
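The failure and the proposed fix can be modelled in a few lines. Below is a simplified Java sketch, not Hive's code. It assumes HDFS paths behave like POSIX paths under `java.nio.file.Path`, resolves a relative `hive.exec.stagingdir` such as `./opq8` against the destination directory, appends a per-query suffix like the `_hive_...` seen in the logs, and applies the rule described above: direct children of the destination whose names do not start with `_` or `.` are trashed before the move. The class `StagingDirModel` and its methods are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Simplified, hypothetical model of the behaviour described in HIVE-16666.
// This is NOT Hive's code; it only mirrors the rules stated in the issue.
class StagingDirModel {

    // Rule from the issue: before moving data in, each direct child of the
    // destination whose name does not start with "_" or "." is trashed.
    static boolean wouldBeTrashed(String childName) {
        return !(childName.startsWith("_") || childName.startsWith("."));
    }

    // Resolves a hive.exec.stagingdir value against the destination directory
    // and appends a per-query suffix. With applyFix, a staging directory that
    // lands under the destination gets a "." prepended to its last component,
    // as proposed in the issue.
    static String stagingPath(String destDir, String stagingSetting,
                              String querySuffix, boolean applyFix) {
        Path dest = Paths.get(destDir).normalize();
        Path base = dest.resolve(stagingSetting).normalize(); // "./opq8" -> <dest>/opq8
        String name = base.getFileName() + querySuffix;
        if (applyFix && base.startsWith(dest) && !name.startsWith(".")) {
            name = "." + name;
        }
        return base.resolveSibling(name).toString();
    }

    // True if cleanup would trash the staging directory, either directly or
    // by trashing its ancestor that is a direct child of destDir.
    static boolean stagingWouldBeTrashed(String destDir, String stagingDir) {
        Path dest = Paths.get(destDir).normalize();
        Path staging = Paths.get(stagingDir).normalize();
        if (!staging.startsWith(dest) || staging.equals(dest)) {
            return false; // cleanup only looks inside the destination
        }
        String directChild = dest.relativize(staging).getName(0).toString();
        return wouldBeTrashed(directChild);
    }

    public static void main(String[] args) {
        String suffix = "_hive_2017-05-15_13-48-31_558_6151032330134038151-1";
        String before = stagingPath("/hive/test2", "./opq8", suffix, false);
        String after = stagingPath("/hive/test2", "./opq8", suffix, true);
        System.out.println(before + " trashed: " + stagingWouldBeTrashed("/hive/test2", before));
        System.out.println(after + " trashed: " + stagingWouldBeTrashed("/hive/test2", after));
    }
}
```

In this model the unmodified `opq8_hive...` directory is trashed and the dot-prefixed one survives. It also captures the objection raised in the comment: with a table at `/tmp` and staging under `/tmp/hive/.hive-staging`, the direct child `hive` matches the trash rule, so the leading dot on the staging directory's own name does not protect it.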



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
