hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chaoyu Tang <ctang...@gmail.com>
Subject Re: Review Request 43176: HIVE-12965: Insert overwrite local directory should perserve the overwritten directory permission
Date Thu, 04 Feb 2016 16:13:31 GMT


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> >

Thanks Xuefu for review. I answered all the questions and please let me know if they make
sense.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 152
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line152>
> >
> >     Instead of doing this, should we explicitly check the existance of the destination?

Yeah, if the code is like following and first explicitly checks the existence of target, it
might be more readable. Instead I combined the exists and listStatus into one listStatus call
which can save one trip to HDFS in case that targetPath exists.

if (dstFs.exists(targetPath)) {
   FileStatus[] destFiles = dstFs.listStatus(targetPath);
   if (dstFs.isDirectory(targetPath)) {
     for (FileStatus destFile : destFiles) {
       if (!dstFs.delete(destFile.getPath(), true)) {
         throw new IOException("Unable to clean the destination directory: " + targetPath);
       }
     }
   }
} else {
   if (!FileUtils.mkdir(dstFs, targetPath, false, conf)) {
     throw new HiveException("Failed to create local target directory " + targetPath);
}


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 143
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line143>
> >
> >     Should we check if it's a directory before calling listStatus() on it?

By first calling FileStatus[] destFiles = dstFs.listStatus(targetPath), we can tell if the
file with path targetPath exists or not. If yes, we then check if it is directory, otherwise
it throws FileNotFoundException and in its catch block we create a directory with targetPath.
(Please see explanation in next comment for the reason why it is implemented in this way).
But if we do dstFs.isDirectory(targetPath) first, for a non-existing directory, it returns
false instead of throwing FileNotFoundException and going to FileNotFoundException block to
create the target directory.
Yeah, I initally did in that way and found out that trick in the testing.


> On Feb. 4, 2016, 2:51 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java, line 137
> > <https://reviews.apache.org/r/43176/diff/1/?file=1232087#file1232087line137>
> >
> >     two nested try-catch block seem making the code hard to read.

Yeah, the outer try-catch
      } catch (IOException ioe) {
        throw new HiveException("Unable to move source " + sourcePath + " to destination "
            + targetPath, ioe);
      }
Is mainly for logging purpose. Should we remove?


On Feb. 4, 2016, 2:51 p.m., Chaoyu Tang wrote:
> > On a high level, deleting file one by one is slower. Could we instead remember the
original permission and set it to the new directory that we are going to replace?

Yeah, I initially implemented it in the way you suggested which is more performant. But it
could not get the right permission to my local directory probably due to the defect in HDFS
RawLocalFileSystem. It always return the HDFS default one (0666)to all local files/directory.
I have put comment in code to explain why I delete the file one by one.
          // RawLocalFileSystem seems not able to get the right permissions for a local file,
it
          // always returns hdfs default permission (00666). So we can not overwrite a directory
          // by deleting and recreating the directory and restoring its permissions. We should
          // delete all its files and subdirectories instead.


- Chaoyu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43176/#review117816
-----------------------------------------------------------


On Feb. 4, 2016, 4:07 a.m., Chaoyu Tang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43176/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2016, 4:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-12965
>     https://issues.apache.org/jira/browse/HIVE-12965
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In Hive, "insert overwrite local directory" first deletes the overwritten directory if
exists, recreate a new one, then copy the files from src directory to the new local directory.
This process sometimes changes the permissions of the to-be-overwritten local directory, therefore
causing some applications no more to be able to access its content.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java e9cd450 
> 
> Diff: https://reviews.apache.org/r/43176/diff/
> 
> 
> Testing
> -------
> 
> Manual tests
> Precommit tests
> 
> 
> Thanks,
> 
> Chaoyu Tang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message