impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems
Date Wed, 06 Apr 2016 00:18:53 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-1878: Support INSERT and LOAD DATA on S3 and between filesystems

Patch Set 10:

File be/src/runtime/

Line 839: There is no directory structure in S3, so "inheriting" permissions is not
        :           // possible.
> S3 has per-object permissions (in the form of ACLs see
That is possible, but since there is no directory structure, we don't have anywhere to inherit
permissions from.

An alternative way of doing it would be to save the partition's default permissions in a hidden
file or something similar (just basically someplace to hold the state), and every time we
insert, we "inherit" permissions from this file.
Or maybe just stat an existing object and copy over those permissions (this could have other
complications though).

I'll leave this as a TODO for now and address it in a following patch.

Line 875: stringstream ss;
        :           ss << "Error(s) deleting partition directories. First error (of
        :              << partition_create_ops.errors().size() << ") was: " <<
> While you are here, could you clean up he use of stringstream in favour of 
File be/src/util/

Line 111: .c_str()
> not really your change, but I think you can get rid of .c_str() in all the 

Line 128:     string error_msg = connection_status.ok() ? GetStrErrMsg() :
> put the line break here, and put the whole ternary statement on the next li

Line 156:   if (connection_cache != NULL) connection_cache_ = connection_cache;
> You should do this even if connection_cache is NULL - the caller might be t
File be/src/util/hdfs-bulk-ops.h:

Line 94:   /// Constructs a new operation set.
> comment is now redundant, can be removed.

Line 109:   /// If abort_on_error is true, execution will finish after the first error seen.
> update comment to explain connection_cache

Line 121:     return connection_cache_;
> one line

Line 130:   Promise<bool> promise_;
> Can you replace this with a CountingBarrier? I think that will be more natu
I think the reason a promise was used here instead of a counting barrier was because a CountingBarrier
cannot return an error. And we need the promise to notify us of an error if it does happen.

I can modify CountingBarrier to maintain its current behavior and optionally return an error
if necessary. Should I do that instead?
File be/src/util/

Line 77: // TODO: 's3a' is the smallest scheme that our filesystem supports, so we compare
       :   // first 6 characters. Change if we support a filesystem with a smaller scheme.
> Comment is out-of-date

Line 85:   // HDFS path.
> this isn't a valid assumption after Anuj's change.
Once Anuj pushes his change, I'll rebase and submit the final version. (Or he can rebase if
I get this patch in first)
File be/src/util/hdfs-util.h:

Line 45: IsDfsPath
> I think this should be IsHdfsPath(...), because other filesystems technical

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <>
Gerrit-Reviewer: Dan Hecht <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-Reviewer: Mostafa Mokhtar <>
Gerrit-Reviewer: Sailesh Mukil <>
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-HasComments: Yes

View raw message