impala-dev mailing list archives

From "Henry Robinson (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup
Date Mon, 02 May 2016 18:43:16 GMT
Henry Robinson has posted comments on this change.

Change subject: IMPALA-3452: S3: Disable Impala staging for INSERTs via flag for speedup
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/2905/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 52: DEFINE_bool(s3_skip_insert_staging, false, "Enable to skip the staging step for INSERTs "
> I recommend changing to query option.
The idea behind staging and having the coordinator do the final move is to allow individual
workers to complete their writes before publishing the results. 

That means that if there are any errors during the query (e.g. a scan of a malformed file), they
are caught before the writes are published. Local staging alone doesn't fix this: what does is
having the coordinator act as a distributed barrier.
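
To make the barrier idea concrete, here is a minimal in-memory sketch. It is not the
hdfs-table-sink.cc code: FileSet, WorkerResult, PublishIfAllSucceeded and WriteDirect are
hypothetical names made up for illustration, and a std::map stands in for the table directory.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory: file name -> contents.
using FileSet = std::map<std::string, std::string>;

// Outcome of one worker: whether it succeeded, and the files it wrote.
struct WorkerResult {
  bool ok;
  FileSet files;
};

// Staged path: the coordinator acts as a barrier. Files are moved into the
// table only after every worker has reported success; otherwise none are.
// Returns true if the files were published.
bool PublishIfAllSucceeded(const std::vector<WorkerResult>& results, FileSet* table) {
  assert(table != nullptr);
  for (const WorkerResult& r : results) {
    if (!r.ok) return false;  // One failure means nothing becomes visible.
  }
  for (const WorkerResult& r : results) {
    for (const auto& file : r.files) (*table)[file.first] = file.second;
  }
  return true;
}

// Unstaged path: every worker writes straight into the table, so files from
// successful workers stay visible even if another worker failed.
void WriteDirect(const std::vector<WorkerResult>& results, FileSet* table) {
  assert(table != nullptr);
  for (const WorkerResult& r : results) {
    for (const auto& file : r.files) (*table)[file.first] = file.second;
  }
}
```

With one failed worker, PublishIfAllSucceeded leaves the table untouched, while WriteDirect
leaves the successful workers' files behind. That is the trade-off the flag exposes.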

Having this two-stage process also smooths out the effects of skew on the 'partial-write window'.
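
A rough way to see the effect of skew, under a deliberately simple model that I am assuming
here (it is not measured from Impala): with direct writes, a file is visible as soon as its
worker finishes, and with staging nothing is visible until the coordinator starts its moves,
which are assumed to run one after another. DirectWriteWindow and StagedWriteWindow are
hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Direct writes: readers can see a partial result from the moment the first
// worker finishes until the last one does, so a slow worker widens the window.
double DirectWriteWindow(const std::vector<double>& finish_times_s) {
  if (finish_times_s.empty()) return 0.0;
  auto mm = std::minmax_element(finish_times_s.begin(), finish_times_s.end());
  return *mm.second - *mm.first;
}

// Staged writes: the window is only the time the coordinator spends moving
// files (assumed sequential), independent of how skewed the workers were.
double StagedWriteWindow(std::size_t num_files, double move_latency_s) {
  assert(move_latency_s >= 0.0);
  return static_cast<double>(num_files) * move_latency_s;
}
```

For example, workers finishing at 10 s, 12 s and 95 s give a direct-write window of 85 s,
while three staged files at an assumed 0.5 s per move give 1.5 s. If each move is itself
slow, the staged window grows as well.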

For S3 this doesn't matter as much: the write latency is so high that the partial-write
window is very large anyway, so I'm in favour of this option. I think a query option is best as
well, because the behaviour you want is workload-dependent. It would be ok for the option to
default to 'true'.


Line 289: so via a flag "blah" ****
fix this


-- 
To view, visit http://gerrit.cloudera.org:8080/2905
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iff9620d41ba0d5fb1aa0c9f4abb48866fc2b0698
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes
