impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA_2523: Make HdfsTableSink aware of clustered input
Date Thu, 27 Oct 2016 15:51:16 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA_2523: Make HdfsTableSink aware of clustered input
......................................................................


Patch Set 1:

(2 comments)

I didn't do a full pass but had thoughts on testing.

http://gerrit.cloudera.org:8080/#/c/4863/1//COMMIT_MSG
Commit Message:

PS1, Line 7: A_
Use "-" instead of "_" here, i.e. IMPALA-2523.


Line 12: 
RE: testing - off the top of my head, I think we need:

* Very large inserts with partitions spanning many row batches.
* Inserts with a small number of rows per partition (e.g. 1, 2, 3).
* Tests that check that the expected number of files is created.

It would also be good to add batch_size as a test dimension (if it isn't already one for the
insert tests) and to test with several batch sizes to hit more edge cases, e.g. 1, 16, and the
default (1024).
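
To make the suggested test matrix concrete, here is a rough, self-contained Python sketch. It is
not Impala test-framework code and does not touch HdfsTableSink: the sink is a toy model that
keeps one file open and starts a new one whenever the partition key changes, and all names
(BATCH_SIZES, ROWS_PER_PARTITION, clustered_rows, to_batches, count_files, run_matrix) are
hypothetical. The value 5000 just stands in for "a partition spanning many row batches". The
model also ignores any splitting of files by size.

```python
from itertools import product

# Hypothetical test dimensions, using the values suggested in the comment.
BATCH_SIZES = [1, 16, 1024]
ROWS_PER_PARTITION = [1, 2, 3, 5000]


def clustered_rows(num_partitions, rows_per_partition):
    """Yield partition keys in clustered order (all rows of a key are adjacent)."""
    for key in range(num_partitions):
        for _ in range(rows_per_partition):
            yield key


def to_batches(rows, batch_size):
    """Group a row stream into lists of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def count_files(batches):
    """Toy clustered sink: one open file, a new file whenever the key changes."""
    files = 0
    current_key = None
    for batch in batches:
        for key in batch:
            if key != current_key:
                files += 1
                current_key = key
    return files


def run_matrix(num_partitions=4):
    """Return {(batch_size, rows_per_partition): files_created} for all combinations."""
    results = {}
    for batch_size, rows in product(BATCH_SIZES, ROWS_PER_PARTITION):
        batches = to_batches(clustered_rows(num_partitions, rows), batch_size)
        results[(batch_size, rows)] = count_files(batches)
    return results
```

Under this model, every combination should produce exactly one file per partition, including
the cases where a partition boundary falls in the middle of a row batch. That is the invariant
the file-count tests would check against the real sink.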


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
