impala-reviews mailing list archives

From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Date Mon, 31 Oct 2016 16:33:48 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input

Patch Set 4:

File be/src/exec/

Line 277:       RETURN_IF_ERROR(FinalizePartitionFile(state, output_partition));
We can avoid a level of nesting here with 'if (!new_file) break;'.
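
To illustrate the shape I mean, here is a minimal standalone sketch, not the actual sink code. FakeWriter, AppendRows, FinalizeFile and WriteRows are hypothetical stand-ins; only the 'new_file' flag and the early break mirror the suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a file writer with a fixed per-file row capacity.
struct FakeWriter {
  explicit FakeWriter(int capacity) : capacity(capacity) {}

  // Consumes rows starting at *next. Sets *new_file to true if the current
  // file filled up before all rows were consumed.
  void AppendRows(const std::vector<int>& rows, std::size_t* next, bool* new_file) {
    *new_file = false;
    while (*next < rows.size()) {
      if (rows_in_file == capacity) {
        *new_file = true;
        return;
      }
      ++rows_in_file;
      ++*next;
    }
  }

  void FinalizeFile() {
    ++files_finalized;
    rows_in_file = 0;
  }

  int capacity;
  int rows_in_file = 0;
  int files_finalized = 0;
};

// Writes all rows, rolling over to a new file whenever the writer asks for one.
// Returns the number of files finalized during the call.
int WriteRows(FakeWriter* writer, const std::vector<int>& rows) {
  std::size_t next = 0;
  bool new_file = false;
  while (true) {
    writer->AppendRows(rows, &next, &new_file);
    if (!new_file) break;  // Early exit: the roll-over path below stays un-nested.
    writer->FinalizeFile();
  }
  assert(next == rows.size());
  return writer->files_finalized;
}
```

The loop body reads top to bottom with no 'if (new_file) { ... }' block wrapped around the finalize call.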

Line 294:     GetHashTblKey(dynamic_partition_key_expr_ctxs_, &key);
Oh man, I didn't notice this 'current_row_' thing earlier. It looks like for some reason the
original author of the code didn't want to pass an extra argument into functions, so they
passed the row implicitly via 'current_row_'.

Can you refactor it so that it's passed as an actual argument?
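
Roughly what I have in mind, as a simplified sketch. Row, SinkBefore, BuildKey and BuildKeyExplicit are hypothetical names made up for illustration, and the key format is an assumption; the real function evaluates expression contexts rather than copying strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified row: one string per partition column.
using Row = std::vector<std::string>;

// Before: the row travels through a member, so every caller has to remember
// to set it first and nothing in the signature says so.
class SinkBefore {
 public:
  void set_current_row(const Row* row) { current_row_ = row; }

  void BuildKey(std::string* key) const {
    assert(current_row_ != nullptr);  // Hidden precondition.
    key->clear();
    for (const std::string& value : *current_row_) {
      key->append(value);
      key->push_back('/');
    }
  }

 private:
  const Row* current_row_ = nullptr;
};

// After: the row is an explicit argument, so the dependency is visible at
// every call site and there is no state to get out of sync.
void BuildKeyExplicit(const Row& row, std::string* key) {
  key->clear();
  for (const std::string& value : row) {
    key->append(value);
    key->push_back('/');
  }
}
```

Both versions produce the same key; the second just can't be called with a stale or unset row.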

Line 302:         RETURN_IF_ERROR(WriteRowsOfPartition(state, batch, current_partition_pair));
I think we should also remove the entry from partition_keys_to_output_partitions_ here so that
memory use stays O(1) in the number of partitions.
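
A toy sketch of the idea, assuming clustered input means all rows for one partition key arrive contiguously. ClusteredWriter and its members are hypothetical; the map stands in for partition_keys_to_output_partitions_ and the int value stands in for the real per-partition state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a sink that receives rows clustered by partition key.
struct ClusteredWriter {
  // Key -> rows written so far (placeholder for real output partition state).
  std::map<std::string, int> open_partitions;
  std::string current_key;
  std::size_t max_open = 0;  // High-water mark of simultaneously open partitions.
  int finalized = 0;

  void Write(const std::string& key) {
    if (!open_partitions.empty() && key != current_key) {
      // With clustered input the previous key never shows up again, so
      // finalize it and drop its map entry instead of keeping it around.
      ++finalized;
      open_partitions.erase(current_key);
    }
    current_key = key;
    ++open_partitions[key];
    max_open = std::max(max_open, open_partitions.size());
  }

  // Finalizes whatever partition is still open at the end of the input.
  void Close() {
    if (!open_partitions.empty()) {
      ++finalized;
      open_partitions.clear();
    }
  }
};
```

With the erase in place the map never holds more than one entry, regardless of how many distinct partitions pass through.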

File tests/query_test/

Line 476: 
> Done. It takes 30 seconds on my machine. Too much for core?
Maybe a little much given core already takes too long - I think it's useful coverage but we're
also probably pretty unlikely to break it. Seems fine for exhaustive.

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes
