impala-reviews mailing list archives

From "Lars Volker (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Date Sat, 29 Oct 2016 21:35:13 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input

Patch Set 4:

File be/src/exec/

Line 268:   // Pass the row batch to the writer. If new_file is returned true then the current
> I find this comment a little confusing. Maybe explain more the "why", i.e. 
Done. I had moved this from elsewhere, but tried to make it more clear.

Line 272:   do {
> I feel like this might be clearer if you write it as while(true) and then b

Line 288:   // TODO: Should we adapt hdfs writer to accept start,num pair instead of a vector?
> My guess is that it would only help a little bit. It's nice to use the same

Line 296:     PartitionPair* next_partition_pair = NULL;
> It would be more efficient to keep track of the previous key and comparing 
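
For illustration only, here is a minimal C++ sketch of the "keep track of the previous key" idea from the comment above. It is not the code from the patch: the function name WriteClusteredRows, the WrittenBatch type, and the use of plain strings as partition keys are all hypothetical stand-ins. The sketch assumes the input is clustered, so all rows of one key are adjacent and a change of key means the previous partition's rows are complete.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for "rows handed to a partition's writer":
// the partition key plus the indices of the rows written for it.
using WrittenBatch = std::pair<std::string, std::vector<int>>;

// Walks the rows once, comparing each key only against the previous key
// instead of looking the partition up for every row. Assumes clustered
// input: all rows that share a key are adjacent.
std::vector<WrittenBatch> WriteClusteredRows(const std::vector<std::string>& keys) {
  std::vector<WrittenBatch> written;
  std::vector<int> row_indices;
  const std::string* prev_key = nullptr;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (prev_key != nullptr && *prev_key != keys[i]) {
      // Key changed: the previous partition is complete, so write its rows.
      written.emplace_back(*prev_key, std::move(row_indices));
      row_indices.clear();  // Put the moved-from vector back into a known state.
    }
    prev_key = &keys[i];
    row_indices.push_back(static_cast<int>(i));
  }
  // Write the rows collected for the last partition, if any.
  if (prev_key != nullptr) written.emplace_back(*prev_key, std::move(row_indices));
  return written;
}
```

With keys {"a", "a", "b", "c", "c"} this produces three batches holding row indices {0, 1}, {2}, and {3, 4}. Unclustered input would produce more than one batch per key, which is why the sketch only applies when the sink knows its input is clustered.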

PS4, Line 301: Flush
> I think it might be clearer to just say "Write rows" instead of "flush". Fl

Line 301:         // Flush previous partition
> I think the comments in here are a little too low level. Here maybe a singl

Line 310:     // Collect row index
> This comment isn't too helpful.

PS4, Line 313: Flush
> "Write rows to" is probably clearer than "flush".

PS4, Line 607: WriteRowsOfPartition
> WriteRowsToPartition?

File be/src/exec/hdfs-table-sink.h:

PS4, Line 183: rows
> Maybe reword to be clearer, took me a while to figure out:

PS4, Line 209: partitions
> partition's (missing apostrophe).

Line 209:   /// Writes all rows of a partition to the partitions writer and clears the row
> This could be clearer that it's only writing the rows with indices in 'part

PS4, Line 215:  is expected to
> This could be more declarative - i.e. "must" instead of "is expected to"

File tests/query_test/

PS4, Line 71: text
> parquet, right?

File tests/query_test/

Line 476: 
> Hmm, I think we could also do with a test that writes some large partitions
Done. It takes 30 seconds on my machine. Too much for core?

Line 477: 
> Extra blank line

Line 501:       assert len(files) == 1
> Comment why we only expect one file - it might not be that obvious to peopl

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <>
Gerrit-Reviewer: Lars Volker <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes