impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input
Date Sat, 29 Oct 2016 21:35:13 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
......................................................................


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/4863/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

Line 268:   // Pass the row batch to the writer. If new_file is returned true then the current
> I find this comment a little confusing. Maybe explain more the "why", i.e. 
Done. I had moved this from elsewhere, but tried to make it more clear.


Line 272:   do {
> I feel like this might be clearer if you write it as while(true) and then b
Done


Line 288:   // TODO: Should we adapt hdfs writer to accept start,num pair instead of a vector?
> My guess is that it would only help a little bit. It's nice to use the same
Ok


Line 296:     PartitionPair* next_partition_pair = NULL;
> It would be more efficient to keep track of the previous key and comparing 
Done


PS4, Line 301: Flush
> I think it might be clearer to just say "Write rows" instead of "flush". Fl
Done


Line 301:         // Flush previous partition
> I think the comments in here are a little too low level. Here maybe a singl
Done


Line 310:     // Collect row index
> This comment isn't too helpful.
Done


PS4, Line 313: Flush
> "Write rows to" is probably clearer than "flush".
Done


PS4, Line 607: WriteRowsOfPartition
> WriteRowsToPartition?
Done


http://gerrit.cloudera.org:8080/#/c/4863/4/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

PS4, Line 183: rows
> Maybe reword to be clearer, took me a while to figure out:
Done


PS4, Line 209: partitions
> partition's (missing apostrophe).
Done


Line 209:   /// Writes all rows of a partition to the partitions writer and clears the row
index
> This could be clearer that it's only writing the rows with indices in 'part
Done


PS4, Line 215:  is expected to
> This could be more declarative - i.e. "must" instead of "is expected to"
Done


http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

PS4, Line 71: text
> parquet, right?
Done


http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 476: 
> Hmm, I think we could also do with a test that writes some large partitions
Done. It takes 30 seconds on my machine. Too much for core?


Line 477: 
> Extra blank line
Done


Line 501:       assert len(files) == 1
> Comment why we only expect one file - it might not be that obvious to peopl
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message