impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Tauber-Marshall (Code Review)" <>
Subject [hs2client-CR] Initial structure for the C++ hiveserver2 client.
Date Thu, 21 Apr 2016 23:20:04 GMT
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Initial structure for the C++ hiveserver2 client.

Patch Set 6:


Updated to the latest reviews and also finished implementing the rest of the operation functions.

Still need to write more tests.
File CMakeLists.txt:

Line 253:   src/hs2client/
> I don't think you need to put these here -- the test files are compiled as 
File src/hs2client/

Line 37: row_set->columns[i]
> this is copying columns[i] into column. Should this be a pointer?

Line 38: impl
> For what we see here, I don't actually think you need ColumnImpl or Column:
File src/hs2client/columnar-row-set.h:

Line 35:     return (*(nulls + (i / 8)) & (1 << (i % 8))) != 0;
> nulls[i / 8]
File src/hs2client/

Line 45: coon
> connect

Line 52: connection
> service?

Line 55: EXPECT_OK(service->Close());
> Why is Close() a no-op on this call but not on l58?
File src/hs2client/

Line 56:   assert(!IsConnected());
> Maybe DCHECK here? 

Line 66: try {
       :     impl_->transport->close();
       :   } catch (TException& tx) {}
> just call Close()?

Line 76: isOpen
> Are we sure this can't throw?
Yes. The documentation says it shouldn't, and I also looked at the code to verify, though
obviously it wouldn't hurt to put it in a 'try'
File src/hs2client/hs2service.h:

Line 76: Close
> What happens if you call this on a closed session -- no-op or error?

Line 78: Reconnect
> Do we really need this functionality, or could we ask users to just create 
If the connection to Impala is broken but Impala is still running, just creating a new service
would leave any already existing sessions orphaned, since they rely on the rpc object from
the service that created them in our current model, though I'm not sure Reconnect is the right
solution to that problem.

This brings up the important point of how we want to deal with transient errors, eg. right
now we're just returning errors immediately when rpcs fail, but maybe we want to add automatic
retries, and relatedly how we're going to test our behavior in the face of transient errors
(as well as what Thrift's behavior is in the case of transient errors, which I need to look
File src/hs2client/

Line 44:   EXPECT_OK(session_ok->ExecuteStatement("select * from "+ TEST_TBL, &select_op));
> nit: space

Line 55: "+
> nit space
File src/hs2client/

Line 30: _
> consistent naming with underscores please, I think we don't use them traili

Line 40:  
> maybe return immediately if !open_ ?
File src/hs2client/hs2session.h:

Line 40: Close
> can be called twice?

Line 48:   Status GetSchemas(const std::string& schema_name, std::shared_ptr<Operation>*
> In this new function, schema_name can be a wildcard, is that right? If so, 

Line 59:   Status GetFunctions(const std::string& schema_name, std::shared_ptr<Operation>*
> Same question with these others. Do they accept wildcards (e.g. "foo_*")
File src/hs2client/macros.h:

> we shouldn't let this macro leak into the public headers (this file is incl

Line 24: if (!status.ok()) {                   \
       :     return status;                      \
       :   }
> Yeah, in general you should wrap all multi-line macros in do {... } while(0
File src/hs2client/

Line 60:   EXPECT_TRUE(FindRow(1, "a", int_col.get(), string_col.get()));
       :   EXPECT_TRUE(FindRow(2, "b", int_col.get(), string_col.get()));
       :   EXPECT_FALSE(FindRow(3, int_col.get()));
       :   EXPECT_FALSE(FindRow(4, int_col.get()));
> This is fine, but the results are ordered so you should be able to check ex
File src/hs2client/

Line 31: using std::vector;
> I think it's better to use std members explicitly most of the time (unlike 

Line 36: 512
> is this from somewhere else? in our code we often pick 1024 so if this is j

Line 90: row_set_impl->row_set = resp.results
> This involves a memory copy of the data. A few ways around this but maybe t

Line 112: PrintResults
> Let's see if we can move PrintResults to test code or at least some utility

Line 113: namespace {
> ?
This hides these symbols outside of this .cc file to prevent linking conflicts, similar to
declaring them static.
File src/hs2client/operation.h:

Line 28: enum
> I think we can use "enum class" here.

Line 28: enum FetchOrientation {
       :   FETCH_NEXT,
       :   FETCH_PRIOR,
       :   FETCH_RELATIVE,
       :   FETCH_ABSOLUTE,
       :   FETCH_FIRST,
       :   FETCH_LAST
       : };
       : enum OperationState {
       :   RUNNING_STATE,
       :   FINISHED_STATE,
       :   CANCELED_STATE,
       :   CLOSED_STATE,
       :   ERROR_STATE,
       :   UKNOWN_STATE,
       :   PENDING_STATE,
       : };
> Are these just mirroring the definitions in hs2? Can you add a comment. Can

Line 59: GetState
> Brief comment. When is this valid to call?

Line 63: Wait
> Is this necessary? Seems like a useful function for testing, but I don't kn
Moved to Util.

Line 66: GetLog
> Brief comment, when is this valid to call?

Line 68: GetProfile
> same here and the other actions below

Line 84: PrintResults
> Agree -- I think we can leave this to users of the library to implement the
File src/hs2client/

Line 21: TCLIService
> I would even go so far as to add a unit test exhibiting that Thrift is not 

Line 50: CHECK_OK(status);
> I actually don't think we want to be using this macro everywhere (here and 

Line 56: CHECK_OK(status);
> Here you need to close the session.

Line 60: CHECK_OK
> here you would close the execute_op and then the session.

Line 66: int_col->length
> I think this is wrong. length seems to be the number of elements in the dat
I don't think so - the data vector will have length equal to the total number of elements,
null and non-null. See OperationTest::TestFetch for proof of this. The entries in the vector
for the null values will be 0 for numeric types, '' for string, etc.

I'll still add the assert and always use the same column, as suggested.

Line 68: auto
> would you mind using the real types here for clarity?

Line 86: Fetch
> please make sure that the Fetch() API documents how this works with end-of-

Line 87: CHECK_OK
> same
File src/hs2client/status.h:

Line 22: StatusCode
> can you add a comment this is referencing the HS2 status codes, and brief c

Line 23: Status
> we don't need to append Status on each one.
File src/hs2client/test-util.h:

Line 37: Get
> I don't see a method "Get()"
File src/hs2client/

Line 93: case
> can you define a stringstream only where you need it, i.e. here and on l108

Line 95: if (i != 0) {
       :           ss << ",";
       :         }
> 1 line
File src/hs2client/thrift-internal.h:

Line 68:   h
> nit extra space

Line 74:  
> I can add automated clang-tidy style cleaning in a follow up patch.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7be82d5897ad6093116915b17924674c4ae508a0
Gerrit-PatchSet: 6
Gerrit-Project: hs2client
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Marcel Kornacker <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Thomas Tauber-Marshall <>
Gerrit-Reviewer: Wes McKinney <>
Gerrit-HasComments: Yes

View raw message