impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4024: Add "system" database and expose Impala metrics as a table
Date Fri, 02 Sep 2016 23:40:22 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4024: Add "system" database and expose Impala metrics as a table
......................................................................


Patch Set 18:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scan-node.cc
File be/src/exec/system-table-scan-node.cc:

PS18, Line 49: tuple_idx_
We don't need tuple_idx_ because it's always zero. This is generally true for scanners - they
only produce a single tuple per row.


PS18, Line 94: idx
I think we should move idx into the Metrics scanner. Maybe call it next_row_idx_. That way
each scanner can keep track of its position in the most convenient way.


PS18, Line 94: MaterializeNextRow
This should be MaterializeNextTuple() now that I think about it, because it's only filling
in a Tuple.


Line 121:     while (!ReachedLimit() && !row_batch->AtCapacity() && InputBatchHasNext())
{
If we get rid of InputBatchHasNext(), GetNextInputBatch(), I think we can combine the two
loops.


PS18, Line 137: MetricAtLimit
Rename this?


Line 149: Status SystemTableScanNode::GetNextInputBatch() {
I don't think we need this function, GetNextInputBatch(), or many of the other variables:
next_row_idx_, items_size_, etc, do we?


http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scan-node.h
File be/src/exec/system-table-scan-node.h:

PS18, Line 56: SystemTableScanner
Let's make this a scoped_ptr<SystemTableScanner> so that we don't leak the memory.


http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scanner.cc
File be/src/exec/system-table-scanner.cc:

Line 46:     LOG(INFO) << "group name: " << group->name();
We can remove this logging I think.


PS18, Line 49: ChildGroupMap::value_type&
I think it's easier to use auto here.


Line 56:       metric_info.ip_address_ = ExecEnv::GetInstance()->webserver()->http_address();
I think we should report the backend address rather than the webserver address.


Line 84:     size_t value_size = value[slot_desc->col_pos()].size();
I think it would be just as simple and more efficient to avoid copying all the strings by
using a switch here.

e.g.
  const string& slot_value;
  switch (slot_desc->col_pos()) {
   case IMPALAD_ADDRESS:
    RETURN_IF_ERROR(WriteStringSlot(impalad_address, tuple_pool, slot));    
   ...
   case METRIC_GROUP:
    RETURN_IF_ERROR(WriteStringSlot(metrics_[idx].group->Name(), tuple_pool, slot));
   ...
  }


Line 85:     char* buffer = reinterpret_cast<char*>(tuple_pool->TryAllocate(value_size));
Need to check if buffer is NULL and return a MemLimitExceeded() error (look at other places
we call TryAllocate() in scanners).


http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scanner.h
File be/src/exec/system-table-scanner.h:

PS18, Line 18: SYSTEMTABLESCANNER_H_
Should be something like:

IMPALA_SYSTEM_TABLE_SCANNER_H

for consistency


Line 32: // For each table in "system" database, we should implement the scanner for scanning
it
maybe instead: "we should implement a scanner that can materialize the table's rows.".


Line 37:   virtual Status Open() = 0;
Can you document Open() and MaterializeNextRow().


Line 38:   virtual Status MaterializeNextRow(
Maybe GetNextTuple()?

I think there's a design decision here about whether to have a batched GetNext() interface
or a row-at-a-time interface. The batched interface would be faster but a little more complex.

I think it would be good to extend the class comment to say that the scanner materializes
a tuple-at-a-time for simplicity because system tables are generally small and performance
isn't critical.


Line 40:   int size() { return size_; }
I think we maybe don't want to make size() part of the interface because it's possible that
we'll have scanners that don't know the # of rows before scanning.

I think add a bool member variable/method eos_ that is true when the scanner returned its
last row. We use that pattern elsewhere.


Line 46: struct MetricPackage {
I mentioned in another comment - I think it's sufficient to save the Metric* and MetricGroup*
here and then get the required strings when needed.


Line 54:   vector<MetricPackage> metric_pool;
metric_pool should be private and have '_' on the end.

Maybe just call it 'metrics_'?


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/main/java/com/cloudera/impala/catalog/Db.java
File fe/src/main/java/com/cloudera/impala/catalog/Db.java:

Line 496:   @Override
I don't think we need hashCode() and equals() do we?


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/main/java/com/cloudera/impala/catalog/SystemDb.java
File fe/src/main/java/com/cloudera/impala/catalog/SystemDb.java:

Line 27:   private static final int SYSTEM_METRICS = -2;
I think this would be better in TableId, since it is something that it shared across all tables,
not just system tables.


PS18, Line 31: this.
No "this" here and below (we generally avoid using it when it's not necessary).


PS18, Line 39: node_address
maybe "impalad_address"? I think it's the impala daemon not the node.


PS18, Line 40: group_name
I think just "group" is ok - less characters to type.


PS18, Line 44: this
Don't need "this."


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/main/java/com/cloudera/impala/planner/SystemTableScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/SystemTableScanNode.java:

PS18, Line 46: ESTIMATE_NUM_ROWS_PER_NODE
ESTIMATED instead of ESTIMATE


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

Line 2182:     ImpaladCatalog catalog = new ImpaladTestCatalog(authzConfig);
AuthzError() uses the ImpaladTestCatalog in ctx_ - you need to add the table to that one.

I think something like this would work:

   ctx_.catalog.getDb("system").addTable(....)

Also add a comment to explain that you're adding a fake table with no SELECT permissions.


Line 2242:     //=========================================================================================
Let's make the admin test a separate test as discussed.


Line 2294:     system.removeTable("test_table");
I think it's ok to leave it.


-- 
To view, visit http://gerrit.cloudera.org:8080/3863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kathy Sun <kathy.sun@cloudera.com>
Gerrit-Reviewer: Kathy Sun <kathy.sun@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message