impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tim Armstrong (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4024: Add "system" database and expose Impala metrics as a table
Date Fri, 02 Sep 2016 23:40:22 GMT
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4024: Add "system" database and expose Impala metrics as a table

Patch Set 18:

File be/src/exec/

PS18, Line 49: tuple_idx_
We don't need tuple_idx_ because it's always zero. This is generally true for scanners - they
only produce a single tuple per row.

PS18, Line 94: idx
I think we should move idx into the Metrics scanner. Maybe call it next_row_idx_. That way
each scanner can keep track of its position in the most convenient way.

PS18, Line 94: MaterializeNextRow
This should be MaterializeNextTuple() now that I think about it, because it's only filling
in a Tuple.

Line 121:     while (!ReachedLimit() && !row_batch->AtCapacity() && InputBatchHasNext())
If we get rid of InputBatchHasNext(), GetNextInputBatch(), I think we can combine the two

PS18, Line 137: MetricAtLimit
Rename this?

Line 149: Status SystemTableScanNode::GetNextInputBatch() {
I don't think we need this function, GetNextInputBatch(), or many of the other variables:
next_row_idx_, items_size_, etc, do we?
File be/src/exec/system-table-scan-node.h:

PS18, Line 56: SystemTableScanner
Let's make this a scoped_ptr<SystemTableScanner> so that we don't leak the memory.
File be/src/exec/

Line 46:     LOG(INFO) << "group name: " << group->name();
We can remove this logging I think.

PS18, Line 49: ChildGroupMap::value_type&
I think it's easier to use auto here.

Line 56:       metric_info.ip_address_ = ExecEnv::GetInstance()->webserver()->http_address();
I think we should report the backend address rather than the webserver address.

Line 84:     size_t value_size = value[slot_desc->col_pos()].size();
I think it would be just as simple and more efficient to avoid copying all the strings by
using a switch here.

  const string& slot_value;
  switch (slot_desc->col_pos()) {
    RETURN_IF_ERROR(WriteStringSlot(impalad_address, tuple_pool, slot));    
    RETURN_IF_ERROR(WriteStringSlot(metrics_[idx].group->Name(), tuple_pool, slot));

Line 85:     char* buffer = reinterpret_cast<char*>(tuple_pool->TryAllocate(value_size));
Need to check if buffer is NULL and return a MemLimitExceeded() error (look at other places
we call TryAllocate() in scanners).
File be/src/exec/system-table-scanner.h:

Should be something like:


for consistency

Line 32: // For each table in "system" database, we should implement the scanner for scanning
maybe instead: "we should implement a scanner that can materialize the table's rows.".

Line 37:   virtual Status Open() = 0;
Can you document Open() and MaterializeNextRow().

Line 38:   virtual Status MaterializeNextRow(
Maybe GetNextTuple()?

I think there's a design decision here about whether to have a batched GetNext() interface
or a row-at-a-time interface. The batched interface would be faster but a little more complex.

I think it would be good to extend the class comment to say that the scanner materializes
a tuple-at-a-time for simplicity because system tables are generally small and performance
isn't critical.

Line 40:   int size() { return size_; }
I think we maybe don't want to make size() part of the interface because it's possible that
we'll have scanners that don't know the # of rows before scanning.

I think add a bool member variable/method eos_ that is true when the scanner returned its
last row. We use that pattern elsewhere.

Line 46: struct MetricPackage {
I mentioned in another comment - I think it's sufficient to save the Metric* and MetricGroup*
here and then get the required strings when needed.

Line 54:   vector<MetricPackage> metric_pool;
metric_pool should be private and have '_' on the end.

Maybe just call it 'metrics_'?
File fe/src/main/java/com/cloudera/impala/catalog/

Line 496:   @Override
I don't think we need hashCode() and equals() do we?
File fe/src/main/java/com/cloudera/impala/catalog/

Line 27:   private static final int SYSTEM_METRICS = -2;
I think this would be better in TableId, since it is something that it shared across all tables,
not just system tables.

PS18, Line 31: this.
No "this" here and below (we generally avoid using it when it's not necessary).

PS18, Line 39: node_address
maybe "impalad_address"? I think it's the impala daemon not the node.

PS18, Line 40: group_name
I think just "group" is ok - less characters to type.

PS18, Line 44: this
Don't need "this."
File fe/src/main/java/com/cloudera/impala/planner/

File fe/src/test/java/com/cloudera/impala/analysis/

Line 2182:     ImpaladCatalog catalog = new ImpaladTestCatalog(authzConfig);
AuthzError() uses the ImpaladTestCatalog in ctx_ - you need to add the table to that one.

I think something like this would work:


Also add a comment to explain that you're adding a fake table with no SELECT permissions.

Line 2242:     //=========================================================================================
Let's make the admin test a separate test as discussed.

Line 2294:     system.removeTable("test_table");
I think it's ok to leave it.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kathy Sun <>
Gerrit-Reviewer: Kathy Sun <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message