impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kathy Sun (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4024: Add "system" database and expose Impala metrics as a table
Date Tue, 06 Sep 2016 11:40:37 GMT
Kathy Sun has posted comments on this change.

Change subject: IMPALA-4024: Add "system" database and expose Impala metrics as a table

Patch Set 18:

File be/src/exec/

PS18, Line 49: tuple_idx_
> We don't need tuple_idx_ because it's always zero. This is generally true f
I thought leaving it there would be consistent with other scan-node....?

PS18, Line 94: idx
> I think we should move idx into the Metrics scanner. Maybe call it next_row
probably not.
idx is useful to decide whether we have next input batch.

Line 121:     while (!ReachedLimit() && !row_batch->AtCapacity() && InputBatchHasNext())
> If we get rid of InputBatchHasNext(), GetNextInputBatch(), I think we can c
It follows the same style with data-source-scan-node. 

(1024 rows a batch)
But now i think  getNext() in don't need two loops either. Two loops
make getNext() deal with all the batches. if so getNext will only called once.

I fixed my getNext() here.

PS18, Line 137: MetricAtLimit
> Rename this?
how about 

Line 149: Status SystemTableScanNode::GetNextInputBatch() {
> I don't think we need this function, GetNextInputBatch(), or many of the ot
Since we process 1024 rows a batch,  these are served for this feature.

next_row_idx_ is the index inside the input_batch. So whenever we get a new input batch, we
set the next_row_idx_ to 0.

items_size_ is the size of metrics_pool.  And this is related to another comment. you mentioned
we may need to remove size() in system-table-scanner, since  for some of system table, we
don't know it's size before we scan it.

I do need the item_size to know how many rows are left, and further how many rows are in next
File be/src/exec/system-table-scan-node.h:

PS18, Line 56: SystemTableScanner
> Let's make this a scoped_ptr<SystemTableScanner> so that we don't leak the 
File be/src/exec/

Line 46:     LOG(INFO) << "group name: " << group->name();
> We can remove this logging I think.

PS18, Line 49: ChildGroupMap::value_type&
> I think it's easier to use auto here.

Line 56:       metric_info.ip_address_ = ExecEnv::GetInstance()->webserver()->http_address();
> I think we should report the backend address rather than the webserver addr

Line 84:     size_t value_size = value[slot_desc->col_pos()].size();
> I think it would be just as simple and more efficient to avoid copying all 

But the code is longer now...
Is it ok if I create a enum like that. (not in thrift)

Line 85:     char* buffer = reinterpret_cast<char*>(tuple_pool->TryAllocate(value_size));
> Need to check if buffer is NULL and return a MemLimitExceeded() error (look
File be/src/exec/system-table-scanner.h:

> Should be something like:

Line 32: // For each table in "system" database, we should implement the scanner for scanning
> maybe instead: "we should implement a scanner that can materialize the tabl

Line 37:   virtual Status Open() = 0;
> Can you document Open() and MaterializeNextRow().

Line 38:   virtual Status MaterializeNextRow(
> Maybe GetNextTuple()?
I think I did batched GetNext()  (1024 rows per batch)
but rows will be materialized one at a time, following the same design of other scan node.

Line 40:   int size() { return size_; }
> I think we maybe don't want to make size() part of the interface because it
But I do need size() to serve 1024 batch feature. to decide the size of next batch, whether
it should be full 1024 batch, or the rows left.

maybe discuss offline

Line 46: struct MetricPackage {
> I mentioned in another comment - I think it's sufficient to save the Metric

Line 54:   vector<MetricPackage> metric_pool;
> metric_pool should be private and have '_' on the end.
I thought metric_pool is better... it gives us a impression that it contains all of the metrics.
( "metrics" are used else where and doesn't have this meaning)
File fe/src/main/java/com/cloudera/impala/catalog/

Line 496:   @Override
> I don't think we need hashCode() and equals() do we?
It is used when determine whether a list<db> contains a certain db (system, used in
authorizationTest() )

since db is a user defined object, we need to override hashCode() and equals() so that we
could use list.Contains() function. I thought it could be useful later. So I add them on.
But I could also compare the list by myself.
File fe/src/main/java/com/cloudera/impala/catalog/

Line 27:   private static final int SYSTEM_METRICS = -2;
> I think this would be better in TableId, since it is something that it shar

PS18, Line 31: this.
> No "this" here and below (we generally avoid using it when it's not necessa

PS18, Line 39: node_address
> maybe "impalad_address"? I think it's the impala daemon not the node.

PS18, Line 40: group_name
> I think just "group" is ok - less characters to type.

PS18, Line 44: this
> Don't need "this."
File fe/src/main/java/com/cloudera/impala/planner/

File fe/src/test/java/com/cloudera/impala/analysis/

Line 2182:     ImpaladCatalog catalog = new ImpaladTestCatalog(authzConfig);
> AuthzError() uses the ImpaladTestCatalog in ctx_ - you need to add the tabl

Line 2242:     //=========================================================================================
> Let's make the admin test a separate test as discussed.

Line 2294:     system.removeTable("test_table");
> I think it's ok to leave it.

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kathy Sun <>
Gerrit-Reviewer: Kathy Sun <>
Gerrit-Reviewer: Tim Armstrong <>
Gerrit-HasComments: Yes

View raw message