impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kathy Sun (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4024: Add "system" database and expose Impala metrics as a table
Date Tue, 06 Sep 2016 11:40:37 GMT
Kathy Sun has posted comments on this change.

Change subject: IMPALA-4024: Add "system" database and expose Impala metrics as a table
......................................................................


Patch Set 18:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scan-node.cc
File be/src/exec/system-table-scan-node.cc:

PS18, Line 49: tuple_idx_
> We don't need tuple_idx_ because it's always zero. This is generally true f
Done.
I thought leaving it there would be consistent with other scan-node....?


PS18, Line 94: idx
> I think we should move idx into the Metrics scanner. Maybe call it next_row
probably not.
idx is useful to decide whether we have next input batch.


Line 121:     while (!ReachedLimit() && !row_batch->AtCapacity() && InputBatchHasNext())
{
> If we get rid of InputBatchHasNext(), GetNextInputBatch(), I think we can c
It follows the same style with data-source-scan-node. 

(1024 rows a batch)
But now i think  getNext() in data-source-scan-node.cc don't need two loops either. Two loops
make getNext() deal with all the batches. if so getNext will only called once.

I fixed my getNext() here.


PS18, Line 137: MetricAtLimit
> Rename this?
how about 
AllItemsScanned()


Line 149: Status SystemTableScanNode::GetNextInputBatch() {
> I don't think we need this function, GetNextInputBatch(), or many of the ot
Since we process 1024 rows a batch,  these are served for this feature.

next_row_idx_ is the index inside the input_batch. So whenever we get a new input batch, we
set the next_row_idx_ to 0.

items_size_ is the size of metrics_pool.  And this is related to another comment. you mentioned
we may need to remove size() in system-table-scanner, since  for some of system table, we
don't know it's size before we scan it.

I do need the item_size to know how many rows are left, and further how many rows are in next
batch.


http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scan-node.h
File be/src/exec/system-table-scan-node.h:

PS18, Line 56: SystemTableScanner
> Let's make this a scoped_ptr<SystemTableScanner> so that we don't leak the 
Done


http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scanner.cc
File be/src/exec/system-table-scanner.cc:

Line 46:     LOG(INFO) << "group name: " << group->name();
> We can remove this logging I think.
Done


PS18, Line 49: ChildGroupMap::value_type&
> I think it's easier to use auto here.
Done


Line 56:       metric_info.ip_address_ = ExecEnv::GetInstance()->webserver()->http_address();
> I think we should report the backend address rather than the webserver addr
Done


Line 84:     size_t value_size = value[slot_desc->col_pos()].size();
> I think it would be just as simple and more efficient to avoid copying all 
Done.

But the code is longer now...
Is it ok if I create a enum like that. (not in thrift)


Line 85:     char* buffer = reinterpret_cast<char*>(tuple_pool->TryAllocate(value_size));
> Need to check if buffer is NULL and return a MemLimitExceeded() error (look
Done


http://gerrit.cloudera.org:8080/#/c/3863/18/be/src/exec/system-table-scanner.h
File be/src/exec/system-table-scanner.h:

PS18, Line 18: SYSTEMTABLESCANNER_H_
> Should be something like:
Done


Line 32: // For each table in "system" database, we should implement the scanner for scanning
it
> maybe instead: "we should implement a scanner that can materialize the tabl
Done


Line 37:   virtual Status Open() = 0;
> Can you document Open() and MaterializeNextRow().
Done


Line 38:   virtual Status MaterializeNextRow(
> Maybe GetNextTuple()?
I think I did batched GetNext()  (1024 rows per batch)
but rows will be materialized one at a time, following the same design of other scan node.


Line 40:   int size() { return size_; }
> I think we maybe don't want to make size() part of the interface because it
But I do need size() to serve 1024 batch feature. to decide the size of next batch, whether
it should be full 1024 batch, or the rows left.

maybe discuss offline


Line 46: struct MetricPackage {
> I mentioned in another comment - I think it's sufficient to save the Metric
Done


Line 54:   vector<MetricPackage> metric_pool;
> metric_pool should be private and have '_' on the end.
Done.
I thought metric_pool is better... it gives us a impression that it contains all of the metrics.
( "metrics" are used else where and doesn't have this meaning)


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/main/java/com/cloudera/impala/catalog/Db.java
File fe/src/main/java/com/cloudera/impala/catalog/Db.java:

Line 496:   @Override
> I don't think we need hashCode() and equals() do we?
It is used when determine whether a list<db> contains a certain db (system, used in
authorizationTest() )

since db is a user defined object, we need to override hashCode() and equals() so that we
could use list.Contains() function. I thought it could be useful later. So I add them on.
But I could also compare the list by myself.


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/main/java/com/cloudera/impala/catalog/SystemDb.java
File fe/src/main/java/com/cloudera/impala/catalog/SystemDb.java:

Line 27:   private static final int SYSTEM_METRICS = -2;
> I think this would be better in TableId, since it is something that it shar
Done


PS18, Line 31: this.
> No "this" here and below (we generally avoid using it when it's not necessa
Done


PS18, Line 39: node_address
> maybe "impalad_address"? I think it's the impala daemon not the node.
Done


PS18, Line 40: group_name
> I think just "group" is ok - less characters to type.
Done


PS18, Line 44: this
> Don't need "this."
Done


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/main/java/com/cloudera/impala/planner/SystemTableScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/SystemTableScanNode.java:

PS18, Line 46: ESTIMATE_NUM_ROWS_PER_NODE
> ESTIMATED instead of ESTIMATE
Done


http://gerrit.cloudera.org:8080/#/c/3863/18/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

Line 2182:     ImpaladCatalog catalog = new ImpaladTestCatalog(authzConfig);
> AuthzError() uses the ImpaladTestCatalog in ctx_ - you need to add the tabl
Done


Line 2242:     //=========================================================================================
> Let's make the admin test a separate test as discussed.
Done


Line 2294:     system.removeTable("test_table");
> I think it's ok to leave it.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7adbeb45220c468e43b424d70c30b952f6cec2cd
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Kathy Sun <kathy.sun@cloudera.com>
Gerrit-Reviewer: Kathy Sun <kathy.sun@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message