impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bharath Vissapragada (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.
Date Wed, 06 Dec 2017 21:24:14 GMT
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8529
)

Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.
......................................................................


Patch Set 2:

(28 comments)

Great change, exposes some invaluable information when debugging Catalog issues, which otherwise
is a black box.

 I still need to look deeper into some parts of the code around HdfsTable.toThrift() stats
accounting and TopNCache unit tests. Flushing out what I have so far.

http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@201
PS2, Line 201: used
Clarify what is 'used'?

Maybe we should also mention that its top-n and not everything (with a reference to the fe
call)


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@217
PS2, Line 217: Update
nit: May be 'GetTopNTableUsageMetrics()' or something (I know it sounds horrible :D)? My thinking
is that we are not fully accounting for the entire usage and the method name could be misleading.


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@224
PS2, Line 224: object_name
nit: probably just 'name'? Its implicit (from the URL) that its a table.


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@359
PS2, Line 359:   UpdateCatalogUsage(document);
Whats your take on moving top-x stuff to the /metrics page and populating them like other
metrics? I know some downstream clients (like CM) read that page and populate nice charts
on how those metrics change over time.


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@390
PS2, Line 390:  has_metrics.SetBool(true);
Just curious, whats the use of has_metrics=true/has_large_tables=true? We could just read
the json and see if they exist?


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@492
PS2, Line 492:   Webserver::ArgumentMap::const_iterator object_name_arg = args.find("object_name");
should we enable json view for this? (?json)


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h@90
PS2, Line 90: in
nit: into?


http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h@96
PS2, Line 96:   Status GetCatalogUsage(TGetCatalogUsageResponse* response);
Same comment as in the other header, may be rename to convey that it is GetTopN....()?


http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/Frontend.thrift@90
PS2, Line 90: // Arguments to getTableMetrics, which returns the metrics of a specific table.
            : struct TGetTableMetricsParams {
            :   1: required string db
            :   2: required string tbl
            : }
            : 
            : // Response to a getTableMetrics request. The response contains all the collected
metrics
            : // pretty-printed into a string.
            : struct TGetTableMetricsResponse {
            :   1: required string metrics
            : }
Can't we use TTableName and String directly? (or) probably use typedefs 

typedef TTableName TGetTableMetricsParams
typedef string TGetTableMetricsResponse


http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@604
PS2, Line 604: Currently, it keeps track of the
             : // estimated memory usage and the number of metadata operations that were performed
on
             : // that table since it was loaded.
Remove here and describe below may be, can likely become stale soon (Same for the next struct
too)


http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@607
PS2, Line 607: Stats
Looks like we are using Stats/Metrics interchangeably. Maybe use Metrics everywhere since
"Stats" can coincide with table stats.


http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@608
PS2, Line 608: required string table_name
TTableName?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Catalog.java@154
PS2, Line 154: Table tbl = db.getTable(tableName);
             :     if (tbl != null) {
             :       tbl.incrementMetadataOpsCount();
             :       CatalogUsageMonitor.INSTANCE.updateFrequentlyAccessedTables(tbl);
             :     }
Looks subtle and flaky to me. This seems to work on the premise that we call Catalog.getTable()
only for a metadata-op? Future patches could likely break the counts if they don't know this?


May be we could move this to CatalogOpEx.getExistingTable() (Its private and so is only called
from inside for DDLs)


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1007
PS2, Line 1007:     final Timer.Context context
Should the scope include tryLockTable()?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1372
PS2, Line 1372:   public TGetCatalogUsageResponse getCatalogUsage() {
getTopN...Metrics?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1391
PS2, Line 1391:   public String getTableMetrics(String dbName, String tblName) throws CatalogException
{
Doc?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392
PS2, Line 1392:     Table tbl = getTable(dbName, tblName);
This is the problem I was talking about. This call increments the metadatOpsCount() as per
this patch.


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1398
PS2, Line 1398: No metrics available for table " + dbName + "." + tblName
Also may be add "Table not yet loaded..."? Its unclear why metrics are not available yet.


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java
File fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java@47
PS2, Line 47: true
It is unclear why evict should be true (or) whats the usecase for that.

Let's say we have 25 tables each accessed 100 times and a 26th table that is accessed for
the first time, this makes sure that the last one of 25 is evicted, now we have 24 tables
with 100 metadataops count and 1 table with 1 metadata op count. As a user I think I'd be
more interested in the former? No?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221
PS2, Line 221: StorageStats
Maybe call FileMetadataStats to be ni sync with rest of the class?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1201
PS2, Line 1201:   public void load(boolean reuseMetadata, IMetaStoreClient client,
Make rest of the load()'s private so that all loads happen via this call and accounted?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@234
PS2, Line 234:                                         kuduTableName_, e);
formatting (below too)


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Table.java@135
PS2, Line 135: public void setEstimatedMetadataSize(long estimatedMetadataSize) {
             :     estimatedMetadataSize_.set(estimatedMetadataSize);
             :   }
             : 
             :   public void incrementMetadataOpsCount() { metadataOpsCount_.incrementAndGet();
}
Can these make a call to CatalogUsageMonitor.increment*(this) too? We can avoid making a separate
call at multiple places.


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java
File fe/src/main/java/org/apache/impala/common/Metrics.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java@143
PS2, Line 143:  
%le ?


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java@147
PS2, Line 147:   }
Same as my comment in the backend, can we expose json versions of these? toJson()


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@259
PS2, Line 259: 
Can we add gauge metrics around this switch-case block to count DDLs by type? We could find
data points like, there have been too many REFRESHes in the past 1 hour etc.


http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/util/TopNCache.java
File fe/src/main/java/org/apache/impala/util/TopNCache.java:

http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/util/TopNCache.java@41
PS2, Line 41: R extends Long>
Do we need this? It is almost always Long, no?


http://gerrit.cloudera.org:8080/#/c/8529/2/www/catalog.tmpl
File www/catalog.tmpl:

http://gerrit.cloudera.org:8080/#/c/8529/2/www/catalog.tmpl@23
PS2, Line 23: {{?has_large_tables}}
Like I mentioned elsewhere, my feeling is that these belong to metrics and should be exposed
in /metrics?json for clients (like CM) to poll and chart.



-- 
To view, visit http://gerrit.cloudera.org:8080/8529
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Gerrit-Change-Number: 8529
Gerrit-PatchSet: 2
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <philip@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercegovac@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Dec 2017 21:24:14 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message