impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5310: Part 2: Add SAMPLED NDV() function.
Date Thu, 23 Nov 2017 01:08:33 GMT
Hello Bharath Vissapragada, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8569

to look at the new patch set (#3).

Change subject: IMPALA-5310: Part 2: Add SAMPLED_NDV() function.
......................................................................

IMPALA-5310: Part 2: Add SAMPLED_NDV() function.

Adds a new SAMPLED_NDV() aggregate function that is
intended to be used in COMPUTE STATS TABLESAMPLE.
This patch only adds the function itself. Integration
with COMPUTE STATS will come in a separate patch.

SAMPLED_NDV() estimates the number of distinct values (NDV)
based on a sample of data and the corresponding sampling rate.
The main idea is to collect several x/y data points where x is
the number of rows and y is the corresponding NDV estimate.
These data points are used to fit an objective function to the
data such that the true NDV can be extrapolated.
The aggregate function maintains a fixed number of HyperLogLog
intermediates to compute the x/y points.
Several objective functions are fit and the best-fit one is
used for extrapolation.

Adds the MPFIT C library to perform curve fitting:
https://www.physics.wisc.edu/~craigm/idl/cmpfit.html

The library is a C port from Fortran. Scipy uses the
Fortran version of the library for curve fitting.

Testing:
- added functional tests
- core/hdfs run passed

Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
A be/src/thirdparty/mpfit/DISCLAIMER
A be/src/thirdparty/mpfit/README
A be/src/thirdparty/mpfit/mpfit.c
A be/src/thirdparty/mpfit/mpfit.h
M be/src/util/CMakeLists.txt
A be/src/util/mpfit-util.cc
A be/src/util/mpfit-util.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M tests/query_test/test_aggregation.py
15 files changed, 3,897 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8569/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia51d56ee67ec6073e92f90bebb4005484138b820
Gerrit-Change-Number: 8569
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bharathv@cloudera.com>

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message