flink-dev mailing list archives

From "Gabor Gevay (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
Date Wed, 03 Jun 2015 09:36:49 GMT
Gabor Gevay created FLINK-2142:
----------------------------------

             Summary: GSoC project: Exact and Approximate Statistics for Data Streams and Windows
                 Key: FLINK-2142
                 URL: https://issues.apache.org/jira/browse/FLINK-2142
             Project: Flink
          Issue Type: New Feature
          Components: Streaming
            Reporter: Gabor Gevay
            Assignee: Gabor Gevay
            Priority: Minor


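The goal of this project is to implement basic statistics over data streams and windows (such
as average, median, variance, and correlation) in a computationally efficient manner. This
involves designing custom preaggregators, so that a statistic can be maintained incrementally
instead of being recomputed from scratch for every window.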
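
As a rough illustration of what such a preaggregator could look like, the sketch below keeps
the count, mean, and variance in constant memory using Welford's update, and combines partial
summaries with the pairwise merge of Chan et al. The class name MomentsAggregator and its
methods are hypothetical and are not part of any Flink API; the issue does not prescribe this
design.

```java
/** Hypothetical pre-aggregator (not a Flink class): count, mean, variance in O(1) memory. */
final class MomentsAggregator {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    /** Folds one element into the summary (Welford's update). */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Combines another partial summary into this one (Chan et al. pairwise merge). */
    void merge(MomentsAggregator other) {
        if (other.count == 0) {
            return;
        }
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) count * other.count / total);
        mean += delta * other.count / total;
        count = total;
    }

    long count() {
        return count;
    }

    /** Arithmetic mean; NaN for an empty summary. */
    double mean() {
        return count == 0 ? Double.NaN : mean;
    }

    /** Population variance; NaN for an empty summary. */
    double variance() {
        return count == 0 ? Double.NaN : m2 / count;
    }
}
```

Because merge() only needs the three fields of each partial summary, slices of a window could
be aggregated independently and combined afterwards. Median does not decompose this way, which
is one reason it needs a different data structure.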

The exact calculation of some statistics (e.g. frequencies or the number of distinct elements)
would require memory proportional to the number of elements in the input (the window or the
entire stream). However, there are efficient algorithms and data structures that use less
memory by calculating the same statistics only approximately, within user-specified error bounds.
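
One well-known structure of this kind for frequencies is the Count-Min sketch, shown below as a
minimal sketch rather than as the design chosen for this project (the issue does not name
specific algorithms). Given epsilon and delta, it uses ceil(e / epsilon) counters per row and
ceil(ln(1 / delta)) rows; an estimate never falls below the true count, and exceeds it by more
than epsilon times the total number of added elements with probability at most delta. The class
name, the seed parameter, and the use of hashCode() are assumptions made for illustration.

```java
import java.util.Random;

/** Hypothetical Count-Min sketch (not a Flink class) for approximate frequencies. */
final class CountMinSketch {
    private static final long PRIME = 2147483647L; // 2^31 - 1

    private final int width;
    private final int depth;
    private final long[][] table;
    private final long[] a; // per-row hash multipliers in [1, PRIME - 1]
    private final long[] b; // per-row hash offsets in [0, PRIME - 1]
    private long total;

    CountMinSketch(double epsilon, double delta, long seed) {
        if (!(epsilon > 0 && epsilon < 1) || !(delta > 0 && delta < 1)) {
            throw new IllegalArgumentException("epsilon and delta must be in (0, 1)");
        }
        width = (int) Math.ceil(Math.E / epsilon);
        depth = (int) Math.ceil(Math.log(1.0 / delta));
        table = new long[depth][width];
        a = new long[depth];
        b = new long[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            a[i] = 1 + rnd.nextInt((int) (PRIME - 1));
            b[i] = rnd.nextInt((int) PRIME);
        }
    }

    /** Maps an item to a column of the given row with the hash (a * h + b) mod PRIME. */
    private int bucket(int row, Object item) {
        long h = item.hashCode() & 0x7fffffffL; // non-negative 31-bit value
        return (int) (((a[row] * h + b[row]) % PRIME) % width);
    }

    /** Records one occurrence of the item. */
    void add(Object item) {
        for (int i = 0; i < depth; i++) {
            table[i][bucket(i, item)]++;
        }
        total++;
    }

    /** Returns an estimate that is never below the true count of the item. */
    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, table[i][bucket(i, item)]);
        }
        return min;
    }

    long total() {
        return total;
    }

    int width() {
        return width;
    }

    int depth() {
        return depth;
    }
}
```

Memory here depends only on epsilon and delta, not on the length of the stream. For example,
new CountMinSketch(0.01, 0.01, 42L) allocates 5 rows of 272 counters regardless of how many
elements are added.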



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
