X-Original-To: apmail-flink-dev-archive@www.apache.org
Delivered-To: apmail-flink-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id AD58B175DC
	for ; Wed, 3 Jun 2015 09:36:50 +0000 (UTC)
Received: (qmail 83931 invoked by uid 500); 3 Jun 2015 09:36:50 -0000
Delivered-To: apmail-flink-dev-archive@flink.apache.org
Received: (qmail 83876 invoked by uid 500); 3 Jun 2015 09:36:50 -0000
Mailing-List: contact dev-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.apache.org
Delivered-To: mailing list dev@flink.apache.org
Received: (qmail 83828 invoked by uid 99); 3 Jun 2015 09:36:49 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 03 Jun 2015 09:36:49 +0000
Date: Wed, 3 Jun 2015 09:36:49 +0000 (UTC)
From: "Gabor Gevay (JIRA)"
To: dev@flink.apache.org
Subject: [jira] [Created] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

Gabor Gevay created FLINK-2142:
----------------------------------

             Summary: GSoC project: Exact and Approximate Statistics for Data Streams and Windows
                 Key: FLINK-2142
                 URL: https://issues.apache.org/jira/browse/FLINK-2142
             Project: Flink
          Issue Type: New Feature
          Components: Streaming
            Reporter: Gabor Gevay
            Assignee: Gabor Gevay
            Priority: Minor

The goal of this project is to implement basic statistics of data streams and windows (such as average, median, variance, and correlation) in a computationally efficient manner. This involves designing custom preaggregators.

The exact calculation of some statistics (e.g.,
frequencies or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures that use less memory by computing the same statistics only approximately, within user-specified error bounds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
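
As one illustration of the memory/accuracy trade-off described in the issue, the following is a minimal sketch of a Count-Min Sketch, a well-known approximate frequency counter. It is not taken from the issue or from Flink: the class name, the parameter names `epsilon` and `delta`, and the salted BLAKE2b hashing are assumptions made for this example. With width ceil(e/epsilon) and depth ceil(ln(1/delta)), the standard analysis (which assumes pairwise-independent hash functions; the salted hash here is only a practical stand-in) bounds the overestimate by epsilon times the total count with probability at least 1 - delta. The estimate never falls below the true count.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter with user-specified error bounds.

    Hypothetical example class; not part of Flink or of FLINK-2142.
    Memory is width * depth counters, independent of the input size.
    """

    def __init__(self, epsilon, delta):
        # Table dimensions follow from the requested error bounds.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # total count added so far

    def _index(self, item, row):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

    def merge(self, other):
        # Sketches with equal dimensions combine by element-wise addition,
        # which is what makes them usable as partial (pre-)aggregates.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        self.total += other.total
```

Because `merge` is associative and commutative, partial sketches built over slices of a window could in principle be combined into one sketch for the whole window. Whether the project's preaggregators would take this form is not stated in the issue.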