Date: Tue, 27 Dec 2016 11:15:58 +0000 (UTC)
From: "zhangjing (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Subject: [jira] [Updated] (FLINK-5394) the estimateRowCount method of DataSetCalc didn't work

[ https://issues.apache.org/jira/browse/FLINK-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangjing updated FLINK-5394:
-----------------------------
    Description:
The estimateRowCount method of DataSetCalc currently has no effect.
If I run the following code,
`
Table table = tableEnv
    .fromDataSet(data, "a, b, c")
    .groupBy("a")
    .select("a, a.avg, b.sum, c.count")
    .where("a == 1");
`
the cost of every node in the optimized node tree is:
`
DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
`
We expect the input rowcount of DataSetAggregate to be less than 1000; however, the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc is never used.
There are two reasons for this:
1. No custom metadataProvider is provided yet, so when DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount, the call is dispatched to RelMdRowCount.
2. DataSetCalc is a subclass of SingleRel, so that call matches getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses DataSetCalc.estimateRowCount.
The same problem affects all Flink RelNodes that are subclasses of SingleRel.
I plan to resolve this problem by adding a FlinkRelMdRowCount which contains specific getRowCount methods for Flink RelNodes.

  was:
The estimateRowCount method of DataSetCalc currently has no effect.
If I run the following code,
`
Table table = tableEnv
    .fromDataSet(data, "a, b, c")
    .groupBy("a")
    .select("a, a.avg, b.sum, c.count")
    .where("a == 1");
`
the cost of every node in the optimized node tree is:
`
DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
`
We expect the input rowcount of DataSetAggregate to be less than 1000; however, the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc is never used.
There are two reasons for this:
1. No custom metadataProvider is provided yet, so when DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount, the call is dispatched to RelMdRowCount.
2. DataSetCalc is a subclass of SingleRel, so that call matches getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses DataSetCalc.estimateRowCount.
I plan to resolve this problem by adding a FlinkRelMdRowCount which contains specific getRowCount methods for Flink RelNodes.


> the estimateRowCount method of DataSetCalc didn't work
> ------------------------------------------------------
>
>                 Key: FLINK-5394
>                 URL: https://issues.apache.org/jira/browse/FLINK-5394
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>            Reporter: zhangjing
>            Assignee: zhangjing
>
> The estimateRowCount method of DataSetCalc currently has no effect.
> If I run the following code,
> `
> Table table = tableEnv
>     .fromDataSet(data, "a, b, c")
>     .groupBy("a")
>     .select("a, a.avg, b.sum, c.count")
>     .where("a == 1");
> `
> the cost of every node in the optimized node tree is:
> `
> DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
> DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
> DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
> `
> We expect the input rowcount of DataSetAggregate to be less than 1000; however, the actual input rowcount is still 1000 because the estimateRowCount method of DataSetCalc is never used.
> There are two reasons for this:
> 1. No custom metadataProvider is provided yet, so when DataSetAggregate calls RelMetadataQuery.getRowCount(DataSetCalc) to estimate its input rowcount, the call is dispatched to RelMdRowCount.
> 2. DataSetCalc is a subclass of SingleRel, so that call matches getRowCount(SingleRel rel, RelMetadataQuery mq), which never uses DataSetCalc.estimateRowCount.
> The same problem affects all Flink RelNodes that are subclasses of SingleRel.
> I plan to resolve this problem by adding a FlinkRelMdRowCount which contains specific getRowCount methods for Flink RelNodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
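
The dispatch problem described in this issue can be illustrated with a small, self-contained Java sketch. It is only a toy model and does not use the Calcite or Flink APIs: Node, Scan, SingleInputNode, FilterCalc and RowCountProvider are hypothetical stand-ins for RelNode, DataSetScan, SingleRel, DataSetCalc and the metadata provider, and the selectivity of 0.25 is an arbitrary illustrative value. The sketch assumes that a handler registered for a generic single-input type simply forwards to the input's row count, which is the behavior the issue attributes to getRowCount(SingleRel rel, RelMetadataQuery mq). Registering a more specific handler, in the spirit of the proposed FlinkRelMdRowCount, lets the node's own estimate be used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Toy model of type-based row-count dispatch; none of these types are Calcite or Flink classes. */
final class RowCountDispatchSketch {

    /** Hypothetical stand-in for a relational node that can estimate its own row count. */
    interface Node {
        double estimateRowCount();
    }

    /** Leaf node with a fixed row count, like the scan in the plan above. */
    static final class Scan implements Node {
        private final double rows;
        Scan(double rows) { this.rows = rows; }
        @Override public double estimateRowCount() { return rows; }
    }

    /** Hypothetical stand-in for a generic single-input node. */
    static class SingleInputNode implements Node {
        final Node input;
        SingleInputNode(Node input) { this.input = input; }
        @Override public double estimateRowCount() { return input.estimateRowCount(); }
    }

    /** Hypothetical filtering node; SELECTIVITY is an arbitrary value chosen for illustration. */
    static final class FilterCalc extends SingleInputNode {
        static final double SELECTIVITY = 0.25;
        FilterCalc(Node input) { super(input); }
        @Override public double estimateRowCount() { return input.estimateRowCount() * SELECTIVITY; }
    }

    /** Picks the handler registered for the node's class or its nearest superclass. */
    static final class RowCountProvider {
        private final Map<Class<?>, ToDoubleFunction<Node>> handlers = new HashMap<>();

        RowCountProvider register(Class<? extends Node> type, ToDoubleFunction<Node> handler) {
            handlers.put(type, handler);
            return this;
        }

        double rowCount(Node node) {
            for (Class<?> c = node.getClass(); c != null; c = c.getSuperclass()) {
                ToDoubleFunction<Node> handler = handlers.get(c);
                if (handler != null) {
                    return handler.applyAsDouble(node);
                }
            }
            // No handler matched: fall back to the node's own estimate.
            return node.estimateRowCount();
        }
    }

    /** Only a generic single-input handler: it forwards to the input and ignores the node's estimate. */
    static RowCountProvider defaultProvider() {
        RowCountProvider p = new RowCountProvider();
        p.register(SingleInputNode.class, n -> p.rowCount(((SingleInputNode) n).input));
        return p;
    }

    /** Adds a more specific handler so that FilterCalc's own estimate is used. */
    static RowCountProvider customProvider() {
        RowCountProvider p = defaultProvider();
        p.register(FilterCalc.class, Node::estimateRowCount);
        return p;
    }

    public static void main(String[] args) {
        Node calc = new FilterCalc(new Scan(1000.0));
        System.out.println(defaultProvider().rowCount(calc)); // prints 1000.0
        System.out.println(customProvider().rowCount(calc));  // prints 250.0
    }
}
```

With only the generic handler, the filter node reports the same 1000 rows as its input, which mirrors the plan shown in the issue. With the specific handler registered, the estimate drops below the input row count, which is the behavior the issue expects.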