flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1151) CollectionDataSource does not provide statistics
Date Mon, 20 Oct 2014 08:06:34 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176698#comment-14176698 ]

ASF GitHub Bot commented on FLINK-1151:

Github user fhueske commented on the pull request:

    I also thought about adding ``getMinimumLength()`` to the type info but decided on the
serializers because they define how much data is actually written out (e.g., length info for
strings or size info for arrays). On the other hand, these few bytes are probably negligible
compared to the actual size of var-length data types.
    In fact, ``getMinLength()`` does not delegate to ``getLength()`` for var-length data types
(if there's no bug). Forwarding -1 wouldn't make any sense.
    You're right, for CollectionDataSource sampling some elements would give better estimates,
but I thought ``getMinLength()`` could also be used for size estimation during optimization.
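
    The idea in the comment above can be illustrated with a minimal sketch. This is not the
code from the pull request: the ``LengthInfo`` interface, the ``INT`` and ``STRING`` instances,
and ``estimateMinBytes()`` are hypothetical names, and the 1-byte minimum for strings is an
assumption chosen only to stand in for "some length info is always written". It shows why a
lower bound is kept separate from ``getLength()``, which reports -1 for var-length types.

```java
public class MinLengthEstimate {

    /** Hypothetical stand-in for a serializer; not Flink's actual serializer API. */
    interface LengthInfo {
        /** Fixed serialized length in bytes, or -1 if the type is variable-length. */
        int getLength();

        /** Lower bound on the serialized length in bytes; never negative. */
        int getMinLength();
    }

    /** Fixed-length type: the minimum length equals the exact length. */
    static final LengthInfo INT = new LengthInfo() {
        public int getLength() { return 4; }
        public int getMinLength() { return 4; }
    };

    /** Variable-length type: getLength() is -1, so the minimum must not forward it. */
    static final LengthInfo STRING = new LengthInfo() {
        public int getLength() { return -1; }
        // Assumption for illustration: at least one byte of length info is written.
        public int getMinLength() { return 1; }
    };

    /** Lower-bound size estimate for a collection with a known element count. */
    static long estimateMinBytes(LengthInfo info, long numElements) {
        return (long) info.getMinLength() * numElements;
    }

    public static void main(String[] args) {
        System.out.println(estimateMinBytes(INT, 1000));
        System.out.println(estimateMinBytes(STRING, 1000));
    }
}
```

    Forwarding -1 here would produce a negative estimate for strings, which is why the
minimum is defined independently. For var-length types the result is only a loose lower
bound, so sampling elements would still give a better estimate.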

> CollectionDataSource does not provide statistics
> ------------------------------------------------
>                 Key: FLINK-1151
>                 URL: https://issues.apache.org/jira/browse/FLINK-1151
>             Project: Flink
>          Issue Type: Improvement
>          Components: Optimizer
>    Affects Versions: 0.6.1-incubating, 0.7-incubating
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>            Priority: Minor
> CollectionDataSources do not provide statistics for the optimizer although the data type
> and the number of elements are known.

This message was sent by Atlassian JIRA
