flink-issues mailing list archives

From "Flavio Pompermaier (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3777) Add open and close methods to manage IF lifecycle
Date Tue, 07 Jun 2016 12:19:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318374#comment-15318374 ]

Flavio Pompermaier commented on FLINK-3777:
-------------------------------------------

In our use case we have a very complex query that produces about 11 billion records,
and we ran some benchmarks to determine the best split size.
The best split size turned out to be around 100k records per query because, as you stated,
there is a trade-off with the complexity on the JobManager side, but there is also a trade-off
with the database server's ability to answer queries over a wide range of keys.
Splitting the entire key set into just a small number of splits causes the job to die because
the queries never end (i.e. they fail with timeout exceptions).

That was our "painful" experience.
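As a rough illustration of the numbers above: at about 100k records per query, 11 billion records correspond to roughly 110,000 splits. The Java sketch below is a hypothetical helper, not part of Flink's API; it assumes the records are addressed by a contiguous numeric key and turns that key space into inclusive ranges of at most splitSize keys, one range per query (e.g. a BETWEEN clause).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Flink API): splits a contiguous numeric key space
 * into inclusive ranges, one range per database query / input split.
 */
public final class KeyRangeSplits {

    /** Inclusive key range served by one query, e.g. "WHERE id BETWEEN start AND end". */
    public record Range(long start, long end) {}

    /** Number of splits needed for totalKeys keys at splitSize keys per split (ceiling division). */
    public static long countSplits(long totalKeys, long splitSize) {
        if (totalKeys < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("totalKeys must be >= 0 and splitSize > 0");
        }
        return (totalKeys + splitSize - 1) / splitSize;
    }

    /** Partitions [minKey, maxKey] into consecutive ranges of at most splitSize keys. */
    public static List<Range> ranges(long minKey, long maxKey, long splitSize) {
        if (splitSize <= 0 || maxKey < minKey) {
            throw new IllegalArgumentException("need splitSize > 0 and minKey <= maxKey");
        }
        List<Range> result = new ArrayList<>();
        for (long start = minKey; ; start += splitSize) {
            // The last range is clamped to maxKey and may be shorter than splitSize.
            long end = Math.min(start + splitSize - 1, maxKey);
            result.add(new Range(start, end));
            if (end == maxKey) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Figures from the comment: ~11 billion records, ~100k records per query.
        System.out.println(countSplits(11_000_000_000L, 100_000L)); // prints 110000
        System.out.println(ranges(1, 250_000, 100_000));
    }
}
```

The split size is the knob discussed above: smaller ranges mean more splits for the JobManager to track, larger ranges mean heavier queries that risk timing out on the database side.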

> Add open and close methods to manage IF lifecycle
> -------------------------------------------------
>
>                 Key: FLINK-3777
>                 URL: https://issues.apache.org/jira/browse/FLINK-3777
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.1
>            Reporter: Flavio Pompermaier
>            Assignee: Flavio Pompermaier
>              Labels: inputformat, lifecycle
>
> At the moment the opening and closing of an inputFormat are not managed, although open() could be (improperly, IMHO) simulated by configure().
> This limits the possibility to reuse expensive resources (like database connections) and to manage their release.
> Probably the best option would be to add 2 methods (i.e. openInputFormat() and closeInputFormat()) to RichInputFormat*
> * NOTE: the best option from a "semantic" point of view would be to rename the current open() and close() to openSplit() and closeSplit() respectively, while using open() and close() for the IF lifecycle management, but this would cause a backward compatibility issue...
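A minimal sketch of the lifecycle proposed in the issue, in plain Java so it runs without Flink on the classpath. The names KeyRangeInputFormat, FakeConnection and readAll are hypothetical; openInputFormat()/closeInputFormat() mirror the names suggested in the issue, but this is not Flink's actual RichInputFormat, and the call order in readAll is an assumption about how a runtime task would drive the format. The point it shows: the expensive resource is created once per format instance and reused across splits, while open()/close() only handle per-split state.

```java
import java.util.ArrayList;
import java.util.List;

public final class InputFormatLifecycleSketch {

    /** Hypothetical stand-in for an expensive resource such as a database connection. */
    static final class FakeConnection implements AutoCloseable {
        static int opened = 0; // counts how many connections were created
        private boolean closed = false;

        FakeConnection() { opened++; }

        /** Pretends to run "SELECT ... WHERE id BETWEEN start AND end". */
        List<Long> query(long start, long end) {
            if (closed) throw new IllegalStateException("connection closed");
            List<Long> rows = new ArrayList<>();
            for (long k = start; k <= end; k++) rows.add(k);
            return rows;
        }

        @Override
        public void close() { closed = true; }
    }

    /** Hypothetical input format: one connection per instance, one query per split. */
    static final class KeyRangeInputFormat {
        private FakeConnection conn;
        private List<Long> currentSplitRows;

        void openInputFormat() { conn = new FakeConnection(); }                        // once, before the first split
        void open(long[] split) { currentSplitRows = conn.query(split[0], split[1]); } // once per split
        List<Long> rows() { return currentSplitRows; }
        void close() { currentSplitRows = null; }                                      // per split: drop split state only
        void closeInputFormat() { if (conn != null) conn.close(); }                   // once, after the last split
    }

    /** Drives the lifecycle in the assumed order and returns the number of rows read. */
    static long readAll(KeyRangeInputFormat format, List<long[]> splits) {
        long count = 0;
        format.openInputFormat();
        try {
            for (long[] split : splits) {
                format.open(split);
                try {
                    count += format.rows().size();
                } finally {
                    format.close();
                }
            }
        } finally {
            format.closeInputFormat();
        }
        return count;
    }

    public static void main(String[] args) {
        List<long[]> splits = List.of(new long[]{1, 10}, new long[]{11, 20}, new long[]{21, 25});
        long n = readAll(new KeyRangeInputFormat(), splits);
        System.out.println(n + " rows, " + FakeConnection.opened + " connection(s) opened");
    }
}
```

Running main prints "25 rows, 1 connection(s) opened": three splits are read over a single connection, which is the reuse the issue asks for.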



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
