flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: [DISCUSS] Removal of twitter-inputformat
Date Mon, 12 Jun 2017 12:18:03 GMT
Bumpety-bump.

I would be in favour of removing this:
 - It can be implemented as a MapFunction parser after a TextInputFormat
 - Additions, changes, and fixes that happen on TextInputFormat are not reflected in SimpleTweetInputFormat
 - SimpleTweetInputFormat overrides nextRecord(), which is not something DelimitedInputFormats are normally supposed to do, I think
 - The Tweet POJO has a very strange naming scheme
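
To make the first point concrete, here is a rough plain-Java sketch of the per-line parsing step that would sit behind a TextInputFormat. Everything in it is an assumption for illustration: the SimpleTweet class, the two fields it keeps, and the regex-based extraction are hypothetical stand-ins and not the actual SimpleTweetInputFormat logic. A real job would more likely use a JSON library and put parseLine() inside a MapFunction<String, SimpleTweet>; the Flink wiring is only mentioned in comments so the block has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TweetLineParser {

    /** Hypothetical, deliberately tiny stand-in for the Tweet POJO. */
    public static final class SimpleTweet {
        public final long id;
        public final String text;

        public SimpleTweet(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Naive patterns that assume one flat JSON object per line.
    // Assumption for illustration only; nested objects need a real JSON parser.
    private static final Pattern ID =
            Pattern.compile("\"id\"\\s*:\\s*(\\d+)");
    private static final Pattern TEXT =
            Pattern.compile("\"text\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /**
     * One line in, one record out. This is the work a MapFunction#map
     * would do after ExecutionEnvironment#readTextFile(...).
     * Escape sequences inside the text value are left as-is.
     */
    public static SimpleTweet parseLine(String line) {
        Matcher id = ID.matcher(line);
        Matcher text = TEXT.matcher(line);
        if (!id.find() || !text.find()) {
            throw new IllegalArgumentException("Not a tweet line: " + line);
        }
        return new SimpleTweet(Long.parseLong(id.group(1)), text.group(1));
    }

    /** Plain-Java stand-in for mapping parseLine over a data set of lines. */
    public static List<SimpleTweet> parseAll(List<String> lines) {
        List<SimpleTweet> out = new ArrayList<>();
        for (String line : lines) {
            out.add(parseLine(line));
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTweet t = parseLine("{\"id\": 42, \"text\": \"hello flink\"}");
        System.out.println(t.id + ": " + t.text);
    }
}
```

The point of the sketch is only that the parsing is an ordinary function from String to a POJO, so it does not need its own InputFormat and automatically benefits from any fixes made to TextInputFormat.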

Best,
Aljoscha

> On 7. Jun 2017, at 11:15, Chesnay Schepler <chesnay@apache.org> wrote:
> 
> Hello,
> 
> I'm proposing to remove the Twitter-InputFormat in FLINK-6710 <https://issues.apache.org/jira/browse/FLINK-6710>,
> with an open PR you can find here <https://github.com/apache/flink/pull/3984>.
> The PR currently has a +1 from Robert, but Timo raised some concerns saying that it is
> useful for prototyping and advised me to start a discussion on the ML.
> 
> This format is a DelimitedInputFormat that reads JSON objects and turns them into a custom
> tweet class.
> I believe this format doesn't provide much value to Flink; there's nothing interesting
> about it as an InputFormat, as it is purely an exercise in manually converting a JSON
> object into a POJO.
> This is apparent since you could just as well use ExecutionEnvironment#readTextFile(...)
> and throw the parsing logic into a subsequent MapFunction.
> 
> In the PR I suggested replacing this with a JsonInputFormat, but this was a misguided
> attempt at getting Timo to agree to the removal. A JsonInputFormat has the same problem
> outlined above, as it could be effectively implemented with a one-liner map function.
> 
> So the question now is whether we want to keep it, remove it, or replace it with something
> more general.
> 
> Regards,
> Chesnay

