flink-user mailing list archives

From Chiwan Park <chiwanp...@apache.org>
Subject Re: Using Hadoop Input/Output formats
Date Wed, 25 Nov 2015 07:22:56 GMT
Thanks for the correction, @Fabian. :)

> On Nov 25, 2015, at 4:40 AM, Suneel Marthi <smarthi@apache.org> wrote:
> 
> I guess it makes sense to add readHadoopXXX() methods to StreamExecutionEnvironment (for feature parity with what presently exists in ExecutionEnvironment).
> 
> Also, FLINK-2949 addresses the need to add relevant syntactic-sugar wrappers to the DataSet API for the code snippet in Fabian's previous email. It's not cool to have to instantiate a JobConf in client code and pass it around.
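A minimal sketch of the kind of streaming sugar discussed here, assuming Flink 0.10's Scala `HadoopInputFormat` wrapper for the mapred API and a Scala implicit class. `HadoopStreamingSugar` and `readHadoopTextFile` are hypothetical names for illustration only; they are not part of Flink's API. The helper just hides the `JobConf` setup that client code otherwise has to do itself.

```scala
import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}

// Hypothetical syntactic sugar: NOT part of Flink. It only wraps the existing
// HadoopInputFormat so callers never build or pass around a JobConf.
object HadoopStreamingSugar {
  implicit class RichStreamEnv(val env: StreamExecutionEnvironment) extends AnyVal {

    /** Reads `path` via Hadoop's mapred TextInputFormat (hypothetical helper). */
    def readHadoopTextFile(path: String): DataStream[(LongWritable, Text)] = {
      val jobConf = new JobConf()
      // The Hadoop FileInputFormat reads its input path(s) from the JobConf.
      FileInputFormat.addInputPath(jobConf, new Path(path))
      env.createInput(
        new HadoopInputFormat[LongWritable, Text](
          new TextInputFormat, classOf[LongWritable], classOf[Text], jobConf))
    }
  }
}
```

With `import HadoopStreamingSugar._` in scope, `env.readHadoopTextFile(inputPath)` yields a `(LongWritable, Text)` stream without the caller touching a `JobConf`. An implicit class was chosen so the sketch needs no changes to Flink itself.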
> 
> On Tue, Nov 24, 2015 at 2:26 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> Hi Nick,
> 
> you can also use Flink's HadoopInputFormat wrappers with the DataStream API. However, DataStream does not offer as much "sugar" as DataSet, because StreamExecutionEnvironment does not offer dedicated createHadoopInput or readHadoopFile methods.
> 
> In the DataStream Scala API, you can read from a Hadoop InputFormat (TextInputFormat in this case) as follows:
> 
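> import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
> import org.apache.flink.streaming.api.scala._
> import org.apache.hadoop.io.{LongWritable, Text}
> import org.apache.hadoop.mapred.{JobConf, TextInputFormat}
> 
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val textData: DataStream[(LongWritable, Text)] = env.createInput(
>   new HadoopInputFormat[LongWritable, Text](
>     new TextInputFormat,
>     classOf[LongWritable],
>     classOf[Text],
>     new JobConf() // the input path(s) must still be set on this JobConf
>   ))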
> 
> The Java version is very similar.
> 
> Note: Flink has wrappers for both MR APIs: mapred and mapreduce.
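The snippet above uses the mapred wrapper. A minimal sketch of the same read with the newer mapreduce API, assuming Flink 0.10's `org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat` and a Hadoop 2 client (for `Job.getInstance()`). `MapreduceTextSource` and `lines` are hypothetical names; the sketch also sets the input path and turns each Hadoop `Text` value into a plain `String`.

```scala
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

// Hypothetical helper: reads text lines through the mapreduce-API wrapper.
object MapreduceTextSource {
  def lines(env: StreamExecutionEnvironment, inputPath: String): DataStream[String] = {
    // With the mapreduce API, configuration lives in a Job instead of a JobConf.
    val job = Job.getInstance()
    FileInputFormat.addInputPath(job, new Path(inputPath))

    val raw: DataStream[(LongWritable, Text)] = env.createInput(
      new HadoopInputFormat[LongWritable, Text](
        new TextInputFormat, classOf[LongWritable], classOf[Text], job))

    // Drop the byte-offset key and keep each line as a String.
    raw.map(_._2.toString)
  }
}
```

Apart from the package names and `Job` versus `JobConf`, the wrapper is used the same way as its mapred counterpart.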
> 
> Cheers,
> Fabian
> 
> 2015-11-24 19:36 GMT+01:00 Chiwan Park <chiwanpark@apache.org>:
> I’m not a streaming expert. AFAIK, the compatibility layer can only be used with DataSet. There are some streaming-specific features in Flink, such as distributed snapshots, that need support from sources and sinks. So you would have to implement the I/O yourself.
> 
> > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >
> > I completely missed this, thanks Chiwan. Can these be used with DataStreams as well as DataSets?
> >
> > On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park <chiwanpark@apache.org> wrote:
> > Hi Nick,
> >
> > You can use Hadoop Input/OutputFormats without modification! Please check the documentation [1] on the Flink homepage.
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
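The documentation linked above also covers the output side. A minimal sketch with the DataSet API, assuming Flink 0.10's Scala `HadoopOutputFormat` wrapper for the mapred API and Hadoop's `TextOutputFormat`; `HadoopOutputExample` and `writeCounts` are hypothetical names. As far as I know, the pattern (wrap the Hadoop OutputFormat with a `JobConf`, set the output path on that `JobConf`, pass the wrapper to `DataSet.output`) mirrors the one in the compatibility guide.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, TextOutputFormat}

// Hypothetical example: writes (word, count) pairs via Hadoop's TextOutputFormat.
object HadoopOutputExample {
  def writeCounts(counts: Seq[(String, Int)], outputPath: String): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // The wrapped Hadoop OutputFormat expects Hadoop Writable types.
    val hadoopResult: DataSet[(Text, IntWritable)] =
      env.fromCollection(counts).map(p => (new Text(p._1), new IntWritable(p._2)))

    // The output path is configured on the JobConf handed to the wrapper.
    val jobConf = new JobConf()
    FileOutputFormat.setOutputPath(jobConf, new Path(outputPath))
    val hadoopOF = new HadoopOutputFormat[Text, IntWritable](
      new TextOutputFormat[Text, IntWritable], jobConf)

    hadoopResult.output(hadoopOF)
    env.execute("Hadoop OutputFormat sketch")
  }
}
```

With TextOutputFormat's default separator, each output line has the form `word<TAB>count`, spread over one file per parallel sink task.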
> >
> > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimiduk@apache.org> wrote:
> > >
> > > Hello,
> > >
> > > Is it possible to use existing Hadoop Input and OutputFormats with Flink? There's a lot of existing code that conforms to these interfaces; it seems a shame to have to re-implement it all. Perhaps some adapter shim...?
> > >
> > > Thanks,
> > > Nick
> >
> > Regards,
> > Chiwan Park
> >
> >
> 
> Regards,
> Chiwan Park
> 

Regards,
Chiwan Park