flink-user mailing list archives

From Suneel Marthi <suneel.mar...@gmail.com>
Subject Re: Reading Binary Data (Matrix) with Flink
Date Sun, 24 Jan 2016 18:10:41 GMT
There should be an env.readbinaryfile(), IIRC. Check that.

Sent from my iPhone

> On Jan 24, 2016, at 12:44 PM, Saliya Ekanayake <esaliya@gmail.com> wrote:
> 
> Thank you for the response on this, but I still have a doubt. Simply put, the file is
> not in HDFS; it's in local storage. If I run the Flink program with, say, 5 parallel tasks,
> what I would like to do is read a block of rows in each task, as shown below. I looked at
> the simple CSV reader and was thinking of creating a custom one like it, but I would need
> to know the task number to read the relevant block. Is this possible?
> 
> <image.png>
> 
> Thank you,
> Saliya
> 
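[Editor's note: the row-block reading Saliya describes can be sketched with plain JDK I/O, independent of Flink. The sketch below assumes a row-major file of 16-bit big-endian values (Java's default order for writeShort); the class and method names are hypothetical. In Flink itself, the task number might come from getRuntimeContext().getIndexOfThisSubtask() inside a rich function, but that wiring is an assumption, not code from this thread.]

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlockReader {
    // Rows owned by task `taskIndex` out of `numTasks`, for an N x N matrix.
    // Returns {startRow, rowCount}; spreads the remainder across the first
    // (N % numTasks) tasks when N is not divisible by numTasks.
    static int[] rowRange(int n, int numTasks, int taskIndex) {
        int base = n / numTasks, rem = n % numTasks;
        int start = taskIndex * base + Math.min(taskIndex, rem);
        int count = base + (taskIndex < rem ? 1 : 0);
        return new int[] {start, count};
    }

    // Reads rows [startRow, startRow + rowCount) of an N x N short matrix
    // stored row-major as big-endian 16-bit values.
    static short[] readBlock(Path file, int n, int startRow, int rowCount) throws IOException {
        byte[] bytes = new byte[rowCount * n * 2];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek((long) startRow * n * 2); // byte offset of the first owned row
            raf.readFully(bytes);
        }
        short[] block = new short[rowCount * n];
        ByteBuffer.wrap(bytes).asShortBuffer().get(block); // big-endian by default
        return block;
    }

    public static void main(String[] args) throws IOException {
        int n = 6, tasks = 5;
        Path f = Files.createTempFile("matrix", ".bin");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f.toFile()))) {
            for (int i = 0; i < n * n; i++) out.writeShort(i); // element (i,j) = i*n + j
        }
        for (int t = 0; t < tasks; t++) {
            int[] r = rowRange(n, tasks, t);
            short[] block = readBlock(f, n, r[0], r[1]);
            System.out.println("task " + t + ": rows " + r[0] + ".." + (r[0] + r[1] - 1)
                    + ", first value " + block[0]);
        }
        Files.delete(f);
    }
}
```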
>> On Wed, Jan 20, 2016 at 12:47 PM, Till Rohrmann <trohrmann@apache.org> wrote:
>> With readHadoopFile you can use all of Hadoop’s FileInputFormats, so you can
>> do everything with Flink that you can do with Hadoop. Simply take the same Hadoop
>> FileInputFormat you would use for your MapReduce job.
>> 
>> Cheers,
>> Till
>> 
>> 
>>> On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake <esaliya@gmail.com> wrote:
>>> Thank you. I saw readHadoopFile, but I was not sure how it can be used for
>>> the following, which is what I need. The logic of the code requires an entire row to
>>> operate on, so in our current implementation with P tasks, each of them reads a
>>> rectangular block of (N/P) x N from the matrix. Is this possible with readHadoopFile?
>>> Also, the file may not be in HDFS, so is it possible to refer to local disk in doing this?
>>> 
>>> Thank you
>>> 
>>>> On Wed, Jan 20, 2016 at 1:31 AM, Chiwan Park <chiwanpark@apache.org> wrote:
>>>> Hi Saliya,
>>>> 
>>>> You can use the input format from Hadoop in Flink via the readHadoopFile
>>>> method. The method returns a dataset whose type is Tuple2<Key, Value>. Note that the
>>>> MapReduce-equivalent transformation in Flink is composed of map, groupBy, and reduceGroup.
>>>> 
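[Editor's note: the map, groupBy, reduceGroup shape Chiwan mentions can be illustrated with plain Java streams. This is a semantics sketch only, using a hypothetical word-count example; it is not the Flink API.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapGroupReduceSketch {
    // "map" each line to tokens, "groupBy" the token (the key),
    // then "reduceGroup" each group down to its count, mirroring the
    // map -> groupBy -> reduceGroup shape of a MapReduce-style job.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+"))) // map phase
                .collect(Collectors.groupingBy(                     // groupBy key
                        word -> word,
                        Collectors.counting()));                    // reduceGroup
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(List.of("a b a", "b a"));
        System.out.println(counts);
    }
}
```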
>>>> > On Jan 20, 2016, at 3:04 PM, Suneel Marthi <smarthi@apache.org> wrote:
>>>> >
>>>> > Guess you are looking for Flink's BinaryInputFormat to be able to read
>>>> > blocks of data from HDFS:
>>>> >
>>>> > https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flink/api/common/io/BinaryInputFormat.html
>>>> >
>>>> > On Wed, Jan 20, 2016 at 12:45 AM, Saliya Ekanayake <esaliya@gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > I am trying to use Flink to perform a parallel batch operation on an NxN
>>>> > matrix represented as a binary file. Each (i,j) element is stored as a Java Short
>>>> > value. In typical MapReduce programming with Hadoop, each map task reads a block of
>>>> > rows of this matrix, performs computation on that block, and emits the result to
>>>> > the reducer.
>>>> >
>>>> > How is this done in Flink? I am new to Flink and couldn't find a binary
>>>> > reader so far. Any help is greatly appreciated.
>>>> >
>>>> > Thank you,
>>>> > Saliya
>>>> >
>>>> > --
>>>> > Saliya Ekanayake
>>>> > Ph.D. Candidate | Research Assistant
>>>> > School of Informatics and Computing | Digital Science Center
>>>> > Indiana University, Bloomington
>>>> > Cell 812-391-4914
>>>> > http://saliya.org
>>>> >
>>>> 
>>>> Regards,
>>>> Chiwan Park
>>> 
>>> 
>>> 
>>> -- 
>>> Saliya Ekanayake
>>> Ph.D. Candidate | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> Cell 812-391-4914
>>> http://saliya.org
> 
> 
> 
> -- 
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
