flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Error while reading binary file
Date Mon, 08 Feb 2016 21:37:56 GMT
SerializedInputFormat extends BinaryInputFormat, which expects a
special block-wise encoding and certain metadata fields.
It is not suited to reading arbitrary binary files, such as a file with 64
short values.
I suggest implementing a custom input format based on FileInputFormat.

Best, Fabian
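
A minimal sketch of the record-reading logic such a custom format would need, written in plain Java without any Flink classes. It assumes the file holds nothing but 2-byte big-endian values (the byte order DataInputStream.readShort uses); the thread does not say how sample.bin was written, so that is an assumption. RawShortReader, readAll and writeAll are hypothetical names used only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Flink.
class RawShortReader {

    // Reads the whole file as consecutive 2-byte big-endian shorts (assumed layout).
    static short[] readAll(Path file) throws IOException {
        long size = Files.size(file);
        if (size % Short.BYTES != 0) {
            throw new IOException("File length " + size + " is not a multiple of 2 bytes");
        }
        short[] values = new short[Math.toIntExact(size / Short.BYTES)];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            // The loop is bounded by the file length, so readShort never runs past the end.
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readShort();
            }
        }
        return values;
    }

    // Writes shorts in the same raw layout, with no block headers or metadata.
    static void writeAll(Path file, short[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            for (short v : values) {
                out.writeShort(v);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".bin");
        short[] data = new short[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = (short) i;
        }
        writeAll(file, data);
        System.out.println(readAll(file).length); // prints 64
        Files.delete(file);
    }
}
```

In a format built on FileInputFormat, the same bounded loop would presumably sit behind reachedEnd() and nextRecord(), reading from the stream opened for each split. If the file can be split, the split boundaries would also have to fall on 2-byte offsets.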

2016-02-08 22:05 GMT+01:00 Saliya Ekanayake <esaliya@gmail.com>:

> Thank you, Fabian. That solved the compilation error, but at runtime I get
> an end-of-file exception. I've put sample code with data on GitHub at
> https://github.com/esaliya/flinkit. The data file is a binary file
> containing 64 Short values.
>
>
> 02/08/2016 16:01:19 CHAIN DataSource (at main(WordCount.java:25)
> (org.apache.flink.api.common.io.SerializedInputFormat)) -> FlatMap
> (count())(4/8) switched to FAILED
> java.io.EOFException
> at java.io.DataInputStream.readShort(DataInputStream.java:315)
> at org.apache.flink.core.memory.InputViewDataInputStreamWrapper.readShort(InputViewDataInputStreamWrapper.java:92)
> at org.apache.flink.types.ShortValue.read(ShortValue.java:88)
> at org.apache.flink.api.common.io.SerializedInputFormat.deserialize(SerializedInputFormat.java:37)
> at org.apache.flink.api.common.io.SerializedInputFormat.deserialize(SerializedInputFormat.java:31)
> at org.apache.flink.api.common.io.BinaryInputFormat.nextRecord(BinaryInputFormat.java:274)
> at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:169)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> at java.lang.Thread.run(Thread.java:745)
>
> On Mon, Feb 8, 2016 at 3:50 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> Hi,
>>
>> please try to replace
>> DataSet<ShortValue> ds = env.createInput(sif);
>> by
>> DataSet<ShortValue> ds = env.createInput(sif,
>> ValueTypeInfo.SHORT_VALUE_TYPE_INFO);
>>
>> Best, Fabian
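
The InvalidProgramException quoted below is most likely a consequence of Java type erasure: an object created with `new SerializedInputFormat<>()` carries no record of its type argument at runtime, so the produced type has to be passed in explicitly. The sketch below shows the effect with plain reflection. GenericFormat and ErasureDemo are hypothetical stand-ins, not Flink classes, and this is not Flink's actual type extraction code.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in for a generic input format; not a Flink class.
class GenericFormat<T> {}

class ErasureDemo {

    // Returns the type argument recorded in the object's class hierarchy,
    // or null when reflection cannot see one.
    static Type visibleTypeArgument(Object format) {
        Type superType = format.getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            return ((ParameterizedType) superType).getActualTypeArguments()[0];
        }
        return null;
    }

    public static void main(String[] args) {
        // Direct instantiation: the type argument is erased.
        System.out.println(visibleTypeArgument(new GenericFormat<Short>()));    // prints null
        // Anonymous subclass: the type argument is stored in the class file.
        System.out.println(visibleTypeArgument(new GenericFormat<Short>() {})); // prints class java.lang.Short
    }
}
```

Passing a TypeInformation to createInput, as suggested above, supplies the information that reflection cannot recover in the first case.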
>>
>> 2016-02-08 19:33 GMT+01:00 Saliya Ekanayake <esaliya@gmail.com>:
>>
>>> Till,
>>>
>>> I am still having trouble getting this to work. Here's my code (
>>> https://github.com/esaliya/flinkit)
>>>
>>> String binaryFile = "src/main/resources/sample.bin";
>>> SerializedInputFormat<ShortValue> sif = new SerializedInputFormat<>();
>>> sif.setFilePath(binaryFile);
>>> DataSet<ShortValue> ds = env.createInput(sif);
>>> System.out.println(ds.count());
>>>
>>>
>>> I still get the same error as shown below
>>>
>>> Exception in thread "main"
>>> org.apache.flink.api.common.InvalidProgramException: The type returned by
>>> the input format could not be automatically determined. Please specify the
>>> TypeInformation of the produced type explicitly by using the
>>> 'createInput(InputFormat, TypeInformation)' method instead.
>>> at org.apache.flink.api.java.ExecutionEnvironment.createInput(ExecutionEnvironment.java:511)
>>> at org.saliya.flinkit.WordCount.main(WordCount.java:24)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
>>>
>>>
>>> On Mon, Feb 8, 2016 at 5:42 AM, Till Rohrmann <trohrmann@apache.org>
>>> wrote:
>>>
>>>> Hi Saliya,
>>>>
>>>> in order to set the file path for the SerializedInputFormat you first
>>>> have to create it and then explicitly call setFilePath.
>>>>
>>>> final SerializedInputFormat<Record> inputFormat = new SerializedInputFormat<Record>();
>>>> inputFormat.setFilePath(PATH_TO_FILE);
>>>>
>>>> env.createInput(inputFormat, myTypeInfo);
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Mon, Feb 8, 2016 at 7:00 AM, Saliya Ekanayake <esaliya@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I was trying to read a simple binary file using SerializedInputFormat,
>>>>> as suggested in a different thread, but encountered the following error.
>>>>> I tried to do what the exception suggests, but even though createInput()
>>>>> returns a DataSet object, I couldn't find how to specify which file to
>>>>> read.
>>>>>
>>>>> Any help is appreciated. The file I am trying to read is a simple
>>>>> binary file containing Java short values. Is there an example of
>>>>> reading binary files available?
>>>>>
>>>>> Exception in thread "main"
>>>>> org.apache.flink.api.common.InvalidProgramException: The type returned
>>>>> by the input format could not be automatically determined. Please specify
>>>>> the TypeInformation of the produced type explicitly by using the
>>>>> 'createInput(InputFormat, TypeInformation)' method instead.
>>>>>
>>>>> Thank you,
>>>>> Saliya
>>>>>
>>>>>
>>>>> --
>>>>> Saliya Ekanayake
>>>>> Ph.D. Candidate | Research Assistant
>>>>> School of Informatics and Computing | Digital Science Center
>>>>> Indiana University, Bloomington
>>>>> Cell 812-391-4914
>>>>> http://saliya.org
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
