flink-user mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: Parquet example
Date Wed, 12 Nov 2014 20:08:39 GMT
Hi,

Just wanted to let you know that we opened a JIRA issue (FLINK-1236) to support
local split assignment for the HadoopInputFormat.
At least this performance issue should be easy to solve :-)
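
To illustrate what "local split assignment" means, here is a minimal sketch. It is not the FLINK-1236 implementation; `Split` and `LocalitySplitAssigner` are hypothetical names made up for this example, and the only assumption is that each split knows which hosts store its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an input split and the hosts storing its data."""
    split_id: int
    hosts: tuple


class LocalitySplitAssigner:
    """Hands out each split once, preferring splits stored on the requesting host."""

    def __init__(self, splits):
        # Dicts keep insertion order, so splits are served in their given order.
        self._remaining = {s.split_id: s for s in splits}

    def next_split(self, host: str) -> Optional[Split]:
        if not self._remaining:
            return None
        # Prefer a split whose data lives on the requesting host.
        for split in self._remaining.values():
            if host in split.hosts:
                return self._remaining.pop(split.split_id)
        # No local split left: fall back to any remaining (remote) split.
        first_id = next(iter(self._remaining))
        return self._remaining.pop(first_id)
```

A worker on `node-b` would call `assigner.next_split("node-b")` repeatedly until it returns `None`; remote reads only happen once no local split is left.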

2014-11-11 12:44 GMT+01:00 Fabian Hueske <fhueske@gmail.com>:

> First of all, split locality can make a huge difference.
> A dedicated input format would also enable tighter integration, both in the
> API and during execution, for example by pushing filters or projections
> directly into the data source and thereby reducing the amount of data read
> from the file system.
>
> 2014-11-11 12:30 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>
>> Maybe this is a dumb question, but could you explain the benefits of a
>> dedicated Flink IF compared to the Hadoop IF wrapper that is available by
>> default?
>> Is it just because of the data locality of task slots?
>>
>> On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske <fhueske@apache.org>
>> wrote:
>>
>>> Hi Flavio,
>>>
>>> I am not aware of a Flink InputFormat for Parquet. However, it should
>>> hopefully be covered by the Hadoop IF wrapper.
>>> A dedicated Flink IF would be great though, IMO.
>>>
>>> Best, Fabian
>>>
>>> 2014-11-11 12:10 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>
>>>> Hi to all,
>>>>
>>>> I'd like to know whether Flink is able to exploit the Parquet format to
>>>> read data efficiently from HDFS.
>>>> Is there any example available?
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>>
>>
>
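
The filter and projection push-down mentioned in the thread can be sketched as follows. This is a toy model under stated assumptions, not Parquet's or Flink's API: `RowGroup` and `scan` are hypothetical names, and the data is held in memory as one list per column.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical columnar chunk: column name -> list of values."""
    columns: dict

    def min_max(self, column):
        # A real columnar format would keep these statistics in metadata;
        # they are computed on the fly here to keep the sketch short.
        values = self.columns[column]
        return min(values), max(values)


def scan(row_groups, projection, filter_column, lower_bound):
    """Return (rows, skipped): projected rows with filter_column >= lower_bound,
    and the number of row groups skipped without reading their rows."""
    rows, skipped = [], 0
    for group in row_groups:
        # Filter push-down: skip the whole group if no value can match.
        _, max_value = group.min_max(filter_column)
        if max_value < lower_bound:
            skipped += 1
            continue
        # Projection push-down: touch only the filter column and the
        # requested columns, never the remaining ones.
        for i, value in enumerate(group.columns[filter_column]):
            if value >= lower_bound:
                rows.append(tuple(group.columns[name][i] for name in projection))
    return rows, skipped
```

The point of the sketch is that both decisions are made inside the data source, so skipped row groups and unrequested columns never reach the rest of the job.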
