flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Parquet example
Date Wed, 12 Nov 2014 20:17:06 GMT
Yes, I've read it! Will it also support the HBase TableInputFormat (HTable and
Scan are no longer serializable), so basically the hbase addon becomes useless?
On Nov 12, 2014 9:10 PM, "Fabian Hueske" <fhueske@apache.org> wrote:

> Hi,
>
> Just want to let you know that we opened a JIRA issue (FLINK-1236) to
> support local split assignment for the HadoopInputFormat.
> At least this performance issue should be easy to solve :-)
>
> 2014-11-11 12:44 GMT+01:00 Fabian Hueske <fhueske@gmail.com>:
>
>> First of all, split locality can make a huge difference.
>> A dedicated format will also enable a tighter integration, both API-wise
>> and for execution, for example by pushing filters or projections directly
>> into the data source and thereby reducing the data to be read from the
>> file system.
>>
>> 2014-11-11 12:30 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>
>>> Maybe this is a dumb question, but could you explain what the benefits
>>> of a dedicated Flink IF are compared to the default Hadoop IF wrapper?
>>> Is it just because of the data locality of task slots?
>>>
>>> On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske <fhueske@apache.org>
>>> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> I am not aware of a Flink InputFormat for Parquet. However, it should
>>>> hopefully be covered by the Hadoop IF wrapper.
>>>> A dedicated Flink IF would be great though, IMO.
>>>>
>>>> Best, Fabian
>>>>
>>>> 2014-11-11 12:10 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>
>>>>> Hi to all,
>>>>>
>>>>> I'd like to know whether Flink is able to exploit the Parquet format to read
>>>>> data efficiently from HDFS.
>>>>> Is there any example available?
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>
>>>>
>>>
>>
>
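
For readers following the thread, the idea behind "local split assignment" can be sketched roughly as below. This is a toy illustration only, not Flink's implementation and not the FLINK-1236 patch: the function `assign_splits`, the split ids, and the host names are all hypothetical. It assumes each split knows which hosts hold a replica of its data, and that workers ask for splits one at a time.

```python
from collections import deque


def assign_splits(splits, workers):
    """Hand out input splits, preferring splits stored on the requesting host.

    splits:  dict mapping a (hypothetical) split id -> set of host names
             that hold a replica of that split's data
    workers: list of host names that request work in round-robin order
    Returns a list of (worker, split_id, is_local) tuples.
    """
    unassigned = dict(splits)      # splits not yet handed out
    queue = deque(workers)
    assignments = []
    while unassigned and queue:
        worker = queue.popleft()
        # Prefer a split that has a replica on this worker's host.
        local = next(
            (sid for sid, hosts in unassigned.items() if worker in hosts),
            None,
        )
        # Fall back to any remaining split if nothing is local.
        split_id = local if local is not None else next(iter(unassigned))
        assignments.append((worker, split_id, local is not None))
        del unassigned[split_id]
        queue.append(worker)       # the worker will ask for more work later
    return assignments


if __name__ == "__main__":
    demo_splits = {
        "s0": {"hostA"},
        "s1": {"hostB"},
        "s2": {"hostA"},
        "s3": {"hostB"},
    }
    for worker, split_id, is_local in assign_splits(demo_splits, ["hostB", "hostA"]):
        print(worker, split_id, "local" if is_local else "remote")
```

An assigner that ignores locality would simply hand out splits in order, so a worker could end up reading most of its data over the network. Preferring local replicas avoids that whenever a local split is still available.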
