flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Hadoop compatibility and HBase bulk loading
Date Fri, 10 Apr 2015 10:26:28 GMT
Great! That will be awesome.
Thank you Fabian

On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hmm, that's a tricky question ;-) I would need to have a closer look. But
> getting custom comparators for sorting and grouping into the Combiner is
> not that trivial because it touches API, Optimizer, and Runtime code.
> However, I did that before for the Reducer, and with the recent addition of
> groupCombine the Reducer changes might simply carry over to combine.
> I'll be gone next week, but if you want to, we can have a closer look at
> the problem after that.
> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>> I think I could also take care of it if somebody can help me and guide me
>> a little bit..
>> How long do you think it will require to complete such a task?
>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhueske@gmail.com>
>> wrote:
>>> We had an effort to run any Hadoop MapReduce program by simply specifying
>>> the JobConf and executing it (even embedded in regular Flink programs).
>>> We got quite far but did not complete it (counters and custom grouping /
>>> sorting functions for Combiners are missing, if I remember correctly).
>>> I don't think that anybody is working on that right now, but it would
>>> definitely be a cool feature.
>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>> Hi guys,
>>>> I have a nice question about Hadoop compatibility.
>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>> you say that you can reuse existing mapreduce programs.
>>>> Would it also be possible to handle complex MapReduce programs like
>>>> HBase BulkImport that use, for example, a custom partitioner
>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>> In the bulk-import examples there is a call to
>>>> HFileOutputFormat2.configureIncrementalLoadMap, which sets a series of job
>>>> parameters (like partitioner, mapper, reducers, etc.) ->
>>>> http://pastebin.com/8VXjYAEf.
>>>> The full code of it can be seen at
>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>> .
>>>> Do you think there's any chance to make it run in Flink?
>>>> Best,
>>>> Flavio
