flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Hadoop compatibility and HBase bulk loading
Date Fri, 10 Apr 2015 10:14:42 GMT
Hmm, that's a tricky question ;-) I would need to have a closer look. But
getting custom comparators for sorting and grouping into the Combiner is
not that trivial, because it touches the API, Optimizer, and Runtime code.
However, I did that before for the Reducer, and with the recent addition of
groupCombine, the Reducer changes might simply be applied to combine as well.
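To illustrate what "custom comparators for sorting and grouping in the Combiner" means, here is a minimal plain-Java sketch (no Flink or Hadoop APIs involved). It sorts a buffer of records with one comparator and then groups adjacent records with a second, coarser comparator, similar to how Hadoop lets a job set separate sort and grouping comparators. The class name `CombineWithComparators`, the `prefix#suffix` composite-key convention, and summing the values as the combine function are all hypothetical choices made for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class CombineWithComparators {

    /**
     * Sorts the buffer with sortCmp, then starts a new group whenever groupCmp
     * says the current key differs from the group's first key. Each group is
     * collapsed into one record: (first key of the group, sum of its values).
     */
    public static <K> List<Map.Entry<K, Long>> combine(
            List<Map.Entry<K, Long>> buffer,
            Comparator<K> sortCmp,
            Comparator<K> groupCmp) {
        List<Map.Entry<K, Long>> sorted = new ArrayList<>(buffer);
        sorted.sort((a, b) -> sortCmp.compare(a.getKey(), b.getKey()));

        List<Map.Entry<K, Long>> out = new ArrayList<>();
        boolean open = false; // whether a group is currently being accumulated
        K groupKey = null;
        long sum = 0;
        for (Map.Entry<K, Long> e : sorted) {
            if (open && groupCmp.compare(groupKey, e.getKey()) != 0) {
                out.add(new SimpleEntry<>(groupKey, sum)); // emit the finished group
                open = false;
            }
            if (!open) {
                groupKey = e.getKey();
                sum = 0;
                open = true;
            }
            sum += e.getValue();
        }
        if (open) {
            out.add(new SimpleEntry<>(groupKey, sum)); // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> buffer = new ArrayList<>();
        buffer.add(new SimpleEntry<>("a#2", 1L));
        buffer.add(new SimpleEntry<>("b#1", 5L));
        buffer.add(new SimpleEntry<>("a#1", 3L));

        // Sort on the full composite key, but group only on the part before '#'.
        Comparator<String> sortCmp = Comparator.<String>naturalOrder();
        Comparator<String> groupCmp =
                Comparator.comparing((String k) -> k.substring(0, k.indexOf('#')));

        System.out.println(combine(buffer, sortCmp, groupCmp)); // prints [a#1=4, b#1=5]
    }
}
```

The point of keeping two comparators is that the sort order can be finer than the grouping: records within a group arrive in a defined order (here `a#1` before `a#2`) even though they are combined together.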

I'll be gone next week, but if you want to, we can have a closer look at
the problem after that.

2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:

> I think I could also take care of it if somebody could help and guide me
> a little bit.
> How long do you think such a task would take?
> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> We had an effort to execute any Hadoop MR program by simply specifying the
>> JobConf and executing it (even embedded in regular Flink programs).
>> We got quite far but did not complete it (counters and custom grouping /
>> sorting functions for Combiners are missing, if I remember correctly).
>> I don't think that anybody is working on that right now, but it would
>> definitely be a cool feature.
>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>> Hi guys,
>>> I have an interesting question about Hadoop compatibility.
>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>> you say that you can reuse existing MapReduce programs.
>>> Would it also be possible to handle complex MapReduce programs like
>>> HBase bulk import, which uses, for example, a custom partitioner
>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>> In the bulk-import examples, the call to
>>> HFileOutputFormat2.configureIncrementalLoadMap sets a series of job
>>> parameters (like partitioner, mapper, reducers, etc.) ->
>>> http://pastebin.com/8VXjYAEf.
>>> The full code can be seen at
>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>> Do you think there's any chance to make it run in Flink?
>>> Best,
>>> Flavio
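For context on the partitioner question: HBase documents the related `HFileOutputFormat2.configureIncrementalLoad` method as configuring a total order partitioner based on the table's region boundaries, so that each reducer writes HFiles for exactly one region. The plain-Java sketch below shows only that range-partitioning idea and is not HBase's or Flink's implementation. It assumes `String` keys instead of HBase's `byte[]` row keys, assumes the split points (region start keys after the first) are distinct, and the class name `RegionRangePartitioner` is hypothetical. Flink's DataSet API does offer `partitionCustom` with a `Partitioner` whose method is `int partition(K key, int numPartitions)`, but wiring this sketch into it is not shown here.

```java
import java.util.Arrays;

public class RegionRangePartitioner {

    // Sorted, distinct start keys of partitions 1..n-1; partition 0 covers
    // everything below the first split point.
    private final String[] splitPoints;

    public RegionRangePartitioner(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    public int numPartitions() {
        return splitPoints.length + 1;
    }

    /** Returns the range index of key, i.e. the number of split points <= key. */
    public int partition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found: key equals split point pos, which starts partition pos + 1.
        // Not found: pos == -(insertionPoint) - 1, where insertionPoint is the
        // number of split points strictly smaller than key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        RegionRangePartitioner p = new RegionRangePartitioner(new String[] {"p", "g"});
        for (String key : new String[] {"apple", "grape", "kiwi", "pear", "zebra"}) {
            System.out.println(key + " -> " + p.partition(key));
        }
        // prints apple -> 0, grape -> 1, kiwi -> 1, pear -> 2, zebra -> 2 (one per line)
    }
}
```

Using binary search keeps each lookup at O(log n) in the number of regions, and because the ranges are contiguous and sorted, concatenating the partitions in index order yields a globally sorted output, which is what per-region HFiles need.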
