From: Flavio Pompermaier
Date: Fri, 10 Apr 2015 12:26:28 +0200
Subject: Re: Hadoop compatibility and HBase bulk loading
To: user@flink.apache.org

Great! That will be awesome.
Thank you, Fabian

On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske wrote:

> Hmm, that's a tricky question ;-) I would need to have a closer look. But
> getting custom comparators for sorting and grouping into the Combiner is
> not trivial because it touches API, Optimizer, and Runtime code. However,
> I did that before for the Reducer, and with the recent addition of
> groupCombine the Reducer changes might just be applied to combine as well.
>
> I'll be gone next week, but if you want to, we can have a closer look at
> the problem after that.
>
> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier:
>
>> I think I could also take care of it if somebody can help me and guide
>> me a little bit.
>> How long do you think it would take to complete such a task?
>>
>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske wrote:
>>
>>> We had an effort to execute any Hadoop MapReduce program by simply
>>> specifying the JobConf and executing it (even embedded in regular
>>> Flink programs). We got quite far but did not finish (counters and
>>> custom grouping / sorting functions for Combiners are missing, if I
>>> remember correctly). I don't think that anybody is working on that
>>> right now, but it would definitely be a cool feature.
>>>
>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier:
>>>
>>>> Hi guys,
>>>>
>>>> I have a question about Hadoop compatibility.
>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>> you say that existing MapReduce programs can be reused.
>>>> Would it also be possible to handle complex MapReduce programs such as
>>>> the HBase bulk import, which uses, for example, a custom partitioner
>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>
>>>> The bulk-import examples call
>>>> HFileOutputFormat2.configureIncrementalLoadMap, which sets a series of
>>>> job parameters (partitioner, mapper, reducer, etc.) ->
>>>> http://pastebin.com/8VXjYAEf
>>>> The full code can be seen at
>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>>
>>>> Do you think there's any chance to make it run in Flink?
>>>>
>>>> Best,
>>>> Flavio
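The groupCombine operator referenced above exposes a combiner-style
pre-aggregation directly in the DataSet API. A minimal sketch of how it is
used, assuming the Flink 0.9-era Java API (the word-count aggregation is
only an illustration):

import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class GroupCombineSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    DataSet<Tuple2<String, Integer>> words = env.fromElements(
        new Tuple2<String, Integer>("flink", 1),
        new Tuple2<String, Integer>("hbase", 1),
        new Tuple2<String, Integer>("flink", 1));

    // combineGroup is a pre-aggregation hint: Flink may run it on partial
    // groups before the shuffle, so an exact global count still needs a
    // final reduce afterwards.
    DataSet<Tuple2<String, Integer>> preAggregated = words
        .groupBy(0)
        .combineGroup(
            new GroupCombineFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
              @Override
              public void combine(Iterable<Tuple2<String, Integer>> values,
                  Collector<Tuple2<String, Integer>> out) {
                String key = null;
                int sum = 0;
                for (Tuple2<String, Integer> value : values) {
                  key = value.f0;   // all values in one call share the key
                  sum += value.f1;  // partial sum for this group fragment
                }
                out.collect(new Tuple2<String, Integer>(key, sum));
              }
            });

    preAggregated.print();
  }
}

The limitation discussed in the thread is that, unlike a Hadoop Combiner,
this operator could not yet be configured with custom sort/group
comparators without changes to the API, optimizer, and runtime.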
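For reference, the compatibility layer described above wraps Hadoop's
mapred functions and I/O formats as ordinary Flink operators. A condensed
sketch in the spirit of the linked blog post, assuming a Flink 0.9-era
package layout (Tokenizer and Counter stand in for an existing Hadoop
Mapper and Reducer; the HDFS paths are placeholders):

import java.io.IOException;
import java.util.Iterator;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapred.HadoopMapFunction;
import org.apache.flink.hadoopcompatibility.mapred.HadoopReduceCombineFunction;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class HadoopCompatSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read with Hadoop's TextInputFormat, unchanged.
    HadoopInputFormat<LongWritable, Text> input = new HadoopInputFormat<LongWritable, Text>(
        new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
    TextInputFormat.addInputPath(input.getJobConf(), new Path("hdfs:///tmp/input"));
    DataSet<Tuple2<LongWritable, Text>> text = env.createInput(input);

    // Run the existing Hadoop Mapper and Reducer inside Flink; the Reducer
    // also serves as the combiner.
    DataSet<Tuple2<Text, LongWritable>> counts = text
        .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer()))
        .groupBy(0)
        .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
            new Counter(), new Counter()));

    // Write with Hadoop's TextOutputFormat, unchanged.
    HadoopOutputFormat<Text, LongWritable> output = new HadoopOutputFormat<Text, LongWritable>(
        new TextOutputFormat<Text, LongWritable>(), new JobConf());
    TextOutputFormat.setOutputPath(output.getJobConf(), new Path("hdfs:///tmp/output"));
    counts.output(output);

    env.execute("Hadoop compatibility sketch");
  }

  // A plain Hadoop mapred Mapper: splits lines into (word, 1) pairs.
  public static final class Tokenizer extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    public void map(LongWritable offset, Text line,
        OutputCollector<Text, LongWritable> out, Reporter reporter) throws IOException {
      for (String token : line.toString().toLowerCase().split("\\W+")) {
        if (!token.isEmpty()) {
          out.collect(new Text(token), new LongWritable(1L));
        }
      }
    }
  }

  // A plain Hadoop mapred Reducer: sums the counts per word.
  public static final class Counter extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text word, Iterator<LongWritable> values,
        OutputCollector<Text, LongWritable> out, Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      out.collect(word, new LongWritable(sum));
    }
  }
}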
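On the custom-partitioner part of the question: the closest native hook in
Flink's DataSet API is partitionCustom, whose Partitioner, like
org.apache.hadoop.mapreduce.Partitioner, maps a key to a partition index.
A hypothetical sketch of steering rows by HBase region boundaries, roughly
what TotalOrderPartitioner does in the bulk-load job
(RegionBoundaryPartitioner and its split-key array are illustrative, not
an existing adapter for HFileOutputFormat2):

import java.util.Arrays;

import org.apache.flink.api.common.functions.Partitioner;

// Illustrative only: assigns each row key to the index of its enclosing
// HBase region split. The sorted split keys would come from HBase metadata.
public class RegionBoundaryPartitioner implements Partitioner<String> {

  private final String[] sortedSplitKeys;

  public RegionBoundaryPartitioner(String[] sortedSplitKeys) {
    this.sortedSplitKeys = sortedSplitKeys;
  }

  @Override
  public int partition(String rowKey, int numPartitions) {
    int idx = Arrays.binarySearch(sortedSplitKeys, rowKey);
    // On a miss, binarySearch returns -(insertionPoint) - 1; the enclosing
    // region is the one whose start key is the last key <= rowKey.
    int region = (idx >= 0) ? idx : -(idx + 2);
    return Math.max(0, region) % numPartitions;
  }
}

A tuple DataSet keyed by row key could then be pre-partitioned with
rows.partitionCustom(new RegionBoundaryPartitioner(splitKeys), 0) before
sorting and writing HFiles; that routing is the part that
HFileOutputFormat2.configureIncrementalLoadMap normally wires into the
Hadoop job.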