From: Flavio Pompermaier
Date: Fri, 10 Apr 2015 12:26:28 +0200
Subject: Re: Hadoop compatibility and HBase bulk loading
To: user@flink.apache.org

Great! That will be awesome.
Thank you, Fabian

On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske wrote:

> Hmm, that's a tricky question ;-) I would need to have a closer look. But
> getting custom comparators for sorting and grouping into the Combiner is
> not trivial because it touches API, Optimizer, and Runtime code. However,
> I did that before for the Reducer, and with the recent addition of
> groupCombine the Reducer changes might just be applied to combine as well.
>
> I'll be gone next week, but if you want to, we can have a closer look at
> the problem after that.
>
> 2015-04-10 12:07 GMT+02:00 Flavio Pompermaier:
>
>> I think I could also take care of it if somebody can help me and guide
>> me a little bit.
>> How long do you think it would take to complete such a task?
>>
>> On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske wrote:
>>
>>> We had an effort to execute any Hadoop MapReduce program by simply
>>> specifying the JobConf and executing it (even embedded in regular
>>> Flink programs). We got quite far but did not finish (counters and
>>> custom grouping / sorting functions for Combiners are missing, if I
>>> remember correctly). I don't think that anybody is working on that
>>> right now, but it would definitely be a cool feature.
>>>
>>> 2015-04-10 11:55 GMT+02:00 Flavio Pompermaier:
>>>
>>>> Hi guys,
>>>>
>>>> I have a question about Hadoop compatibility.
>>>> In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html
>>>> you say that existing MapReduce programs can be reused.
>>>> Would it also be possible to handle complex MapReduce programs such as
>>>> the HBase bulk import, which uses, for example, a custom partitioner
>>>> (org.apache.hadoop.mapreduce.Partitioner)?
>>>>
>>>> The bulk-import examples call
>>>> HFileOutputFormat2.configureIncrementalLoadMap, which sets a series of
>>>> job parameters (partitioner, mapper, reducer, etc.) ->
>>>> http://pastebin.com/8VXjYAEf
>>>> The full code can be seen at
>>>> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
>>>>
>>>> Do you think there's any chance to make it run in Flink?
>>>>
>>>> Best,
>>>> Flavio
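The groupCombine operator referenced above exposes a combiner-style
pre-aggregation directly in the DataSet API. A minimal sketch of how it is
used, assuming the Flink 0.9-era Java API (the word-count aggregation is
only an illustration):

import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class GroupCombineSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    DataSet<Tuple2<String, Integer>> words = env.fromElements(
        new Tuple2<String, Integer>("flink", 1),
        new Tuple2<String, Integer>("hbase", 1),
        new Tuple2<String, Integer>("flink", 1));

    // combineGroup is a pre-aggregation hint: Flink may run it on partial
    // groups before the shuffle, so an exact global count still needs a
    // final reduce afterwards.
    DataSet<Tuple2<String, Integer>> preAggregated = words
        .groupBy(0)
        .combineGroup(
            new GroupCombineFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
              @Override
              public void combine(Iterable<Tuple2<String, Integer>> values,
                  Collector<Tuple2<String, Integer>> out) {
                String key = null;
                int sum = 0;
                for (Tuple2<String, Integer> value : values) {
                  key = value.f0;   // all values in one call share the key
                  sum += value.f1;  // partial sum for this group fragment
                }
                out.collect(new Tuple2<String, Integer>(key, sum));
              }
            });

    preAggregated.print();
  }
}

The limitation discussed in the thread is that, unlike a Hadoop Combiner,
this operator could not yet be configured with custom sort/group
comparators without changes to the API, optimizer, and runtime.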
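For reference, the compatibility layer described above wraps Hadoop's
mapred functions and I/O formats as ordinary Flink operators. A condensed
sketch in the spirit of the linked blog post, assuming a Flink 0.9-era
package layout (Tokenizer and Counter stand in for an existing Hadoop
Mapper and Reducer; the HDFS paths are placeholders):

import java.io.IOException;
import java.util.Iterator;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapred.HadoopMapFunction;
import org.apache.flink.hadoopcompatibility.mapred.HadoopReduceCombineFunction;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class HadoopCompatSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read with Hadoop's TextInputFormat, unchanged.
    HadoopInputFormat<LongWritable, Text> input = new HadoopInputFormat<LongWritable, Text>(
        new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
    TextInputFormat.addInputPath(input.getJobConf(), new Path("hdfs:///tmp/input"));
    DataSet<Tuple2<LongWritable, Text>> text = env.createInput(input);

    // Run the existing Hadoop Mapper and Reducer inside Flink; the Reducer
    // also serves as the combiner.
    DataSet<Tuple2<Text, LongWritable>> counts = text
        .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(new Tokenizer()))
        .groupBy(0)
        .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
            new Counter(), new Counter()));

    // Write with Hadoop's TextOutputFormat, unchanged.
    HadoopOutputFormat<Text, LongWritable> output = new HadoopOutputFormat<Text, LongWritable>(
        new TextOutputFormat<Text, LongWritable>(), new JobConf());
    TextOutputFormat.setOutputPath(output.getJobConf(), new Path("hdfs:///tmp/output"));
    counts.output(output);

    env.execute("Hadoop compatibility sketch");
  }

  // A plain Hadoop mapred Mapper: splits lines into (word, 1) pairs.
  public static final class Tokenizer extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    public void map(LongWritable offset, Text line,
        OutputCollector<Text, LongWritable> out, Reporter reporter) throws IOException {
      for (String token : line.toString().toLowerCase().split("\\W+")) {
        if (!token.isEmpty()) {
          out.collect(new Text(token), new LongWritable(1L));
        }
      }
    }
  }

  // A plain Hadoop mapred Reducer: sums the counts per word.
  public static final class Counter extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text word, Iterator<LongWritable> values,
        OutputCollector<Text, LongWritable> out, Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      out.collect(word, new LongWritable(sum));
    }
  }
}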
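On the custom-partitioner part of the question: the closest native hook in
Flink's DataSet API is partitionCustom, whose Partitioner, like
org.apache.hadoop.mapreduce.Partitioner, maps a key to a partition index.
A hypothetical sketch of steering rows by HBase region boundaries, roughly
what TotalOrderPartitioner does in the bulk-load job
(RegionBoundaryPartitioner and its split-key array are illustrative, not
an existing adapter for HFileOutputFormat2):

import java.util.Arrays;

import org.apache.flink.api.common.functions.Partitioner;

// Illustrative only: assigns each row key to the index of its enclosing
// HBase region split. The sorted split keys would come from HBase metadata.
public class RegionBoundaryPartitioner implements Partitioner<String> {

  private final String[] sortedSplitKeys;

  public RegionBoundaryPartitioner(String[] sortedSplitKeys) {
    this.sortedSplitKeys = sortedSplitKeys;
  }

  @Override
  public int partition(String rowKey, int numPartitions) {
    int idx = Arrays.binarySearch(sortedSplitKeys, rowKey);
    // On a miss, binarySearch returns -(insertionPoint) - 1; the enclosing
    // region is the one whose start key is the last key <= rowKey.
    int region = (idx >= 0) ? idx : -(idx + 2);
    return Math.max(0, region) % numPartitions;
  }
}

A tuple DataSet keyed by row key could then be pre-partitioned with
rows.partitionCustom(new RegionBoundaryPartitioner(splitKeys), 0) before
sorting and writing HFiles; that routing is the part that
HFileOutputFormat2.configureIncrementalLoadMap normally wires into the
Hadoop job.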