From: Flavio Pompermaier
Date: Mon, 3 Nov 2014 12:05:34 +0100
Subject: Re: HBase 0.98 addon for Flink 0.8
To: dev@flink.incubator.apache.org

Thanks for the detailed answer. So if I run a job from my machine, all the scanned table data will have to be downloaded to my machine, right?

Still regarding the GenericTableOutputFormat, it is not clear to me how to proceed. I saw in the hadoop-compatibility addon that it is possible to get such compatibility using the HadoopUtils class, so the open method should become something like:

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        if (Integer.toString(taskNumber + 1).length() > 6) {
            throw new IOException("Task id too large.");
        }
        TaskAttemptID taskAttemptID = TaskAttemptID.forName("attempt__0000_r_"
            + String.format("%" + (6 - Integer.toString(taskNumber + 1).length()) + "s", " ").replace(" ", "0")
            + Integer.toString(taskNumber + 1) + "_0");
        this.configuration.set("mapred.task.id", taskAttemptID.toString());
        this.configuration.setInt("mapred.task.partition", taskNumber + 1);
        // for hadoop 2.2
        this.configuration.set("mapreduce.task.attempt.id", taskAttemptID.toString());
        this.configuration.setInt("mapreduce.task.partition", taskNumber + 1);
        try {
            this.context = HadoopUtils.instantiateTaskAttemptContext(this.configuration, taskAttemptID);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        final HFileOutputFormat2 outFormat = new HFileOutputFormat2();
        try {
            this.writer = outFormat.getRecordWriter(this.context);
        } catch (InterruptedException iex) {
            throw new IOException("Opening the writer was interrupted.", iex);
        }
    }

But I'm not sure how to pass the JobConf to the class, whether to merge config files, where HFileOutputFormat2 writes the data, and how to implement the public void writeRecord(Record record) API. Could I have a little chat off the mailing list with the implementor of this extension?

On Mon, Nov 3, 2014 at 11:51 AM, Fabian Hueske wrote:

Hi Flavio,

let me try to answer your last question from the user's list (to the best of my HBase knowledge):

"I just wanted to know if and how region splitting is handled. Can you explain to me in detail how Flink and HBase work together? What is not fully clear to me is when computation is done by the region servers and when data starts to flow to a Flink worker (which in my test job is only my PC), and how to read the important logged info to understand whether my job is performing well."

HBase partitions its tables into so-called "regions" of keys and stores the regions distributed across the cluster using HDFS. I think an HBase region can be thought of as an HDFS block. To make reading an HBase table efficient, regions should be read locally, i.e., an InputFormat should primarily read regions that are stored on the same machine it is running on. Flink's InputSplits partition the HBase input by regions and add information about the storage location of each region. During execution, input splits are assigned to InputFormats that can do local reads.

Best, Fabian

2014-11-03 11:13 GMT+01:00 Stephan Ewen:

Hi!

Passing parameters through the configuration is a very old approach (the original HBase format dates back to that time). I would simply make the HBase format take those parameters through the constructor.

Greetings,
Stephan

On Mon, Nov 3, 2014 at 10:59 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

The problem is that I also removed the GenericTableOutputFormat, because there is an incompatibility between hadoop1 and hadoop2 for the classes TaskAttemptContext and TaskAttemptContextImpl. It would also be nice if the user didn't have to worry about passing the pact.hbase.jtkey and pact.job.id parameters. I think it is probably a good idea to drop hadoop1 compatibility, enable the HBase addon only for hadoop2 (as before), and decide how to manage those two parameters.

On Mon, Nov 3, 2014 at 10:19 AM, Stephan Ewen wrote:

It is fine to remove it, in my opinion.

On Mon, Nov 3, 2014 at 10:11 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

That is one of the classes I removed, because it was using the deprecated GenericDataSink API. I can restore them, but then it would be a good idea to remove those warnings (also because, from what I understood, the Record APIs are going to be removed).

On Mon, Nov 3, 2014 at 9:51 AM, Fabian Hueske wrote:

I'm not familiar with the HBase connector code, but are you maybe looking for the GenericTableOutputFormat?

2014-11-03 9:44 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:

I was trying to modify the example, setting hbaseDs.output(new HBaseOutputFormat()); but I can't see any HBaseOutputFormat class... maybe we should use another class?
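[Editorial sketch] The hand-rolled zero padding in the open() method quoted earlier in this thread is easy to get wrong; the same six-digit task number can be produced with a plain zero-padded format specifier. A minimal, Flink- and Hadoop-independent sketch (the class and method names here are invented for illustration):

```java
// Illustrative only: reproduces the attempt-ID padding from the open()
// sketch in this thread without any Hadoop/Flink dependencies.
public class AttemptIdPadding {

    // The original construction: left-pad (taskNumber + 1) to 6 digits
    // by formatting a space to the remaining width and replacing it with zeros.
    // Note: valid only while taskNumber + 1 has at most 5 digits, since a
    // width of 0 ("%0s") is not a legal format specifier.
    static String padOriginal(int taskNumber) {
        String n = Integer.toString(taskNumber + 1);
        return String.format("%" + (6 - n.length()) + "s", " ").replace(" ", "0") + n;
    }

    // Equivalent, simpler form using a zero-padded decimal specifier.
    static String padSimple(int taskNumber) {
        return String.format("%06d", taskNumber + 1);
    }

    public static void main(String[] args) {
        for (int t : new int[] {0, 8, 41, 99998}) {
            String a = padOriginal(t);
            String b = padSimple(t);
            System.out.println(t + " -> " + a + " / " + b);
            if (!a.equals(b)) {
                throw new AssertionError("padding mismatch for task " + t);
            }
        }
        // Full attempt ID as built in open(): prefix + padded number + suffix
        System.out.println("attempt__0000_r_" + padSimple(3) + "_0");
    }
}
```

With "%06d" the length guard and the manual replace() dance both disappear, which is in the spirit of Stephan's remark that this code predates the newer APIs.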
On Mon, Nov 3, 2014 at 9:39 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

Maybe that's something I could add to the HBase example, and it could be better documented in the wiki.

Since we're talking about the wiki: I was looking at the Java API guide (http://flink.incubator.apache.org/docs/0.6-incubating/java_api_guide.html) and the link to the KMeans example is not working (where it says "For a complete example program, have a look at KMeans Algorithm").

Best,
Flavio

On Mon, Nov 3, 2014 at 9:12 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

Ah ok, perfect! That was the reason why I removed it :)

On Mon, Nov 3, 2014 at 9:10 AM, Stephan Ewen <sewen@apache.org> wrote:

You do not really need an HBase data sink. You can call "DataSet.output(new HBaseOutputFormat())".

Stephan

On 02.11.2014 at 23:05, Flavio Pompermaier <pompermaier@okkam.it> wrote:

Just one last thing: I removed the HbaseDataSink because I think it was using the old APIs. Can someone help me update that class?
On Sun, Nov 2, 2014 at 10:55 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

Indeed, this time the build has been successful :)

On Sun, Nov 2, 2014 at 10:29 AM, Fabian Hueske <fhueske@apache.org> wrote:

You can also set up Travis to build your own GitHub repositories by linking it to your GitHub account. That way Travis can build all your branches (and you can also trigger rebuilds if something fails). I'm not sure if we can manually retrigger builds on the Apache repository.

Support for Hadoop 1 and 2 is indeed a very good addition :-)

For the discussion about the PR itself, I would need a bit more time to become more familiar with HBase. I also don't have an HBase setup available here. Maybe somebody else in the community who was involved with a previous version of the HBase connector could comment on your question.

Best, Fabian

2014-11-02 9:57 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:

As suggested by Fabian, I moved the discussion to this mailing list.

I think what still has to be discussed is how to retrigger the build on Travis (I don't have an account) and whether the PR can be integrated.

Maybe what I can do is move the HBase example into the test package (right now I left it in the main folder) so it will force Travis to rebuild. I'll do it within a couple of hours.

Another thing I forgot to say is that the hbase extension is now compatible with both hadoop 1 and 2.

Best,
Flavio
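[Editorial sketch] Stephan's suggestion earlier in the thread, having the HBase format take the two job parameters through its constructor instead of requiring users to set pact.hbase.jtkey and pact.job.id in the configuration, could look roughly like the following. This is a hypothetical sketch, not the actual Flink connector API; all class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the constructor-parameter pattern: the output
// format receives the values at construction time and writes the
// configuration entries itself, so users never touch the magic keys.
public class HBaseOutputFormatSketch {

    private final String jobTrackerKey; // previously user-supplied as "pact.hbase.jtkey"
    private final String jobId;         // previously user-supplied as "pact.job.id"
    private final Map<String, String> configuration = new HashMap<>();

    public HBaseOutputFormatSketch(String jobTrackerKey, String jobId) {
        this.jobTrackerKey = jobTrackerKey;
        this.jobId = jobId;
    }

    // Called once per parallel task; the format fills in the entries that
    // the old configuration-based API expected the user to provide by hand.
    public void open(int taskNumber) {
        configuration.put("pact.hbase.jtkey", jobTrackerKey);
        configuration.put("pact.job.id", jobId);
        configuration.put("mapreduce.task.partition", Integer.toString(taskNumber + 1));
    }

    public String getConfigValue(String key) {
        return configuration.get(key);
    }

    public static void main(String[] args) {
        HBaseOutputFormatSketch format = new HBaseOutputFormatSketch("jt-1", "job-42");
        format.open(0);
        System.out.println(format.getConfigValue("pact.hbase.jtkey")); // jt-1
        System.out.println(format.getConfigValue("pact.job.id"));      // job-42
    }
}
```

The design point is simply that the parameters become explicit constructor arguments with compile-time visibility, rather than stringly-typed configuration keys the user has to know about.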