Date: Thu, 21 Apr 2016 16:31:54 +0200 (CEST)
From: Ivan Cores gonzalez <ivan.cores@inria.fr>
To: user@hbase.apache.org
Subject: Re: Processing rows in parallel with MapReduce jobs.

Thanks Ted,

Finally I found the real mistake: the class had to be declared static.

Best,
Iván.

----- Original Message -----
> From: "Ted Yu"
> To: user@hbase.apache.org
> Sent: Tuesday, 19 April 2016 15:56:56
> Subject: Re: Processing rows in parallel with MapReduce jobs.
>
> From the error, you need to provide an argumentless ctor for
> MyTableInputFormat.
>
> On Tue, Apr 19, 2016 at 12:12 AM, Ivan Cores gonzalez
> wrote:
>
> >
> > Hi Ted,
> >
> > Sorry, I forgot to include the error. At runtime I get the following
> > exception:
> >
> > Exception in thread "main" java.lang.RuntimeException:
> > java.lang.NoSuchMethodException:
> > simplerowcounter.SimpleRowCounter$MyTableInputFormat.<init>()
> >
> > The program works fine if I don't use "MyTableInputFormat", changing the
> > call to initTableMapperJob to:
> >
> > TableMapReduceUtil.initTableMapperJob(tableName, scan,
> >     RowCounterMapper.class,
> >     ImmutableBytesWritable.class, Result.class, job);  // --> works
> >     fine without MyTableInputFormat
> >
> > That's why I asked if you see any problem in the code.
> > Because maybe I forgot to override some method, or something is
> > missing.
> >
> > Best,
> > Iván.
> >
> >
> > ----- Original Message -----
> > > From: "Ted Yu"
> > > To: user@hbase.apache.org
> > > Sent: Tuesday, 19 April 2016 0:22:05
> > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > >
> > > Did you see the " Message to log?" log ?
> > >
> > > Can you pastebin the error / exception you got ?
> > >
> > > On Mon, Apr 18, 2016 at 1:54 AM, Ivan Cores gonzalez <
> > ivan.cores@inria.fr>
> > > wrote:
> > >
> > > >
> > > > Hi Ted,
> > > > So, if I understand the behaviour of getSplits(), I can create
> > > > "virtual" splits by overriding the getSplits function.
> > > > I was performing some tests, but my code crashes at runtime and I
> > > > cannot find the problem.
> > > > Any help? I didn't find examples.
> > > >
> > > >
> > > > public class SimpleRowCounter extends Configured implements Tool {
> > > >
> > > >   static class RowCounterMapper extends
> > > >       TableMapper<ImmutableBytesWritable, Result> {
> > > >     public static enum Counters { ROWS }
> > > >     @Override
> > > >     public void map(ImmutableBytesWritable row, Result value,
> > > >         Context context) {
> > > >       context.getCounter(Counters.ROWS).increment(1);
> > > >       try {
> > > >         Thread.sleep(3000); // Simulates work
> > > >       } catch (InterruptedException name) { }
> > > >     }
> > > >   }
> > > >
> > > >   public class MyTableInputFormat extends TableInputFormat {
> > > >     @Override
> > > >     public List<InputSplit> getSplits(JobContext context) throws
> > > >         IOException {
> > > >       // Just to detect if this method is being called ...
> > > >       List<InputSplit> splits = super.getSplits(context);
> > > >       System.out.printf(" Message to log? \n");
> > > >       return splits;
> > > >     }
> > > >   }
> > > >
> > > >   @Override
> > > >   public int run(String[] args) throws Exception {
> > > >     if (args.length != 1) {
> > > >       System.err.println("Usage: SimpleRowCounter <tablename>");
> > > >       return -1;
> > > >     }
> > > >     String tableName = args[0];
> > > >
> > > >     Scan scan = new Scan();
> > > >     scan.setFilter(new FirstKeyOnlyFilter());
> > > >     scan.setCaching(500);
> > > >     scan.setCacheBlocks(false);
> > > >
> > > >     Job job = new Job(getConf(), getClass().getSimpleName());
> > > >     job.setJarByClass(getClass());
> > > >
> > > >     TableMapReduceUtil.initTableMapperJob(tableName, scan,
> > > >         RowCounterMapper.class,
> > > >         ImmutableBytesWritable.class, Result.class, job, true,
> > > >         MyTableInputFormat.class);
> > > >
> > > >     job.setNumReduceTasks(0);
> > > >     job.setOutputFormatClass(NullOutputFormat.class);
> > > >     return job.waitForCompletion(true) ? 0 : 1;
> > > >   }
> > > >
> > > >   public static void main(String[] args) throws Exception {
> > > >     int exitCode = ToolRunner.run(HBaseConfiguration.create(),
> > > >         new SimpleRowCounter(), args);
> > > >     System.exit(exitCode);
> > > >   }
> > > > }
> > > >
> > > > Thanks so much,
> > > > Iván.
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Ted Yu"
> > > > > To: user@hbase.apache.org
> > > > > Sent: Tuesday, 12 April 2016 17:29:52
> > > > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > > > >
> > > > > Please take a look at TableInputFormatBase#getSplits() :
> > > > >
> > > > >  * Calculates the splits that will serve as input for the map tasks.
> > > > >  * The number of splits matches the number of regions in a table.
> > > > >
> > > > > Each mapper would be reading one of the regions.
> > > > >
> > > > > On Tue, Apr 12, 2016 at 8:18 AM, Ivan Cores gonzalez <
> > > > ivan.cores@inria.fr>
> > > > > wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > > Yes, I mean same region.
> > > > > >
> > > > > > I wasn't using the getSplits() function. I'm trying to add it to
> > > > > > my code but I'm not sure how I have to do it. Is there any example
> > > > > > on the website? I cannot find anything. (By the way, I'm using
> > > > > > TableInputFormat, not InputFormat.)
> > > > > >
> > > > > > But just to confirm: with the getSplits() function, are mappers
> > > > > > processing rows in the same region executed in parallel?
> > > > > > (Assuming that there are empty processors/cores.)
> > > > > >
> > > > > > Thanks,
> > > > > > Ivan.
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "Ted Yu"
> > > > > > > To: user@hbase.apache.org
> > > > > > > Sent: Monday, 11 April 2016 15:10:29
> > > > > > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > > > > > >
> > > > > > > bq. if they are located in the same split?
> > > > > > >
> > > > > > > Probably you meant same region.
> > > > > > >
> > > > > > > Can you show the getSplits() for the InputFormat of your
> > > > > > > MapReduce job ?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez <
> > > > > > ivan.cores@inria.fr>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I have a small question regarding the behaviour of MapReduce
> > > > > > > > jobs with HBase.
> > > > > > > >
> > > > > > > > I have an HBase test table with only 8 rows. I split the table
> > > > > > > > with the hbase shell split command into 2 splits, so now there
> > > > > > > > are 4 rows in every split.
> > > > > > > >
> > > > > > > > I created a MapReduce job that only prints the row key in the
> > > > > > > > log files. When I run the MapReduce job, every row is processed
> > > > > > > > by 1 mapper.
> > > > > > > > But the mappers in the same split are executed sequentially
> > > > > > > > (inside the same container). That means the first four rows
> > > > > > > > are processed sequentially by 4 mappers. The system has cores
> > > > > > > > that are free, so is it possible to process rows in parallel
> > > > > > > > if they are located in the same split?
> > > > > > > >
> > > > > > > > The only way I found to have 8 mappers executed in parallel is
> > > > > > > > to split the table into 8 splits (1 split per row). But
> > > > > > > > obviously this is not the best solution for big tables ...
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Ivan.
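[Editor's note on the resolution at the top of the thread.] Declaring the inner class static fixes the NoSuchMethodException because Hadoop instantiates InputFormat classes reflectively through a no-argument constructor, and a non-static inner class has no such constructor: its compiler-generated constructor takes the enclosing instance as a hidden parameter. A minimal, HBase-free sketch of that difference (class names here are illustrative, not from the original code):

```java
public class CtorReflectionDemo {

    // Non-static inner class: its implicit constructor takes the outer
    // instance as a hidden argument, so looking up a zero-argument
    // constructor reflectively throws NoSuchMethodException -- the same
    // failure Hadoop hits when instantiating the InputFormat.
    class Inner {}

    // Static nested class: a genuine no-argument constructor exists, so
    // reflective instantiation succeeds.
    static class Nested {}

    public static void main(String[] args) throws Exception {
        boolean innerHasNoArgCtor = true;
        try {
            Inner.class.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            innerHasNoArgCtor = false;
        }
        Object nested = Nested.class.getDeclaredConstructor().newInstance();

        System.out.println("inner has no-arg ctor: " + innerHasNoArgCtor);
        // prints: inner has no-arg ctor: false
        System.out.println("nested instantiated: " + (nested != null));
        // prints: nested instantiated: true
    }
}
```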
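[Editor's note on the original question.] The getSplits() override in the thread only logs and returns super.getSplits(context) unchanged, so it still yields one mapper per region. To get intra-region parallelism, the override would have to cut each region's split into several sub-ranges. A hypothetical, HBase-free sketch of that idea, using long keys in place of byte[] row keys (in real code you would subdivide each TableSplit's start/end row keys the same way):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of subdividing one region's key range [start, end) into n
// sub-ranges, so that n mappers can scan the same region in parallel.
public class SplitSubdivider {

    static List<long[]> subdivide(long start, long end, int n) {
        List<long[]> subSplits = new ArrayList<>();
        long width = (end - start) / n;
        for (int i = 0; i < n; i++) {
            long s = start + (long) i * width;
            // The last sub-split absorbs any remainder so the whole
            // range stays covered.
            long e = (i == n - 1) ? end : s + width;
            subSplits.add(new long[] {s, e});
        }
        return subSplits;
    }

    public static void main(String[] args) {
        // One "region" covering keys [0, 8), cut into 4 sub-splits:
        for (long[] s : subdivide(0, 8, 4)) {
            System.out.println(s[0] + " - " + s[1]);
        }
        // prints:
        // 0 - 2
        // 2 - 4
        // 4 - 6
        // 6 - 8
    }
}
```

In the HBase version, each sub-range would become its own InputSplit returned from getSplits(), which is exactly what gives the job more mappers than regions without physically re-splitting the table.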