Date: Tue, 12 Apr 2016 17:18:47 +0200 (CEST)
From: Ivan Cores gonzalez
To: user@hbase.apache.org
Subject: Re: Processing rows in parallel with MapReduce jobs.

Hi Ted,

Yes, I mean the same region.

I wasn't using the getSplits() function. I'm trying to add it to my code,
but I'm not sure how to do it. Is there an example on the website? I cannot
find anything. (By the way, I'm using TableInputFormat, not InputFormat.)

Just to confirm: with the getSplits() function, are mappers that process
rows in the same region executed in parallel (assuming there are free
processors/cores)?

Thanks,
Ivan.

----- Original message -----
> From: "Ted Yu"
> To: user@hbase.apache.org
> Sent: Monday, 11 April 2016 15:10:29
> Subject: Re: Processing rows in parallel with MapReduce jobs.
>
> bq. if they are located in the same split?
>
> Probably you meant same region.
>
> Can you show the getSplits() for the InputFormat of your MapReduce job?
>
> Thanks
>
> On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez
> wrote:
>
> > Hi all,
> >
> > I have a small question regarding the behaviour of MapReduce jobs with
> > HBase.
> >
> > I have an HBase test table with only 8 rows. I split the table into
> > 2 splits with the hbase shell split command, so now there are 4 rows
> > in every split.
> >
> > I created a MapReduce job that only prints the row key in the log files.
> > When I run the MapReduce job, every row is processed by 1 mapper. But
> > the mappers in the same split are executed sequentially (inside the
> > same container). That means the first four rows are processed
> > sequentially by 4 mappers. The system has free cores, so is it possible
> > to process rows in parallel if they are located in the same split?
> >
> > The only way I found to have 8 mappers executed in parallel is to split
> > the table into 8 splits (1 split per row). But obviously this is not
> > the best solution for big tables ...
> >
> > Thanks,
> > Ivan.
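
The behaviour described in this thread follows from how the input format cuts the table into splits: when each region yields a single split, its rows are walked one after another by one task. One possible way to get more parallelism without physically splitting the table would be a custom getSplits() that cuts each region's key range into smaller sub-ranges. What follows is a minimal, HBase-free sketch of only that range-cutting arithmetic, not of the HBase wiring. SubSplitSketch, KeyRange and subdivide() are hypothetical names that do not exist in HBase, the row keys "row1" and "row5" are made up for illustration, and the sketch assumes non-empty start and end keys of equal length (so it does not handle the empty boundary keys of a table's first and last region).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an HBase class: only the key-range arithmetic
// that a custom getSplits() could use to cut one region into sub-ranges.
final class SubSplitSketch {

    /** Half-open row-key range [start, end); one such range per map task. */
    static final class KeyRange {
        final byte[] start;
        final byte[] end;

        KeyRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Cuts [start, end) into at most `parts` contiguous sub-ranges.
     * Assumption: both keys are non-empty and have the same length, so they
     * can be treated as unsigned big-endian integers.
     */
    static List<KeyRange> subdivide(byte[] start, byte[] end, int parts) {
        if (parts < 1 || start.length == 0 || start.length != end.length) {
            throw new IllegalArgumentException(
                "need parts >= 1 and non-empty keys of equal length");
        }
        BigInteger lo = new BigInteger(1, start);
        BigInteger hi = new BigInteger(1, end);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start key must sort before end key");
        }
        BigInteger span = hi.subtract(lo);
        List<KeyRange> ranges = new ArrayList<>();
        byte[] prev = start;
        for (int i = 1; i <= parts; i++) {
            // i-th boundary = lo + span * i / parts (integer division)
            BigInteger boundary = lo.add(
                span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts)));
            byte[] next = toFixedWidth(boundary, start.length);
            if (!Arrays.equals(prev, next)) { // skip empty sub-ranges
                ranges.add(new KeyRange(prev, next));
                prev = next;
            }
        }
        return ranges;
    }

    /** Renders a non-negative value as exactly `width` big-endian bytes. */
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a sign byte or be shorter
        byte[] out = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, out, width - n, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] start = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] end = "row5".getBytes(StandardCharsets.UTF_8);
        // Print each sub-range as "startKey .. endKey".
        for (KeyRange r : subdivide(start, end, 4)) {
            System.out.println(new String(r.start, StandardCharsets.UTF_8)
                + " .. " + new String(r.end, StandardCharsets.UTF_8));
        }
    }
}
```

In a real job, each sub-range would presumably be turned into its own split inside an overridden getSplits(); that part is left out here because it depends on the HBase version in use. When the key range is narrower than the requested number of parts, the sketch returns fewer sub-ranges rather than empty ones.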