Date: Thu, 24 Jul 2014 13:23:59 -0700
Subject: Re: how to do parallel scanning in map reduce using hbase as input?
From: Vladimir Rodionov
To: "dev@hbase.apache.org"

I am working on improving inter-region scan performance and already have a
patch. It will be committed as soon as all tests are done. This should
improve M/R over HBase performance, because you will be able to create input
splits with granularities lower than a region without a loss of performance.

See:
https://issues.apache.org/jira/browse/HBASE-7336
https://issues.apache.org/jira/browse/HBASE-5979
for more information on the subject.

-Vladimir Rodionov

On Tue, Jul 22, 2014 at 3:31 PM, Stack wrote:

> On Mon, Jul 21, 2014 at 11:11 PM, Li Li wrote:
>
> > On Tue, Jul 22, 2014 at 1:57 PM, Stack wrote:
> > > On Mon, Jul 21, 2014 at 10:53 PM, Li Li wrote:
> > >
> > >> Sorry, I enter tab and it send my unfinished post. See the following
> > >> mail for answers of other questions.
> > >>
> > >> I forget the exception's detail. It throws exception in terminal.
> > >
> > > What exception is thrown?
> > I forget it. maybe I can retry it with 8 mapper configuration. it
> > seems like out of memory exception
> >
>
> Who OOME'd? The map task or hbase?
>
> > >
> > >> The
> > >> default io.sort.mb is 100 and I set it to 500 to speed up reducer.
> > >
> > > Do you have to have a reducer? If you could skip the shuffle...
> > I have 8 reducers
> >
>
> Do you have to reduce?
>
> Would more reducers make your job run faster?
>
> > >
> > >> So
> > >> I set mapred.child.java.opts to 1g
> > >> The datanode/regionserver has 16GB memory but free memory
> > >
> > > Does the RS use the 16G?
> > the RS use 8G and there are datanode and tasktracker in this machine
> >
>
> How much for DN and TT? They don't need much usually.
>
> > >
> > >> for
> > >> map-reduce is about 5gb. So I can't add more mappers
> > >>
> > >> How much RAM in these machines?
> > 16GB
> >
>
> These your machines or EC2? Can you get bigger machines if EC2?
>
> St.Ack
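
To illustrate what "input splits with granularities lower than a region" could look like, here is a minimal sketch that cuts one region's row-key range into several contiguous sub-ranges, each of which a separate mapper could scan. It is not the patch discussed in HBASE-7336 or HBASE-5979, and it does not use any HBase API. The function name split_key_range is hypothetical, and the sketch assumes both boundary keys are explicit, non-empty byte strings with roughly uniformly distributed keys in between.

```python
from typing import List, Tuple


def split_key_range(start: bytes, end: bytes, n: int) -> List[Tuple[bytes, bytes]]:
    """Split the row-key range [start, end) into at most n contiguous sub-ranges.

    Hypothetical helper for illustration only. Keys are right-padded with zero
    bytes to a common width and treated as big-endian integers, so the cut
    points are evenly spaced in key space (not necessarily in data volume).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    width = max(len(start), len(end))
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(end.ljust(width, b"\x00"), "big")
    if lo >= hi:
        raise ValueError("start must sort strictly before end")

    bounds = [start]
    for i in range(1, n):
        mid = lo + (hi - lo) * i // n
        key = mid.to_bytes(width, "big")
        # Skip repeated cut points when the range is narrower than n keys.
        if key > bounds[-1]:
            bounds.append(key)
    bounds.append(end)

    # Pair consecutive boundaries into (start_row, stop_row) sub-ranges.
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # One region covering [b"a", b"e") cut into four sub-region splits.
    for start_row, stop_row in split_key_range(b"a", b"e", 4):
        print(start_row, stop_row)
```

Each (start_row, stop_row) pair would then back one input split. Whether finer splits actually help depends on the scan-performance work described in the message above, since several mappers would then be reading the same region concurrently.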