Subject: Re: Hbase import Tsv performance (slow import)
From: Anoop John <anoop.hbase@gmail.com>
To: user@hbase.apache.org
Date: Wed, 24 Oct 2012 10:41:22 +0530

Hi Anil

On Wed, Oct 24, 2012 at 10:39 AM, anil gupta wrote:
> Hi Anoop,
>
> As per your last email, did you mean that the WAL is not used by the
> HBase bulk loader? If so, how do we ensure "no data loss" in case of a
> RegionServer failure?
>
> Thanks,
> Anil Gupta
>
> On Tue, Oct 23, 2012 at 9:55 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
> > As Kevin suggested, we can make use of the bulk load that goes through
> > the WAL and Memstore. The second option is to use the output of the
> > mappers to create HFiles directly.
> >
> > Regards
> > Ram
> >
> > On Wed, Oct 24, 2012 at 8:59 AM, Anoop John wrote:
> > > Hi
> > > With the ImportTsv tool you are trying to bulk load your data. Can
> > > you check how many mappers and reducers there were, and how much of
> > > the total time was taken by the map phase versus the reduce phase?
> > > This looks like an MR-related issue (maybe a configuration problem).
> > > In this bulk-load case most of the work is done by the MR job: it
> > > reads the raw data, converts it into Puts, and writes HFiles. The MR
> > > output is the HFiles themselves. The next part of ImportTsv just
> > > moves the HFiles under the table's region stores. There is no WAL
> > > usage in this bulk load.
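The two-step flow described above can be sketched with the `hbase` command line. The table name, column mapping, and HDFS paths below are illustrative, not taken from the thread:

```shell
# Step 1: an MR job parses the TSV input and writes HFiles to HDFS.
# No Puts reach the region servers here, so neither the WAL nor the
# MemStore is involved.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1,f1:c2 \
  -Dimporttsv.bulk.output=/tmp/mytable-hfiles \
  mytable /user/hadoop/input.tsv

# Step 2: move the finished HFiles under the regions of the target
# table.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /tmp/mytable-hfiles mytable
```

This is also why skipping the WAL does not risk data loss here: once step 1 finishes, the data already sits in replicated HFiles on HDFS, so a RegionServer crash before or during step 2 cannot lose it.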
> > >
> > > -Anoop-
> > >
> > > On Tue, Oct 23, 2012 at 9:18 PM, Nick maillard <
> > > nicolas.maillard@fifty-five.com> wrote:
> > > > Hi everyone
> > > >
> > > > I'm starting with HBase and testing it for our needs. I have set up
> > > > a Hadoop cluster of three machines, with an HBase cluster on top of
> > > > the same three machines: one master and two slaves.
> > > >
> > > > I am testing the import of a 5 GB CSV file with the ImportTsv tool.
> > > > I copy the file into HDFS and use ImportTsv to import it into HBase.
> > > >
> > > > Right now it takes a little over an hour to complete, creating
> > > > around 2 million entries in one table with a single column family.
> > > > If I use bulk loading instead, it goes down to 20 minutes.
> > > >
> > > > My Hadoop job has 21 map tasks, but they all seem to take a very
> > > > long time to finish, and many tasks end up timing out.
> > > >
> > > > I am wondering what I have missed in my configuration. I have
> > > > followed the prerequisites in the documentation, but I am really
> > > > unsure what is causing this slowdown. If I run the wordcount example
> > > > on the same file it takes only minutes to complete, so I am guessing
> > > > the issue lies in my HBase configuration.
> > > >
> > > > Any help or pointers would be appreciated.
>
> --
> Thanks & Regards,
> Anil Gupta
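For reference, the slower path Nick describes corresponds to running ImportTsv without a bulk-output directory: every parsed row then becomes an ordinary Put that travels through the WAL and the MemStore. The separator, column mapping, table name, and path below are illustrative:

```shell
# Direct-put mode: no -Dimporttsv.bulk.output, so the mappers issue
# normal Puts against the region servers. Each edit is written to the
# WAL and buffered in the MemStore, which is why this path is much
# slower than writing HFiles and bulk loading them.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1 \
  mytable /user/nick/data.csv
```

Comparing the map-phase time of this job against the same job with `-Dimporttsv.bulk.output` set is a quick way to confirm whether the put path, rather than the cluster configuration, is the bottleneck.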