From: Anoop Sam John
To: "user@hbase.apache.org"
Subject: RE: Hbase import Tsv performance (slow import)
Date: Fri, 26 Oct 2012 04:07:53 +0000

>As per Anoop and Ram, WAL is not used with bulk loading so turning off WAL won't have any impact on performance.

This is the case if HFileOutputFormat is being used. There is also a TableOutputFormat, which can be used as the OutputFormat for MR. Here the write to the WAL is applicable. This one, instead of writing to HFiles and uploading them in one shot, puts data into the HTable by calling the put() method.

-Anoop-
________________________________________
From: anil gupta [anilgupta84@gmail.com]
Sent: Friday, October 26, 2012 2:05 AM
To: user@hbase.apache.org
Subject: Re: Hbase import Tsv performance (slow import)

@Jonathan,
As per Anoop and Ram, WAL is not used with bulk loading so turning off WAL won't have any impact on performance.

On Thu, Oct 25, 2012 at 1:33 PM, anil gupta wrote:

> Hi Nicolas,
>
> In my experience you won't get good performance if you run 3 map tasks
> simultaneously on one hard drive. That seems like a lot of I/O on one disk.
>
> HBase performs well when you have at least 5 nodes in the cluster. So, running
> HBase on 3 nodes is not something you would do in prod.
>
> Thanks,
> Anil
>
> On Thu, Oct 25, 2012 at 8:57 AM, Jonathan Bishop wrote:
>
>> Nicolas,
>>
>> I just went through the same exercise. There are many ways to get this to
>> go faster, but eventually I decided that bulk loading is the best solution,
>> as run times scaled with the number of machines in my cluster when I used
>> that approach.
>>
>> One thing you can try is to turn off HBase's write-ahead log (WAL). But be
>> aware that regionserver failure will cause data loss if you do this.
>>
>> Jon
>>
>> On Tue, Oct 23, 2012 at 8:48 AM, Nick maillard <
>> nicolas.maillard@fifty-five.com> wrote:
>>
>> > Hi everyone
>> >
>> > I'm starting with HBase and testing it for our needs. I have set up a
>> > Hadoop cluster of three machines and an HBase cluster on top of the same
>> > three machines, one master and two slaves.
>> >
>> > I am testing the import of a 5GB csv file with the importTsv tool. I
>> > import the file into HDFS and use the importTsv tool to import it into
>> > HBase.
>> >
>> > Right now it takes a little over an hour to complete. It creates around 2
>> > million entries in one table with a single family.
>> > If I use bulk uploading it goes down to 20 minutes.
>> >
>> > My Hadoop has 21 map tasks but they all seem to be taking a very long time
>> > to finish; many tasks end up timing out.
>> >
>> > I am wondering what I have missed in my configuration. I have followed the
>> > different prerequisites in the documentation but I am really unsure as to
>> > what is causing this slowdown. If I apply the wordcount example to the
>> > same file it takes only minutes to complete, so I am guessing the issue
>> > lies in my HBase configuration.
>> >
>> > Any help or pointers would be appreciated
>> >
>
> --
> Thanks & Regards,
> Anil Gupta
>

--
Thanks & Regards,
Anil Gupta