Message-ID: <27562803.post@talk.nabble.com>
Date: Fri, 12 Feb 2010 04:25:30 -0800 (PST)
From: Gaurav Vashishth
To: hbase-user@hadoop.apache.org
Subject: Re: HBase Insert Performance
In-Reply-To: <78568af11001180307q27e29b52m8e27c14da51eff@mail.gmail.com>
References: <27208387.post@talk.nabble.com>
 <78568af11001180219h63cbe80bsba3582fa0e709ff0@mail.gmail.com>
 <27208828.post@talk.nabble.com>
 <78568af11001180307q27e29b52m8e27c14da51eff@mail.gmail.com>

Ryan,

I have set up the cluster as you suggested. I now have the master,
namenode and ZooKeeper on the same machine, and 8 region servers that
also run as data nodes. With this configuration I was able to get an
insertion speed of around 18K records/sec.

I'm still using 4 GB of RAM, though. I will upgrade that as well, and I
hope adding more region servers will increase the insertion speed.

Thanks,
Gaurav


Ryan Rawson wrote:
>
> Hey,
>
> So there are 2 major problems here:
> - the setup is way off. There is no actual data duplication, for
> example: you will put every write to 1 machine, and when it fails,
> so goes your data.
> - These machines don't have enough RAM. They must have at least
> 1 GB/core, ideally 2 GB/core or more. This means they should have 8 GB
> RAM. crucial.com
>
> A better setup would be:
> - 1 "master" node, runs: hmaster, 1xzookeeper, namenode
> - 5 data/regionservers
>
> The key here to performance is to spread your workload over more
> machines. This is how clustered software works in a nutshell. Using
> only 1/3 of your machines for "regionservers" and 1/6th for data
> storage (datanode) is non-ideal.
>
> You really need to up the RAM. I run:
> - dual quad i7s with hyper-threading, which gives 16 cores to the OS
> - 24 GB RAM
> - 4 x 1 TB disk
>
> My small end machines are:
> - dual quad xeons, 8 cores to the OS
> - 16 GB RAM
> - 2 x 1 TB disk
>
> For performance you really don't want to have less than 1-2 GB RAM per
> core. Without a lot of RAM, you don't get effective disk caching. You
> can't run map-reduces on the same nodes, you may run into swap issues,
> etc. 4 GB DDR3 RAM is about $150 USD.
>
> But given a reasonable machine set, doing 50k inserts/sec sustained
> over long periods of time is totally doable. You will need more than 6
> machines though!
> Don't forget your spares, since you really want to be
> able to operate on N-{1,2} machines so failures don't cripple you.
>
>
>
> On Mon, Jan 18, 2010 at 2:55 AM, Gaurav Vashishth wrote:
>>
>> Using 6 machines, 8 cores with 4 GB RAM each, right now for setting
>> up the scenario.
>>
>> 2 region servers
>> 1 ZooKeeper
>> 1 Data Node
>> 2 Name Node
>>
>>
>>
>> Ryan Rawson wrote:
>>>
>>> How many machines do you have? I'd try at least 20+ late model boxes.
>>>
>>> On Jan 18, 2010 2:14 AM, "Gaurav Vashishth" wrote:
>>>
>>>
>>> I need to store live data at about 40-50K records/sec. I evaluated
>>> MySQL and am now trying HBase.
>>>
>>> Just read on docstoc that HBase insert performance, for a few 1000
>>> rows and 10 columns with 1 MB values, is 68 ms/row. My scenario is
>>> similar: we need under 10k rows and 10-20 columns, which can have
>>> thousands of versions with values not greater than 300 bytes.
>>> Initially, I thought HBase could solve the purpose, but reading the
>>> docstoc article has put doubt in my mind.
>>>
>>> Can we get 40-50k records/sec insertion speed in HBase? Also, there
>>> would be thousands of users reading the database as well. Can HBase
>>> maintain that much speed?
>>>
>>> Thanks
>>> Gaurav
>>> --
>>> View this message in context:
>>> http://old.nabble.com/HBase-Insert-Performance-tp27208387p27208387.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/HBase-Insert-Performance-tp27208387p27208828.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>>
>>
>
>

--
View this message in context: http://old.nabble.com/HBase-Insert-Performance-tp27208387p27562803.html
Sent from the HBase User mailing list archive at Nabble.com.
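
The sizing rules of thumb from this thread (at least 1 GB of RAM per core, ideally 2, plus spares so the cluster can keep running on N-1 or N-2 machines) can be sketched as a back-of-the-envelope calculation. This is only a rough sketch: the function names are hypothetical, and it assumes insert throughput scales linearly with the number of region servers, which nobody in the thread measured.

```python
import math


def min_ram_gb(cores, gb_per_core=1):
    """RAM a node should have under the thread's 1-2 GB/core rule of thumb."""
    return cores * gb_per_core


def region_servers_needed(target_rate, observed_rate, observed_servers, spares=2):
    """Estimate region servers for target_rate (records/sec).

    ASSUMPTION: throughput scales linearly with server count, so the
    per-server rate is observed_rate / observed_servers. Real clusters
    may scale better or worse than this.
    """
    per_server = observed_rate / observed_servers
    return math.ceil(target_rate / per_server) + spares


if __name__ == "__main__":
    # 8-core boxes, as described in the thread.
    print(min_ram_gb(8))      # minimum, 1 GB/core
    print(min_ram_gb(8, 2))   # preferred, 2 GB/core
    # 18K records/sec observed on 8 region servers, 50K records/sec target.
    print(region_servers_needed(50_000, 18_000, 8))
```

Note that the 18K records/sec figure was measured on machines with only 4 GB of RAM, so the per-server rate used here may understate what properly sized nodes can do.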