hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: RowCounter example run time
Date Sun, 23 May 2010 14:36:46 GMT


Here's the problem: in any relational database, a select count(*) comes back
fairly quickly.
The difference is that HBase does a physical count of the rows, while the relational
engine pulls the number from metadata.

I have a couple of ideas on how we could do this...


> Date: Sat, 22 May 2010 09:25:51 -0700
> Subject: Re: RowCounter example run time
> From: jdcryans@apache.org
> To: user@hbase.apache.org
> My first question would be, what do you expect exactly? Would 5 min be
> enough? Or are you expecting something more like 1-2 secs (which is
> impossible since this is mapreduce)?
> Then there's also Jon's questions.
> Finally, did you set a higher scanner caching on that job?
> hbase.client.scanner.caching is the name of the config, which defaults
> to 1. When mapping an HBase table, if you don't set it higher you're
> basically benchmarking the RPC layer since it does 1 call per next()
> invocation. Setting the right value depends on the size of your rows,
> e.g. are you storing 60 bytes or something high like 100KB? On our 13B
> rows table (each row is a few bytes), we set it to 10k.
> J-D
> On Sat, May 22, 2010 at 8:40 AM, Andrew Nguyen
> <andrew-lists-hbase@ucsfcti.org> wrote:
> > Hello,
> >
> > I finally got some decent hardware to put together a 1 master, 4 slave
> > Hadoop/HBase cluster.  However, I'm still waiting for space in the
> > datacenter to clear out and only have 3 of the nodes deployed (master +
> > 2 slaves).  Each node is a quad-core AMD with 8G of RAM, running on a
> > GigE network.  HDFS is configured to run on a separate (from the OS
> > drive) U320 drive.  The master has RAID1 mirrored drives only.
> >
> > I've installed HBase with slave1 and slave2 as regionservers and master,
> > slave1, slave2 as the ZK quorum.  The master serves as the NN and JT and
> > the slaves as DN and TT.
> >
> > Now my question:
> >
> > I've imported 22.5M rows into HBase, into a single table.  Each row has
> > 8 or so columns.  I just ran the RowCounter MR example and it takes
> > about 25 minutes to complete.  Is a 3 node setup too underpowered to
> > combat the overhead of Hadoop and HBase?  Or, could it be something with
> > my configuration?  I've been playing around with Hadoop some but this is
> > my first attempt at anything HBase.
> >
> > Thanks!
> >
> > --Andrew
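
To put J-D's point about hbase.client.scanner.caching in numbers, here is a rough back-of-the-envelope sketch. It assumes one RPC per batch of cached rows and treats the whole table as a single scan; a real MapReduce job runs one scanner per region or map task, so actual call counts will be somewhat higher. The function names and the 100,000-byte row size are made up for illustration and are not part of HBase.

```python
import math


def scanner_round_trips(total_rows: int, caching: int) -> int:
    """Approximate number of next() RPCs needed to scan total_rows.

    Assumes each RPC returns up to `caching` rows (illustrative model only).
    """
    if caching < 1:
        raise ValueError("caching must be >= 1")
    return math.ceil(total_rows / caching)


def batch_bytes(caching: int, avg_row_bytes: int) -> int:
    """Rough size of one batch of cached rows held in memory."""
    return caching * avg_row_bytes


if __name__ == "__main__":
    rows = 22_500_000  # table size from Andrew's message
    for caching in (1, 100, 10_000):
        print(f"caching={caching:>6}: ~{scanner_round_trips(rows, caching):,} RPCs")

    # Row size drives how high the value can safely go.
    print(f"60 B rows, caching 10k:   ~{batch_bytes(10_000, 60):,} bytes per batch")
    print(f"100 KB rows, caching 10k: ~{batch_bytes(10_000, 100_000):,} bytes per batch")
```

With the default of 1, the 22.5M-row scan needs on the order of 22.5 million round trips; at 10k it needs a few thousand. The second pair of numbers shows why the same value that suits tiny rows would be far too large for 100KB rows.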