hbase-user mailing list archives

From Andrew Nguyen <andrew-lists-hb...@ucsfcti.org>
Subject Re: RowCounter example run time
Date Sat, 22 May 2010 17:28:10 GMT
Answers interspersed below

On May 22, 2010, at 9:25 AM, Jean-Daniel Cryans wrote:

> My first question would be, what do you expect exactly? Would 5 min be
> enough? Or are you expecting something more like 1-2 secs (which is
> impossible since this is mapreduce)?

I don't have a set requirement.  I'm just trying to learn more about the system, and 25 minutes
seemed excessive.  I have nothing to compare against and no real expectations, but for reference,
it takes about 900 seconds to run the count function in the shell.  My main goal is to figure
out what run times are reasonable for similar setups, or at least to get a general idea of what's
acceptable, so that I can make sure everything is configured properly.

> Then there's also Jon's questions.

I'm not sure how many regions there are per table.  My guess is whatever the default is since
this isn't an option I've tried to change.  However, I will look into it more and update the

> Finally, did you set a higher scanner caching on that job?
> hbase.client.scanner.caching is the name of the config, which defaults
> to 1. When mapping an HBase table, if you don't set it higher you're
> basically benchmarking the RPC layer since it does 1 call per next()
> invocation. Setting the right value depends on the size of your rows,
> e.g. are you storing 60 bytes or something high like 100KB? On our 13B
> rows table (each row is a few bytes), we set it to 10k.

Again, my guess is that hbase.client.scanner.caching is 1, as you mentioned.  When calculating
the size of a row, is that just the size of the data stored in the various columns, or do I
need to factor in overhead as well?  Do you have a reference or any guidance on the optimal setting
for hbase.client.scanner.caching given the size of a typical row?  In my case, I have
about 8 rows, each storing a decimal value.  I haven't checked, but I'm assuming these are
being stored as doubles.
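
To make the caching trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not HBase code. It assumes, as described above, that a scan costs about one RPC per batch of `caching` rows, and it ignores region boundaries and any per-row overhead. The function names and the `row_bytes` figure are hypothetical, chosen only for illustration.

```python
def scanner_rpcs(total_rows: int, caching: int) -> int:
    """Approximate next() round trips for a full scan.

    Assumption: one RPC fetches up to `caching` rows, so the count is
    ceil(total_rows / caching). Region boundaries are ignored.
    """
    if caching < 1:
        raise ValueError("caching must be >= 1")
    # Integer ceiling division, avoids float rounding on large counts.
    return -(-total_rows // caching)


def batch_bytes(caching: int, row_bytes: int) -> int:
    """Approximate payload of one batch held by the client.

    Assumption: `row_bytes` is a hypothetical average row size; real
    rows also carry key and per-cell overhead not modelled here.
    """
    return caching * row_bytes
```

Under these assumptions, a 13 billion row table needs about 13 billion round trips at a caching of 1 and about 1.3 million at 10,000. With 60-byte rows a batch of 10,000 is around 600 KB, while 100 KB rows at the same setting would be roughly 1 GB per batch, which is why the right value depends on row size.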
