hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Batch Get performance degrades from within Mapreduce
Date Mon, 20 Feb 2012 21:37:43 GMT
On Mon, Feb 20, 2012 at 1:12 PM, Himanish Kushary <himanish@gmail.com> wrote:
> Where is the time being spent? In the server, in the mapper? - Most of the
> time is spent calling HTable.batch(...) inside the mapper.

So, 100 Gets at a time?

> Why are you having scanner timeouts if you are doing big batch Gets? - We
> are getting the scanner timeout from the original Scan that feeds input
> records to the mapper. The scanner caching is set to 100.
> I think the mapper takes too long (because of the batch Gets inside it)
> to process the initial 100 records, so fetching the next batch of scanned
> records throws the exception.
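
A back-of-envelope sketch of the timeout reasoning above, under stated assumptions: the scanner lease is renewed only when the client issues a next() RPC; with caching at 100, the mapper must finish all 100 cached records before that RPC happens; and the lease is 60 seconds (my understanding of the default for hbase.regionserver.lease.period in HBase releases of this era; check your own configuration). The class and method names are hypothetical and not part of the HBase API.

```java
// Hypothetical helper, not HBase API: models the rule of thumb that a scanner
// lease can expire when the gap between two next() RPCs exceeds the lease period.
public final class ScannerLeaseCheck {

    private ScannerLeaseCheck() {}

    /** Approximate gap between next() RPCs: every cached row is processed first. */
    public static long gapBetweenNextCallsMillis(int scannerCaching, long perRecordMillis) {
        return (long) scannerCaching * perRecordMillis;
    }

    /** True if that gap would exceed the (assumed) lease period. */
    public static boolean leaseLikelyExpires(int scannerCaching, long perRecordMillis,
                                             long leasePeriodMillis) {
        return gapBetweenNextCallsMillis(scannerCaching, perRecordMillis) > leasePeriodMillis;
    }

    /** Largest caching value that keeps the gap within the lease period (never below 1). */
    public static int maxSafeCaching(long perRecordMillis, long leasePeriodMillis) {
        if (perRecordMillis <= 0) {
            throw new IllegalArgumentException("perRecordMillis must be positive");
        }
        long max = leasePeriodMillis / perRecordMillis;
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, max));
    }

    public static void main(String[] args) {
        long lease = 60_000L;    // assumed 60 s lease period
        int caching = 100;       // scanner caching from this thread
        long perRecord = 1_000L; // hypothetical: the batch Gets cost 1 s per input record
        System.out.println(leaseLikelyExpires(caching, perRecord, lease)); // true (100 s > 60 s)
        System.out.println(maxSafeCaching(perRecord, lease));              // 60
    }
}
```

Under this model, lowering the scanner caching (or raising the lease period) shortens the gap between next() calls; it does not make the batch Gets themselves any faster.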

How big are these 100 rows?

How many regions on this single RegionServer?

> Also, could it be happening due to concurrency? I am currently on a single
> region server. When I run the test case, the batch Gets happen sequentially,
> whereas from MapReduce the batch Gets happen concurrently on the same
> region server. Could this be why performance degrades under MapReduce,
> because of thrashing on the same region server? Thoughts?

A single regionserver?  How many datanodes?

What's it look like on the machine running the regionserver? Is it
working hard? It seems odd that 100 Gets would take longer than a
minute to complete.
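
To put a number on "seems odd": a crude average that treats the batch as if its Gets ran one after another and ignores any parallelism inside HTable.batch(...). The helper below is hypothetical, not HBase API.

```java
// Hypothetical back-of-envelope helper, not HBase API.
public final class BatchLatency {

    private BatchLatency() {}

    /** Average time per Get implied by a batch's total wall-clock time. */
    public static double perGetMillis(long batchMillis, int getsInBatch) {
        if (getsInBatch <= 0) {
            throw new IllegalArgumentException("getsInBatch must be positive");
        }
        return (double) batchMillis / getsInBatch;
    }

    public static void main(String[] args) {
        // A batch of 100 Gets taking a full minute averages 600 ms per Get.
        System.out.println(perGetMillis(60_000L, 100)); // 600.0
    }
}
```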

You've checked out the performance section of the reference guide?

