hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: random reads
Date Thu, 14 Aug 2014 18:01:53 GMT
Hello Thomas,

What version of HBase are you using? Sorting and grouping the rows by the
regions they belong to is going to help for sure. I don't think you should
focus too much on the locality side of the problem unless your HDFS input
set is very large (100s or 1000s of MBs per task); otherwise it might be
faster to load the input dataset into memory and do the batched calls. As
discussed recently on this mailing list, there are many factors that might
be involved in the performance: number of threads or tasks, size of the
rows, RegionServer (RS) resources, configurations, etc., so any additional
info would be very helpful.
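
To make the sort-and-group idea concrete, here is a minimal sketch in Python
rather than a real HBase client. It assumes you already have the sorted list
of region start keys for the table (the values used with it are made up), and
that the first region has an empty start key. The names group_by_region and
batches are hypothetical helpers, not part of any HBase API.

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_region(row_keys, region_start_keys):
    """Group row keys by the region whose key range contains them.

    region_start_keys must be sorted and begin with b"" (the first
    region's start key). Python compares bytes lexicographically as
    unsigned values, which matches how HBase orders row keys.
    """
    groups = defaultdict(list)
    for key in sorted(row_keys):
        # The owning region has the greatest start key <= key.
        idx = bisect_right(region_start_keys, key) - 1
        groups[region_start_keys[idx]].append(key)
    return dict(groups)


def batches(keys, size):
    """Yield consecutive slices of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]
```

Each group can then be sent as one or more batched reads, so that a single
batch mostly touches a single region instead of fanning out across many of
them. The batch size is something you would have to tune for your row size
and cluster.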

cheers,
esteban.




--
Cloudera, Inc.



On Thu, Aug 14, 2014 at 10:32 AM, Thomas Kwan <thomas.kwan@manage.com>
wrote:

> Hi there
>
> I have a use case where I need to do a read to check if an HBase entry
> is present, then do a put to create the entry when it is not there.
>
> I have a script that gets a list of rowkeys from Hive and puts them in an
> HDFS directory. Then I have an MR job that reads the rowkeys and does
> batch reads. I am getting around 1.5K requests per second.
>
> To attempt to make this faster, I am wondering if I can
>
> - sort and group the rowkeys based on regions
> - make the MR jobs run on regions that have the data locally
>
> Scan or TableInputFormat must have some code to do something similar,
> right?
>
> thanks
> thomas
>
