hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: random reads
Date Thu, 14 Aug 2014 19:21:34 GMT
Thomas:
Have you set tcpnodelay to true?

See http://hbase.apache.org/book.html for explanation of
hbase.ipc.client.tcpnodelay

Cheers


On Thu, Aug 14, 2014 at 11:41 AM, Thomas Kwan <thomas.kwan@manage.com>
wrote:

> Hi Esteban,
>
> Thanks for sharing ideas.
>
> We are on HBase 0.96 and Java 1.6. I have enabled short-circuit reads,
> and the heap size is around 16G for each region server. We have about 20
> of them.
>
> The list of rowkeys that I need to process is about 10M. I am using
> batch gets already and the batch size is ~2000 gets.
>
> thomas
>
> On Thu, Aug 14, 2014 at 11:01 AM, Esteban Gutierrez
> <esteban@cloudera.com> wrote:
> > Hello Thomas,
> >
> > What version of HBase are you using? Sorting and grouping the rows by
> > region is going to help for sure. I don't think you should focus too
> > much on the locality side of the problem unless your HDFS input set is
> > too large (100s or 1000s of MBs per task); otherwise it might be faster
> > to load the input dataset in memory and do the batched calls. As
> > discussed on this mailing list recently, there are too many factors that
> > might be involved in the performance (number of threads or tasks, size
> > of the row, RS resources, configurations, etc.), so any additional info
> > would be very helpful.
> >
> > cheers,
> > esteban.
> >
> >
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> >
> > On Thu, Aug 14, 2014 at 10:32 AM, Thomas Kwan <thomas.kwan@manage.com>
> > wrote:
> >
> >> Hi there
> >>
> >> I have a use case where I need to do a read to check if an HBase entry
> >> is present, then do a put to create the entry when it is not there.
> >>
> >> I have a script to get a list of rowkeys from Hive and put them in an
> >> HDFS directory. Then I have an MR job that reads the rowkeys and does
> >> batch reads. I am getting around 1.5K requests per second.
> >>
> >> To attempt to make this faster, I am wondering if I can
> >>
> >> - sort and group the rowkeys based on regions
> >> - make the MR jobs run on regions that have the data locally
> >>
> >> Scan or TableInputFormat must have some code to do something similar,
> >> right?
> >>
> >> thanks
> >> thomas
> >>
>
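
The idea raised in this thread, sorting and grouping the rowkeys by region
before issuing batched gets, can be sketched roughly as follows. This is a
minimal illustration and not HBase client code: the class RegionGrouping and
its methods groupByRegion and chunk are hypothetical names, rowkeys are
modelled as plain Strings (HBase compares raw bytes, which matches String
ordering only for ASCII keys), and the region start keys are assumed to have
been fetched from the table beforehand, with "" as the first region's start
key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of the HBase API.
final class RegionGrouping {

    // Buckets each rowkey under the start key of the region that covers it,
    // i.e. the greatest region start key that is <= the rowkey.
    static Map<String, List<String>> groupByRegion(TreeSet<String> regionStartKeys,
                                                   List<String> rowKeys) {
        Map<String, List<String>> groups = new TreeMap<String, List<String>>();
        for (String rowKey : rowKeys) {
            String startKey = regionStartKeys.floor(rowKey);
            if (startKey == null) {
                throw new IllegalArgumentException("no region covers rowkey: " + rowKey);
            }
            List<String> bucket = groups.get(startKey);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                groups.put(startKey, bucket);
            }
            bucket.add(rowKey);
        }
        // Sort within each region so a batch touches rows in key order.
        for (List<String> bucket : groups.values()) {
            Collections.sort(bucket);
        }
        return groups;
    }

    // Splits one region's keys into batches of at most batchSize
    // (the thread mentions roughly 2000 gets per batch).
    static List<List<String>> chunk(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<List<String>>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<String>(keys.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Made-up start keys and rowkeys, for illustration only.
        TreeSet<String> starts = new TreeSet<String>(Arrays.asList("", "g", "p"));
        List<String> keys = Arrays.asList("zeta", "apple", "kiwi", "grape", "banana");
        for (Map.Entry<String, List<String>> e : groupByRegion(starts, keys).entrySet()) {
            System.out.println("region starting at '" + e.getKey() + "': "
                    + chunk(e.getValue(), 2));
        }
    }
}
```

Each batch then contains keys from a single region, so one batched get goes
to one region server instead of fanning out across the cluster. Whether
this helps in practice depends on the factors Esteban lists above.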
