From: Asaf Mesika <asaf.mesika@gmail.com>
To: user@hbase.apache.org
Date: Thu, 4 Apr 2013 07:21:41 +0300
Subject: Re: Read thruput

Can you possibly batch some of the Get calls into a Scan with a Filter that
carries the list of row keys you need? For example, if you have 100 Gets, you
can build the start key and end key by taking the min and max of those 100 row
keys. Next, you write a filter which keeps these 100 row keys in a private
member and uses the hint method of the Filter interface to jump to the closest
row key in the region it scans. If you need help with that, I can add a more
detailed description of that Filter. This should remove most of the heavyweight
overhead of processing each Get.

On Tuesday, April 2, 2013, Vibhav Mundra wrote:

> What do your client calls look like? Get? Scan? Filters?
> --My client keeps issuing Get requests; each call fetches a single row.
> Essentially we are using HBase for key-value retrieval.
>
> Is 3000/sec client-side calls, or is it the number of rows per sec?
> --3000/sec is the client-side call rate.
>
> If you measure in MB/sec, how much read throughput do you get?
> --Each request's response is at most 1 KB, so the throughput is about
> 3 MB/sec (3000 * 1 KB).
>
> Where is your client located? Same router as the cluster?
> --It is on the same cluster, on the same subnet.
>
> Have you activated dfs read short circuit? If not, try it.
> --I have not tried this. Let me try this also.
>
> Compression - try switching to Snappy - should be faster.
> What else is running on the cluster in parallel to your reading client?
> --There is a small upload job running. I have never seen the CPU usage above
> 5%, so I actually didn't bother to look at this angle.
>
> -Vibhav
>
>
> On Tue, Apr 2, 2013 at 1:42 AM, Asaf Mesika wrote:
>
> > What do your client calls look like? Get? Scan? Filters?
> > Is 3000/sec client-side calls, or is it the number of rows per sec?
> > If you measure in MB/sec, how much read throughput do you get?
> > Where is your client located? Same router as the cluster?
> > Have you activated dfs read short circuit? If not, try it.
> > Compression - try switching to Snappy - should be faster.
> > What else is running on the cluster in parallel to your reading client?
> >
> > On Monday, April 1, 2013, Vibhav Mundra wrote:
> >
> > > What is the general read throughput that one gets when using HBase?
> > >
> > > I am not able to achieve more than 3000/sec with a timeout of 50
> > > millisecs.
> > > In this case also, 10% of them are timing out.
> > >
> > > -Vibhav
> > >
> > >
> > > On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra wrote:
> > >
> > > > Yes, I have changed the BLOCK CACHE % to 0.35.
> > > >
> > > > -Vibhav
> > > >
> > > >
> > > > On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu wrote:
> > > >
> > > >> I was aware of that discussion, which was about MAX_FILESIZE and
> > > >> BLOCKSIZE.
> > > >>
> > > >> My suggestion was about the block cache percentage.
> > > >>
> > > >> Cheers
> > > >>
> > > >>
> > > >> On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra wrote:
> > > >>
> > > >> > I have used the following site:
> > > >> > http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> > > >> >
> > > >> > to lower the value of the block cache.
> > > >> >
> > > >> > -Vibhav
> > > >> >
> > > >> >
> > > >> > On Mon, Apr 1, 2013 at 4:23 PM, Ted wrote:
> > > >> >
> > > >> > > Can you increase the block cache size?
> > > >> > >
> > > >> > > What version of HBase are you using?
> > > >> > >
> > > >> > > Thanks
> > > >> > >
> > > >> > > On Apr 1, 2013, at 3:47 AM, Vibhav Mundra wrote:
> > > >> > >
> > > >> > > > The typical size of each of my rows is less than 1 KB.
> > > >> > > >
> > > >> > > > Regarding memory, I have used 8 GB for the HBase region servers
> > > >> > > > and 4 GB for the datanodes, and I don't see them fully used.
> > > >> > > > So I ruled out the GC aspect.
> > > >> > > >
> > > >> > > > In case you still believe that GC is an issue, I will upload
> > > >> > > > the GC logs.
> > > >> > > >
> > > >> > > > -Vibhav
> > > >> > > >
> > > >> > > >
> > > >> > > > On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan <
> > > >> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >> > > >
> > > >> > > >> Hi
> > > >> > > >>
> > > >> > > >> How big is your row? Are they wide rows, and what would be
> > > >> > > >> the size of every cell?
> > > >> > > >> How many read threads are getting used?
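[Editor's note: the seek-hint filter Asaf describes at the top of the thread can be sketched in plain Java. HBase's own types (FilterBase, ReturnCode.SEEK_NEXT_USING_HINT, KeyValue.createFirstOnRow) are deliberately left out so the per-row decision logic compiles standalone; the class and method names here are illustrative, and a real filter would extend FilterBase and also implement the serialization needed to ship the key list to the region servers.]

```java
// Models the per-row decision of a seek-hint filter over a sorted list of
// target row keys: the scanner hands us the current row key, and we either
// include it, ask to seek forward to the next wanted key, or signal that all
// wanted keys are behind us. In a real HBase filter, SEEK_NEXT_USING_HINT
// maps to ReturnCode.SEEK_NEXT_USING_HINT and the hint is returned from the
// filter's hint method as the first KeyValue on nextWantedKey().
public class SeekHintLogic {
    public enum Decision { INCLUDE, SEEK_NEXT_USING_HINT, DONE }

    private final byte[][] wanted; // the ~100 target row keys, pre-sorted
    private int idx = 0;           // first target the scanner has not passed yet

    public SeekHintLogic(byte[][] sortedWanted) {
        this.wanted = sortedWanted;
    }

    // Unsigned lexicographic compare, like HBase's Bytes.compareTo.
    public static int cmp(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public Decision onRow(byte[] currentRow) {
        while (idx < wanted.length) {
            int c = cmp(currentRow, wanted[idx]);
            if (c == 0) { idx++; return Decision.INCLUDE; }   // a wanted row
            if (c < 0) return Decision.SEEK_NEXT_USING_HINT;  // jump forward
            idx++;  // the scanner is already past this target key
        }
        return Decision.DONE; // nothing left; the scan's stop row ends it
    }

    // The seek target to hand back as the hint.
    public byte[] nextWantedKey() {
        return wanted[idx];
    }
}
```

Combined with a Scan bounded by the min and max of the wanted keys, this lets the region server skip straight between the 100 rows instead of reading every row in between.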
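[Editor's note: two of the tuning knobs discussed in the thread (the block cache percentage Vibhav raised to 0.35, and the dfs read short circuit Asaf suggests) are plain configuration. A sketch of the relevant properties, assuming an HBase 0.94 / Hadoop 1.x-era cluster as in this 2013 thread; property names differ on later versions, and the user name below is an assumption.]

```xml
<!-- hbase-site.xml: give 35% of the RegionServer heap to the block cache. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.35</value>
</property>

<!-- hdfs-site.xml (also visible to HBase): let the RegionServer read local
     HDFS block files directly, bypassing the DataNode - the "dfs read
     short circuit" mentioned above. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>
```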