hbase-user mailing list archives

From Minh Duc Nguyen <mdngu...@gmail.com>
Subject Re: Slow random reads, SocketTimeoutExceptions
Date Sun, 22 Jul 2012 19:56:36 GMT
Adrien,

   Your poor random-read (Get) performance may be related to your compaction
policy.  HBase strives to take advantage of data locality by running region
servers on the data nodes that physically store their region files.  However,
it is possible (for a variety of reasons) that a region server ends up running
on a server that doesn't store its region files locally.  In that case, you'll
see poor read performance because of the extra network I/O.  Regular
compaction rewrites the HDFS files that make up a region on the node currently
running the region server, restoring data locality.
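
If you suspect locality was lost after region moves, you can also trigger a
major compaction by hand rather than waiting for the periodic one. A minimal
sketch from the HBase shell (the table name 'mytable' is a placeholder for
your own):

```
# Rewrite every store file of the table's regions on the region server
# currently hosting them; this restores HDFS block locality.
hbase> major_compact 'mytable'

# Afterwards, you can check the locality metric (e.g. the region server's
# hdfsBlocksLocalityIndex) to see whether it improved.
```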

In general, some other ways to improve read performance are enabling
compression, increasing your query selectivity (specifying only the data
you need like the column family, column qualifier, timestamps), and
enabling bloom filters.
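
For example, compression and bloom filters can be set per column family from
the HBase shell (table/family names are placeholders, and Snappy availability
depends on your install):

```
# The table must be disabled before altering its schema.
hbase> disable 'mytable'
# Compress the family's store files and add a row-level bloom filter so
# that Gets can skip store files that cannot contain the requested row.
hbase> alter 'mytable', {NAME => 'cf', COMPRESSION => 'SNAPPY', BLOOMFILTER => 'ROW'}
hbase> enable 'mytable'
```

On the client side, restricting each Get to just the column family/qualifier
and time range you actually need reduces the amount of data HBase must read
and ship back.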

HTH,
Minh

On Wed, Jul 11, 2012 at 4:30 PM, Adrien Mogenet <adrien.mogenet@gmail.com> wrote:

> A cell is about 300 bytes in my case (the row key's length is 32 bytes).
> In my current scenario, I generated 100 tables, each with a single column
> family. I'm inserting between 100k and 300k rows per second, depending on my
> settings - but that's not the point here; I'm mostly trying to get good
> concurrent (random) read/write performance. My benchmark is running on 30
> nodes.
>
> On Wed, Jul 11, 2012 at 10:22 PM, Asaf Mesika <asaf.mesika@gmail.com>
> wrote:
>
> > What's your cell value size?
> > What do you mean by 100 tables in one column family?
> > Can you please specify what was your insert rate and how many nodes you
> > have?
> >
> > Sent from my iPhone
> >
> > On Jul 11, 2012, at 22:08, Adrien Mogenet <adrien.mogenet@gmail.com>
> > wrote:
> >
> > Hi there,
> >
> > I'm discovering HBase and comparing it with other distributed databases I
> > know much better. I am currently stressing my testing platform (servers
> > with 32 GB RAM, 16 GB allocated to the HBase JVM) and I'm observing
> > strange performance... I'm putting tons of well-spread data (100 tables
> > of 100M rows in a single column family) and then performing random reads.
> > I get good read performance while the table does not hold too much data,
> > but in a big table I only get around 100-300 qps. I'm not swapping, I
> > don't see any long pauses due to GC, and the insert rate is still very
> > high, but nothing comes back from reads and it often results in a
> > SocketTimeoutException ("while waiting for channel to be ready for read"
> > exceptions, etc.).
> >
> > I noticed that certain StoreFiles were very big (~120 GB) and I adjusted
> > the compaction strategy to not compact such big files (I don't know if
> > this is related to my issue).
> >
> > I noticed that when I'm stressing my cluster with Get requests,
> > everything *looks* fine until a RegionServer doesn't have the data
> > locally and fetches it from HDFS, resulting in heavy network use for
> > more than 60 seconds, which throws a SocketTimeoutException.
> >
> > How does HBase handle data locality for random accesses? Could this be
> > a lead toward solving this kind of issue?
> > My block cache of 5 GB is not full at all...
> >
> > --
> > Adrien Mogenet
> > http://www.mogenet.me
> >
>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>
