hbase-user mailing list archives

From Jacques <whs...@gmail.com>
Subject Re: Slow full-table scans
Date Sun, 12 Aug 2012 21:05:13 GMT
Something to consider is that HBase stores and retrieves the row key (8
bytes in your case) + timestamp (8 bytes) + column qualifier (?) for every
single value.  Because HBase is schemaless, this key data has to be stored
with each cell (some of the newer block-level compression schemes can
reduce this).  So, depending on your column qualifiers, you could be
looking at a huge amount of overhead when you're dealing with 200,000
cells in a single row.  I also wonder whether much of the cost is simply on
the serialization/deserialization/instantiation side when you're pulling
back that many values.
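
To put a rough number on that overhead, here is a back-of-the-envelope
sketch. It assumes the uncompressed KeyValue layout as I understand it
(4-byte key length, 4-byte value length, 2-byte row length, 1-byte family
length, 8-byte timestamp, 1-byte key type, plus the row, family and
qualifier bytes themselves). The 1-byte column family and 8-byte qualifier
are guesses, since the actual schema was not given, and the function names
are made up for illustration.

```python
# Rough estimate of per-cell key overhead in an uncompressed HBase KeyValue.
# Field sizes reflect the KeyValue layout as understood here (an assumption,
# not taken from the thread); block encoding/compression would reduce this.
KEY_LEN_FIELD = 4      # int: total key length
VALUE_LEN_FIELD = 4    # int: value length
ROW_LEN_FIELD = 2      # short: row key length
FAMILY_LEN_FIELD = 1   # byte: column family length
TIMESTAMP = 8          # long: cell timestamp
KEY_TYPE = 1           # byte: Put/Delete marker


def cell_overhead(row_key_len, family_len, qualifier_len):
    """Bytes stored per cell in addition to the value itself."""
    return (KEY_LEN_FIELD + VALUE_LEN_FIELD
            + ROW_LEN_FIELD + row_key_len
            + FAMILY_LEN_FIELD + family_len
            + qualifier_len + TIMESTAMP + KEY_TYPE)


def row_overhead(cells, row_key_len, family_len, qualifier_len):
    """Total key overhead in bytes for one row holding `cells` cells."""
    return cells * cell_overhead(row_key_len, family_len, qualifier_len)


# 8-byte row key (from the thread); 1-byte family and 8-byte qualifier
# are hypothetical values chosen only to make the arithmetic concrete.
per_cell = cell_overhead(row_key_len=8, family_len=1, qualifier_len=8)
total = row_overhead(200_000, row_key_len=8, family_len=1, qualifier_len=8)
print(per_cell)                    # 37
print(f"{total / 1e6:.1f} MB")     # 7.4 MB
```

Under those guesses, a 200,000-cell row carries about 7.4 MB of key data
before counting a single byte of value, which is in the same range as the
row sizes being reported as slow.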

I'm not sure how many people are using that many cells in a single row and
trying to read or write them all at once.

Others may have more thoughts.

Jacques



On Sun, Aug 12, 2012 at 7:23 AM, Gurjeet Singh <gurjeet@gmail.com> wrote:

> Hi Ted,
>
> Yes, I am using the Cloudera distribution 3.
>
> Gurjeet
>
> Sent from my iPad
>
> On Aug 12, 2012, at 7:11 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Gurjeet:
> > Can you tell us which HBase version you are using?
> >
> > Thanks
> >
> > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh <gurjeet@gmail.com> wrote:
> >
> >> Thanks for the reply Stack. My comments are inline.
> >>
> >>> You've checked out the perf section of the refguide?
> >>>
> >>> http://hbase.apache.org/book.html#performance
> >>
> >> Yes. HBase has 8GB RAM on both my cluster and my dev machine.
> >> Both configurations are backed by SSDs, and the HBase options are set to
> >>
> >> HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
> >>
> >> The data that I am dealing with is static. The table never changes
> >> after the first load.
> >>
> >> Even some of my GET requests are taking up to a full 60 seconds when
> >> the row sizes reach ~10MB. In general, taking 5 seconds to fetch a
> >> single row (~1MB) seems extremely high to me.
> >>
> >> Thanks again for your help.
> >>
>
