hbase-user mailing list archives

From Gurjeet Singh <gurj...@gmail.com>
Subject Re: Slow full-table scans
Date Sun, 12 Aug 2012 22:46:44 GMT
Hi Jacques,

I did consider that. So, this increases the on-disk size of my data by
3-4x (=600-800MB). That still does not explain why reading 1 row (~4MB
with overhead) takes 5 sec. As for serialization/deserialization on the
client side: it happens on a different thread out of a buffer, and
most of the time that thread is just idling.

Gurjeet
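
A rough sketch of the arithmetic discussed in this thread, for anyone who wants to plug in their own numbers. The 8-byte row key, 8-byte timestamp, and 200,000 cells per row come from the messages; the 5-byte value size (about 1MB spread over 200,000 cells) and the 4-byte column qualifier are assumptions made only for illustration. The estimate ignores other per-cell fields, the column family name, and any compression.

```python
# Back-of-the-envelope estimate of per-cell overhead for one wide row.
ROW_KEY_BYTES = 8        # from the thread
TIMESTAMP_BYTES = 8      # from the thread
CELLS_PER_ROW = 200_000  # from the thread
VALUE_BYTES = 5          # assumption: ~1MB raw row / 200,000 cells
QUALIFIER_BYTES = 4      # assumption: hypothetical qualifier length


def stored_row_bytes(value_bytes, qualifier_bytes, cells=CELLS_PER_ROW):
    """Bytes for one row when every cell repeats key, timestamp, qualifier."""
    per_cell = ROW_KEY_BYTES + TIMESTAMP_BYTES + qualifier_bytes + value_bytes
    return per_cell * cells


def throughput_mb_per_s(size_mb, seconds):
    """Effective read rate for a single fetch."""
    return size_mb / seconds


raw = VALUE_BYTES * CELLS_PER_ROW
stored = stored_row_bytes(VALUE_BYTES, QUALIFIER_BYTES)
print(f"raw row:    {raw / 1e6:.1f} MB")
print(f"stored row: {stored / 1e6:.1f} MB ({stored / raw:.1f}x)")
print(f"read rate:  {throughput_mb_per_s(4, 5):.1f} MB/s for ~4MB in 5 s")
```

Under these assumed sizes the repeated key, timestamp, and qualifier inflate the row several times over, yet a ~4MB row fetched in 5 seconds is still under 1 MB/s, which is the gap the reply above points to.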

On Sun, Aug 12, 2012 at 2:05 PM, Jacques <whshub@gmail.com> wrote:
> Something to consider is that HBase stores and retrieves the row key (8
> bytes in your case) + timestamp (8 bytes) + column qualifier (?) for every
> single value.  The schemaless nature of HBase generally means that this
> data has to be stored for each row (certain kinds of newer block level
> compression can make this less).  So depending on your column qualifiers,
> you're going to be looking at potentially a huge amount of overhead when
> you're dealing with 200,000 cells in a single row.  I also wonder whether
> you're dealing with a large amount of overhead simply on the
> serialization/deserialization/instantiation side if you're pulling back
> that many values.
>
> I'm not sure how many people are using that many cells in a single row and
> trying to read or write them all at once.
>
> Others may have more thoughts.
>
> Jacques
>
>
>
> On Sun, Aug 12, 2012 at 7:23 AM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>
>> Hi Ted,
>>
>> Yes, I am using the Cloudera distribution 3 (CDH3).
>>
>> Gurjeet
>>
>> Sent from my iPad
>>
>> On Aug 12, 2012, at 7:11 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> > Gurjeet:
>> > Can you tell us which HBase version you are using?
>> >
>> > Thanks
>> >
>> > On Sun, Aug 12, 2012 at 5:32 AM, Gurjeet Singh <gurjeet@gmail.com> wrote:
>> >
>> >> Thanks for the reply Stack. My comments are inline.
>> >>
>> >>> You've checked out the perf section of the refguide?
>> >>>
>> >>> http://hbase.apache.org/book.html#performance
>> >>
>> >> Yes. HBase has 8GB RAM on both my cluster and my dev machine.
>> >> Both configurations are backed by SSDs and HBase options are set to
>> >>
>> >> HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>> >>
>> >> The data that I am dealing with is static. The table never changes
>> >> after the first load.
>> >>
>> >> Even some of my GET requests are taking up to a full 60 seconds when
>> >> the row sizes reach ~10MB. In general, taking 5 seconds to fetch a
>> >> single row (~1MB) seems extremely high to me.
>> >>
>> >> Thanks again for your help.
>> >>
>>
