hbase-dev mailing list archives

From "Alex Slatman" <i...@slatman.nl>
Subject RE: hbase table with ~10k columns
Date Fri, 11 Feb 2011 14:11:55 GMT
Hi St.Ack,

Thanks for your quick response. We are running a small cluster based on 4
identical machines (Intel, each with 1 quad-core Xeon E5620, 24 GB DDR3 ECC,
six 1 TB disks in JBOD). We also experimented with RAID-0, but JBOD seemed to be
the better option (less trouble when a disk fails and better performance).

We are running the latest stable Hadoop version with 1 Master/Datanode and 3
additional Datanodes, with a replication factor of 2. (I have thought of
increasing this to get better performance, but I am not sure it would help.)
Performance on row selections is great when column families are relatively
small. When column families get bigger, performance drops a lot, even when we
use addColumn("Family", "Qualifier") to select a single column within a single
column family. We have experimented with the IN_MEMORY option and decreased
the block size from 65536 to 8192, with no measurable result. We do not use
compression because our column values are small (numbers). I also read that
column qualifiers are not compressed; is this correct? If so, in my opinion
compression would not have a big impact with the very small values we store.
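
To put rough numbers on the "very small values" point, here is a minimal
sketch of the per-cell arithmetic, assuming the standard HBase KeyValue
layout (4-byte key length, 4-byte value length, 2-byte row length, row,
1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type,
value). The row key, qualifier and value lengths and the class name
KeyValueSizeEstimate are hypothetical, chosen only for illustration; real
HFile blocks close on cell boundaries and carry index data, so the block
counts are approximate.

```java
public class KeyValueSizeEstimate {
    // Fixed bytes stored with every cell in the KeyValue layout:
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Uncompressed size in bytes of one cell.
    static int cellSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    // Number of blocks of the given size needed to hold totalBytes (rounded up).
    static long blocksSpanned(long totalBytes, int blockSize) {
        return (totalBytes + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        // Assumed sizes, not taken from the thread.
        int rowLen = 16;                   // hypothetical row key length
        int familyLen = "Family".length(); // family name used in the message
        int qualifierLen = 8;              // hypothetical qualifier length
        int valueLen = 8;                  // e.g. one number stored as a long
        int columns = 10_000;              // columns per family, as in the thread

        int cell = cellSize(rowLen, familyLen, qualifierLen, valueLen);
        long rowBytes = (long) columns * cell;

        System.out.printf("bytes per cell: %d (value is %.1f%% of it)%n",
                cell, 100.0 * valueLen / cell);
        System.out.printf("bytes per row per family: %d%n", rowBytes);
        for (int blockSize : new int[] {65536, 8192}) {
            System.out.printf("block size %d: one row spans about %d blocks%n",
                    blockSize, blocksSpanned(rowBytes, blockSize));
        }
    }
}
```

Under these assumptions the value is only a small share of each stored cell,
and a single 10,000-column row spans several blocks at either block size.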

I also looked into Bloom filters, but they are not available in HBase 0.20.6.

We could move to 0.90.0; it has been the stable release since Jan 8th, right?
We are basically using Hadoop and HBase for storage and calculation snapshots.
We are not using them for running MapReduce jobs (yet).

Kind regards,

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Thursday, 10 February 2011 18:37
To: dev@hbase.apache.org
Subject: Re: hbase table with ~10k columns

Tell us more about your cluster.  Tell us more about your configs.
Can you move to HBase 0.90.0?  It's been out for a little while now and
has perf improvements over 0.20.6.


On Thu, Feb 10, 2011 at 4:15 AM, Alex Slatman <info@slatman.nl> wrote:
> Hello,
> I don't know if this is the correct mailing list to ask a question like this.
> If not, please be so kind as to redirect me to the correct mailing list.
> At this moment we have a small cluster running Hadoop and HBase. We are
> experimenting with different sized tables and performance options (using
> HBase 0.20.6). In our testing environment we have a table containing ~20
> million rows with 2 column families. Each column family has (at
> 10.000 columns. To my knowledge data is stored on a per row per
> basis. We see performance dropping a lot when the number of columns in a
> columnfamily increases. Is there a way to improve performance or am I
> missing something here?
> I already tried setting the column family IN_MEMORY and decreasing the block size.
> Unfortunately with no result. I hope someone could point me in the right
> direction,
> Kind regards,
> Alex
