From: "Alex Slatman"
Reply-To: dev@hbase.apache.org
Subject: hbase table with ~10k columns
Date: Thu, 10 Feb 2011 13:15:28 +0100

Hello,

I don't know if this is the correct mailing list for a question like mine. If not, please be so kind as to redirect me to the correct mailing list.

At the moment we have a small cluster running Hadoop and HBase. We are experimenting with different table sizes and performance options (using hbase 20.06). In our testing environment we have a table containing ~20 million rows with 2 column families.
Each column family has (at most) 10,000 columns. To my knowledge, data is stored on a per-row, per-column-family basis. We see performance drop considerably as the number of columns in a column family increases. Is there a way to improve performance, or am I missing something here?

I have already tried setting the column family to IN_MEMORY and decreasing the block size, unfortunately with no result.

I hope someone can point me in the right direction.

Kind regards,
Alex