From: TuX RaceR
Date: Sun, 14 Mar 2010 09:22:44 +0000
To: hbase-user@hadoop.apache.org
Subject: Re: worth choosing the shortest possible column names/keys?

Thank you guys for your answers.
I'll map descriptive names to short names too ;)

Cheers
TuX

Lars Francke wrote:
>> Will I save a lot of space (especially if I have many small columns)?
>>
>
> I don't have any hard numbers for you but I tested it and I remember
> that on a dataset of about 10-20 GB I could save about 200-500 MB
> (this was with compression enabled) just by not using descriptive
> string qualifiers that weren't data themselves. A lot of small columns
> for me too (mostly counters). I use a simple mapping of short byte
> arrays to strings so that it is still very easy to use in the client.
>
> I asked that very same question a few months ago on IRC but I think
> nobody answered so I'd be interested in what others do as well.
>
> Cheers,
> Lars
>
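
For illustration, the mapping of short byte arrays to descriptive strings mentioned in the quoted message might look roughly like the sketch below. It is plain Python with no HBase client involved, and every name in it (the column names, the one-byte qualifiers, the helper functions) is hypothetical rather than anything taken from Lars's setup. The size estimate assumes the qualifier is stored with every cell, that each row carries every column, and it ignores compression, so it is only an upper-bound guess.

```python
# Hypothetical mapping from descriptive column names to short qualifiers.
# Neither the names nor the one-byte codes come from the thread.
QUALIFIERS = {
    "page_views": b"v",
    "unique_visitors": b"u",
    "last_updated": b"t",
}

# Reverse lookup so the client can keep working with readable names.
NAMES = {short: name for name, short in QUALIFIERS.items()}

# Two names sharing one short qualifier would silently overwrite each other.
if len(NAMES) != len(QUALIFIERS):
    raise ValueError("short qualifiers must be unique")


def encode_row(row):
    """Translate {descriptive name: value} into {short qualifier: value}."""
    return {QUALIFIERS[name]: value for name, value in row.items()}


def decode_row(cells):
    """Translate {short qualifier: value} back into {descriptive name: value}."""
    return {NAMES[qualifier]: value for qualifier, value in cells.items()}


def qualifier_bytes_saved(row_count):
    """Rough uncompressed saving if every row holds every mapped column."""
    per_row = sum(
        len(name.encode("utf-8")) - len(short)
        for name, short in QUALIFIERS.items()
    )
    return per_row * row_count
```

The client code would call encode_row() just before a write and decode_row() right after a read, so the short qualifiers never leak into application logic.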