Date: Sun, 14 Mar 2010 19:09:48 +0100
Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
From: Tim Robertson
To: hbase-user@hadoop.apache.org

Well I could well be wrong, but my understanding is that there are memory-mapped index files using the key, so key choice would come into play for memory requirements here. For secondary indexes, it has to be a factor for memory requirements: halving the size of the data you need to get in memory must be a good thing. I am also building Lucene indexes storing only this key, so it influences their size a fair amount too.

I know for sure MySQL (MyISAM) B-tree index size is greatly affected by the size of the numeric types. Those indexes are more complicated than my understanding of HBase indexing, but the same principles apply (if it ain't in memory then you're into disk seeking).

On Sun, Mar 14, 2010 at 6:41 PM, Michael Segel wrote:

> UUID overkill?
>
> Uhm, a UUID is a 128-bit key. That's what, 16 bytes in length? Definitely not
> 'overkill' if all you want the key to do is to guarantee uniqueness.
>
> Very easy to generate and extremely easy to use.
> You can even hash it and create version 5 UUIDs.
>
> I don't understand why you'd want to try and generate an 8-byte key (you said 8
> characters, assuming you meant the Latin-1 character set), when you have a
> standard way of doing it already. 8 byte vs 16 byte? C'mon....really?
>
> JMHO
>
> -Mike
>
> > Date: Sat, 13 Mar 2010 09:01:38 +0100
> > Subject: Re: worth choosing the shortest possible column names/keys?
> > From: timrobertson100@gmail.com
> > To: hbase-user@hadoop.apache.org
> >
> > Along similar lines... (sorry for hijacking thread)
> >
> > I assume that this is even more applicable for key choice given the way keys
> > participate in indexes? I have been using UUID, but it is way overkill for
> > my needs. What are others using? Is there a convenient way of doing (e.g.) 8
> > character strings?
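
To put rough numbers on the 8-byte versus 16-byte question in this thread, here is a minimal sketch in Python. It only counts raw key bytes; it does not model how HBase, Lucene or MyISAM actually lay out their indexes. The record name, the row count and the helper raw_key_bytes are made up for illustration.

```python
import uuid

# Hypothetical record name, used only for illustration.
record_name = "occurrence/12345"

# Version 5 UUID: a SHA-1 hash of a namespace plus a name, so the same
# name always yields the same key.
key = uuid.uuid5(uuid.NAMESPACE_URL, record_name)

as_text = str(key).encode("ascii")  # hyphenated hex string
as_bytes = key.bytes                # raw 128-bit value
as_half = key.bytes[:8]             # truncated to 64 bits, weaker uniqueness


def raw_key_bytes(rows, key_len):
    """Total bytes taken by the keys alone (no index or storage overhead)."""
    return rows * key_len


rows = 100_000_000  # hypothetical table size
for label, value in (("text", as_text), ("binary", as_bytes), ("truncated", as_half)):
    total_mb = raw_key_bytes(rows, len(value)) / 1e6
    print(f"{label:10s} {len(value):2d} bytes/key, {total_mb:,.0f} MB of raw keys")
```

Storing the UUID as its 16 raw bytes rather than as a 36-character string already saves more than half the key size, without giving up anything. Truncating to 8 bytes halves it again, but the result is no longer a UUID and the collision odds have to be judged for the data set at hand.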