hbase-user mailing list archives

From: Ryan Rawson <ryano...@gmail.com>
Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
Date: Mon, 15 Mar 2010 08:12:29 GMT
You can use incrementColumnValue to generate sequential numbers.  The
call is atomic and fast.  It supports thousands of calls/second in my
testing.
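A minimal sketch of this pattern, assuming HBase's `incrementColumnValue(row, family, qualifier, amount)`, which atomically adds `amount` to a counter cell and returns the new value. The hypothetical `InMemoryCounter` stands in for that call with an `AtomicLong` so the sketch runs without a cluster. Tim wants several machines minting IDs in parallel, so the hypothetical `IdBlockAllocator` (an assumption, not something proposed in the thread) reserves a block of IDs per increment, hands them out locally, and encodes each ID as an 8-byte big-endian row key.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical stand-in for an HBase counter cell. Against a real table this
 * would be an incrementColumnValue(row, family, qualifier, amount) call, which
 * atomically adds `amount` and returns the new total.
 */
class InMemoryCounter {
    private final AtomicLong value = new AtomicLong(0);

    /** Atomically adds amount and returns the updated total. */
    long increment(long amount) {
        return value.addAndGet(amount);
    }
}

/**
 * Hypothetical allocator: one increment on the shared counter reserves
 * blockSize IDs, which this worker then hands out without further round
 * trips. Workers sharing one counter never receive overlapping blocks.
 */
class IdBlockAllocator {
    private final InMemoryCounter counter;
    private final long blockSize;
    private long next = 0;  // next ID to hand out
    private long limit = 0; // exclusive upper bound of the current block

    IdBlockAllocator(InMemoryCounter counter, long blockSize) {
        this.counter = counter;
        this.blockSize = blockSize;
    }

    /** Synchronized so threads on the same worker never get the same ID. */
    synchronized long nextId() {
        if (next == limit) {
            // The counter returns the new total, so the reserved block
            // is [total - blockSize, total).
            limit = counter.increment(blockSize);
            next = limit - blockSize;
        }
        return next++;
    }

    /** Encodes a non-negative ID as an 8-byte big-endian row key. */
    static byte[] toRowKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }
}
```

With a real table, `InMemoryCounter.increment` would be replaced by the `incrementColumnValue` call, and HBase's `Bytes.toBytes(long)` gives the same 8-byte big-endian encoding. Big-endian keeps non-negative IDs in numeric order under HBase's byte-wise row key ordering. The trade-off is that strictly sequential keys send all new writes to the same region. A worker that stops early leaves gaps, which is harmless when IDs only need to be unique. Assuming 2 billion rows, raw key bytes come to 2×10^9 × 8 B = 16 GB, versus 32 GB for 16-byte UUIDs, before any per-entry index overhead.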

-ryan

On Mon, Mar 15, 2010 at 12:15 AM, Tim Robertson
<timrobertson100@gmail.com> wrote:
>>
>> Maybe I'm missing something, but the UUID is an artificial key; it's used to
>> guarantee uniqueness, and in this case you're using it as part of a key/value
>> pair.
>>
>
> Sure, understood. A UUID aims to be globally unique, whereas I only need
> uniqueness within the cluster across a couple of billion items, but with an
> algorithm that allows machines to mint IDs in parallel.
>
>
>> So why are you storing it in a Lucene index as the value?
>>
>
> Because I have various search indexes on the row, built from combinations
> of fields in the row. I want the whole row accessible in the search results,
> so I store only the row key (the row content is way too big for Lucene).
> Lucene handles the search and provides the keys, and then the rows are
> pulled and transformed while streaming out in the results.
>
>
>> Look, the benefits of using a UUID definitely outweigh rolling your own
>> 8-byte solution, even for in-memory caches.
>> (Are you only storing values that are 16 bytes in length, or something much
>> larger?)
>
>
> The values are much, much larger (hundreds to thousands of bytes), but they
> aren't going into any in-memory structures.
>
>
>
>> > Date: Sun, 14 Mar 2010 19:09:48 +0100
>> > Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
>> > From: timrobertson100@gmail.com
>> > To: hbase-user@hadoop.apache.org
>> >
>> > Well, I could be wrong, but my understanding is that there are
>> > memory-mapped index files using the key, so key choice would come into
>> > play for memory requirements here. For secondary indexes it has to be a
>> > factor for memory requirements; halving the size of the data you need to
>> > get in memory must be a good thing. I am also building Lucene indexes
>> > storing only this key, so it influences their size a fair amount too.
>> >
>> > I know for sure that MySQL (MyISAM) B-tree index size is greatly affected
>> > by the size of the numeric types. Those indexes are more complicated than
>> > my understanding of HBase indexing, but the same principles apply (if it
>> > ain't in memory, then you're into disk seeking).
>> >
>> >
>> >
>> > On Sun, Mar 14, 2010 at 6:41 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>> >
>> > >
>> > >
>> > > UUID overkill?
>> > > Uhm, a UUID is a 128-bit key. That's what, 16 bytes in length? Definitely
>> > > not 'overkill' if all you want the key to do is guarantee uniqueness.
>> > >
>> > > Very easy to generate and extremely easy to use. You can even hash a name
>> > > and create version 5 UUIDs.
>> > >
>> > > I don't understand why you'd want to try to generate an 8-byte key (you
>> > > said 8 characters; I'm assuming you meant the Latin-1 character set) when
>> > > you already have a standard way of doing it. 8 bytes vs 16 bytes?
>> > > C'mon... really?
>> > >
>> > > JMHO
>> > >
>> > > -Mike
>> > >
>> > > > Date: Sat, 13 Mar 2010 09:01:38 +0100
>> > > > Subject: Re: worth choosing the shortest possible column names/keys?
>> > > > From: timrobertson100@gmail.com
>> > > > To: hbase-user@hadoop.apache.org
>> > > >
>> > > > Along similar lines... (sorry for hijacking the thread)
>> > > >
>> > > > I assume that this is even more applicable to key choice, given the
>> > > > way keys participate in indexes? I have been using UUIDs, but they are
>> > > > way overkill for my needs. What are others using? Is there a convenient
>> > > > way of doing (e.g.) 8-character strings?
>> > > >
>> > >
>>
>
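Mike's version 5 suggestion can be made concrete. In `java.util`, `UUID.randomUUID()` produces random (version 4) UUIDs and `UUID.nameUUIDFromBytes` produces name-based version 3 (MD5) UUIDs; as far as I know the JDK has no version 5 generator. Below is a minimal sketch of one, assuming the RFC 4122 layout: SHA-1 over the namespace UUID's 16 bytes followed by the name, truncated to 128 bits, with the version and variant bits overwritten. The class name `UuidV5` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical helper: name-based version 5 (SHA-1) UUIDs per RFC 4122. */
class UuidV5 {
    static UUID fromName(UUID namespace, String name) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
        // Hash the namespace's 16 bytes followed by the UTF-8 name.
        sha1.update(toBytes(namespace));
        sha1.update(name.getBytes(StandardCharsets.UTF_8));
        // SHA-1 yields 20 bytes; keep the first 16 (128 bits).
        byte[] hash = Arrays.copyOf(sha1.digest(), 16);
        hash[6] = (byte) ((hash[6] & 0x0f) | 0x50); // version nibble = 5
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80); // RFC 4122 variant bits
        ByteBuffer buf = ByteBuffer.wrap(hash);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    /** 16-byte big-endian form, e.g. for use as an HBase row key. */
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }
}
```

For example, `UuidV5.fromName(UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8"), "example.org")` (the RFC's DNS namespace) returns the same UUID on every call and every machine, which is what makes name-based UUIDs usable as deterministic keys. Whichever version is used, the key stays 16 bytes, twice the size of the 8-byte keys Tim asks about.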
