hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
Date Mon, 15 Mar 2010 08:21:28 GMT
Thanks Ryan, sounds ideal

How do you use incrementColumnValue
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[],%20byte[],%20byte[],%20long)
to generate a row key, please?

Thanks
Tim





On Mon, Mar 15, 2010 at 9:12 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> You can use incrementColumnValue to generate sequential numbers.  The
> call is atomic and fast.  It supports thousands of calls/second in my
> testing.
>
> -ryan
>
> On Mon, Mar 15, 2010 at 12:15 AM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
> >>
> >> Maybe I'm missing something, but the UUID is an artificial key. It's used
> >> to guarantee uniqueness, and in this case you're using it as part of a
> >> key,value pair.
> >>
> >
> > Sure, understood.  UUID aims to be globally unique, whereas I am only
> > looking for in cluster uniqueness across a couple billion items, but an
> > algorithm that allows ID minting by machines in parallel.
> >
> >
> >> So why are you storing it in a Lucene index as the value?
> >>
> >
> > Because I have various search indexes to the row using combinations of
> > fields from the row.  I want the whole row accessible in the search
> > results, so I store the row key only (the row content is way too big for
> > Lucene).  Lucene handles the search providing the keys, and then the rows
> > are pulled and transformed while streaming out in the results.
> >
> >
> >> Look, the benefits of using the UUID definitely outweigh wrapping your
> >> own solution in 8 bytes, even in memory caches.
> >> (Are you only storing values that are 16 bytes in length, or something
> >> much larger?)
> >
> >
> > The values are much, much larger (100s - 1000s of bytes) but they aren't
> > going in to any in-memory structures.
> >
> >
> >
> >> > Date: Sun, 14 Mar 2010 19:09:48 +0100
> >> > Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?
> >> > From: timrobertson100@gmail.com
> >> > To: hbase-user@hadoop.apache.org
> >> >
> >> > Well I could well be wrong, but my understanding is that there are
> >> > memory mapped index files using the key, so key choice would come in to
> >> > play for memory requirements here.  For secondary indexes, it has to be
> >> > a factor for memory requirements - halving the size of the data you need
> >> > to get in memory must be a good thing.  I am also building Lucene
> >> > indexes storing only this key, so it influences their size a fair
> >> > amount too.
> >> >
> >> > I know for sure MySQL (MyISAM) btree index size is greatly affected by
> >> > the size of the numeric types.  They are more complicated than my
> >> > understanding of HBase indexing, but the same principles apply (if it
> >> > ain't in memory then you're into disk seeking).
> >> >
> >> >
> >> >
> >> > On Sun, Mar 14, 2010 at 6:41 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> >> >
> >> > >
> >> > >
> >> > > UUID overkill?
> >> > > Uhm, a UUID is a 128-bit key. That's what, 16 bytes in length?
> >> > > Definitely not 'overkill' if all you want the key to do is to
> >> > > guarantee uniqueness.
> >> > >
> >> > > Very easy to generate and extremely easy to use. You can even hash it
> >> > > and create version 5 UUIDs.
> >> > >
> >> > > I don't understand why you'd want to try and generate an 8 byte key
> >> > > (you said 8 character, assuming you meant the latin-1 character set),
> >> > > when you have a standard way of doing it already. 8 byte vs 16 byte?
> >> > > C'mon....really?
> >> > >
> >> > > JMHO
> >> > >
> >> > > -Mike
> >> > >
> >> > > > Date: Sat, 13 Mar 2010 09:01:38 +0100
> >> > > > Subject: Re: worth choosing the shortest possible column names/keys?
> >> > > > From: timrobertson100@gmail.com
> >> > > > To: hbase-user@hadoop.apache.org
> >> > > >
> >> > > > Along similar lines... (sorry for hijacking thread)
> >> > > >
> >> > > > I assume that this is even more applicable for key choice given the
> >> > > > way keys participate in indexes?  I have been using UUID, but it is
> >> > > > way overkill for my needs.  What are others using?  Is there a
> >> > > > convenient way of doing (e.g.) 8 character strings?
> >> > > >
> >> > >
> >> > >
> >> > >
> >>
> >>
> >
>
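
For reference, the counter-based approach discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: an in-process AtomicLong stands in for the HBase counter cell so the example runs without a cluster, and the class and method names (RowKeySketch, nextId, toRowKey, toBytes) are made up for this example. Against a real table, nextId() would presumably wrap the HTable.incrementColumnValue(row, family, qualifier, amount) call from the Javadoc linked above, using a dedicated counter row and column of your choosing.

```java
import java.nio.ByteBuffer;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public class RowKeySketch {
    // Stand-in for the HBase counter cell (an assumption for this sketch).
    // With HBase, the next value would instead come from
    // HTable.incrementColumnValue(row, family, qualifier, 1L).
    private static final AtomicLong COUNTER = new AtomicLong();

    // Atomically hands out the next sequential id.
    public static long nextId() {
        return COUNTER.incrementAndGet();
    }

    // Encodes the id as an 8-byte big-endian row key.
    public static byte[] toRowKey(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    // Encodes a UUID as its raw 16 bytes, for size comparison.
    public static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        byte[] counterKey = toRowKey(nextId());
        byte[] uuidKey = toBytes(UUID.randomUUID());
        System.out.println("counter key: " + counterKey.length
                + " bytes, UUID key: " + uuidKey.length + " bytes");
    }
}
```

Because every client increments the same shared counter, ids stay unique across machines minting keys in parallel, at the cost of one round trip per id; a client could also increment by a larger amount to reserve a block of ids at once.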
