hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Data uniqueness
Date Thu, 05 Mar 2009 09:07:04 GMT
In general, the only way to determine uniqueness of data in HBase is via
the row key.  Just like a primary key in a database, you can use it to
verify uniqueness, and to do index scans and gets.
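
To make the primary-key analogy concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: ToyTable, put_if_absent and the example keys are all made up for illustration. It only shows the idea that the row key is the one place where a duplicate can be detected directly, and that sorted keys give you range scans. Note that the existence check is explicit here, because a plain put to an existing row key writes over the row rather than failing.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table sorted by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put_if_absent(self, row_key, columns):
        """Store the row only if the key is unused; return True on success."""
        if row_key in self._rows:
            return False
        bisect.insort(self._keys, row_key)
        self._rows[row_key] = dict(columns)
        return True

    def get(self, row_key):
        """Return the row stored under row_key, or None if there is none."""
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


# Hypothetical usage: the email address is the row key, so it is unique.
users = ToyTable()
first = users.put_if_absent("eran@example.com", {"info:name": "Eran"})
second = users.put_if_absent("eran@example.com", {"info:name": "Someone else"})
users.put_if_absent("ryan@example.com", {"info:name": "Ryan"})
```

The second call is rejected because the key is already taken, and the first row is left untouched.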

So generally speaking, yes, you will have to make multiple trips to the
server to use a secondary index.  The situation might not be as dire as it
seems, since in 0.20 the latency targets for small gets/puts are really
low (like maybe 1 ms?).
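
A rough sketch of what those multiple trips look like, again as a toy Python model and not HBase code. The assumption here is a second "index" table whose row key is built from the column value, so the row-key uniqueness check can be reused for that column. CountingTable, insert_unique and the key format are hypothetical names for illustration, and the sketch ignores failure handling between the two writes.

```python
class CountingTable:
    """Toy table that counts simulated round trips (not the HBase API)."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def put_if_absent(self, key, value):
        """Write only if the key is unused; counts as one round trip."""
        self.round_trips += 1
        if key in self.rows:
            return False
        self.rows[key] = value
        return True

    def put(self, key, value):
        """Unconditional write; counts as one round trip."""
        self.round_trips += 1
        self.rows[key] = value


def insert_unique(data, index, row_key, column, value):
    """Claim the value in the index table first, then write the data row."""
    index_key = column + "=" + value      # assumed index key layout
    if not index.put_if_absent(index_key, row_key):
        return False                      # value already used by another row
    data.put(row_key, {column: value})
    return True


# Hypothetical usage: two rows try to use the same email value.
data, index = CountingTable(), CountingTable()
ok1 = insert_unique(data, index, "row1", "email", "a@example.org")
ok2 = insert_unique(data, index, "row2", "email", "a@example.org")
```

Each successful insert costs one trip to the index table plus one to the data table; a rejected insert costs only the index trip.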

The usual answer to "need to do more" in HBase is "well, use
map-reduce"... which is the solution I will offer you as well.
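
One way map-reduce could apply to this problem is as an offline check that finds duplicate values after the fact, instead of checking before every commit. The sketch below is plain Python that imitates the map and reduce phases in memory; it is not Hadoop or HBase code, and map_row, reduce_value and find_duplicates are made-up names.

```python
from collections import defaultdict


def map_row(row_key, row, column):
    """Map phase: emit (value, row_key) for rows that have the column."""
    if column in row:
        yield row[column], row_key


def reduce_value(value, row_keys):
    """Reduce phase: report a value only if more than one row uses it."""
    if len(row_keys) > 1:
        yield value, sorted(row_keys)


def find_duplicates(rows, column):
    """Run map, group by value (the shuffle), then reduce, all in memory."""
    groups = defaultdict(list)
    for row_key, row in rows.items():
        for value, key in map_row(row_key, row, column):
            groups[value].append(key)
    duplicates = {}
    for value, keys in groups.items():
        for dup_value, dup_keys in reduce_value(value, keys):
            duplicates[dup_value] = dup_keys
    return duplicates


# Hypothetical usage: row1 and row3 share the same email value.
rows = {
    "row1": {"email": "a@example.org"},
    "row2": {"email": "b@example.org"},
    "row3": {"email": "a@example.org"},
}
dups = find_duplicates(rows, "email")
```

This trades immediate enforcement for a periodic batch pass, so duplicates can exist until the job runs.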

Hopefully this answers some of your questions.

Good luck!
-ryan

On Thu, Mar 5, 2009 at 1:00 AM, Eran Bergman <eranberghbase@gmail.com> wrote:

> Hello,
>
> Lately I have been experimenting with HBase and I came across a problem I
> don't know how to solve yet.
> My problem is data uniqueness, meaning I would like to have unique data in
> a specified column (taking into account all or some subset of my rows).
> I would like to have that for any number of columns which I will specify
> (various types of data).
>
> Usually the way to do this is to use some sort of indexing method, but this
> will amount to round trips to the server for uniqueness checks before I
> commit, which are very costly.
>
> Does anyone have any thoughts on how to do this?
>
>
> Thanks,
> Eran
>
