hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Newbie question
Date Mon, 15 Nov 2010 10:29:21 GMT
> Thank you for the feedback. So to summarize, HBase is doing good for high
> reads, writes. Update is really writing a new version of the data. So
> updating is okay but Handling deletes is not possible in the current version
> of the data unless a new version of the data is written down.

Deletes are supported (you can delete all of a row, all of a column, or specific versions
of columns).

They are really tombstones / markers, so the data does actually still sit on disk for some
time, but HBase will never return it to you once it is marked as deleted.  Over time,
background compactions physically remove all of the deleted data.
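
As a rough illustration of the tombstone behaviour described above, here is a toy in-memory model in Python. It is not HBase code and not the HBase client API: the class `ToyStore` and its method names are hypothetical, made up for this sketch, and real HBase has several delete types and more detailed timestamp and compaction rules.

```python
class ToyStore:
    """Toy model of versioned cells with tombstone deletes (not the HBase API)."""

    def __init__(self):
        self.cells = {}       # (row, column, timestamp) -> value
        self.tombstones = {}  # (row, column) -> newest delete timestamp

    def put(self, row, column, timestamp, value):
        # An "update" is just another cell with a newer timestamp.
        self.cells[(row, column, timestamp)] = value

    def delete_column(self, row, column, timestamp):
        # Write a marker only; the existing cells stay where they are.
        key = (row, column)
        self.tombstones[key] = max(timestamp, self.tombstones.get(key, timestamp))

    def _masked(self, row, column, timestamp):
        # A cell is hidden if a tombstone at or after its timestamp exists.
        cutoff = self.tombstones.get((row, column))
        return cutoff is not None and timestamp <= cutoff

    def get(self, row, column):
        # Return the newest value that is not masked, or None.
        visible = [
            (ts, value)
            for (r, c, ts), value in self.cells.items()
            if r == row and c == column and not self._masked(r, c, ts)
        ]
        return max(visible, key=lambda pair: pair[0])[1] if visible else None

    def compact(self):
        # Stand-in for background cleanup: drop masked cells and the markers.
        self.cells = {
            key: value
            for key, value in self.cells.items()
            if not self._masked(*key)
        }
        self.tombstones.clear()


store = ToyStore()
store.put("user1", "score", 1, "10")
store.put("user1", "score", 2, "12")         # an update is a newer version
latest = store.get("user1", "score")

store.delete_column("user1", "score", 3)
after_delete = store.get("user1", "score")   # hidden from reads
still_stored = len(store.cells)              # both cells are still held

store.compact()
after_compact = len(store.cells)             # now physically gone
print(latest, after_delete, still_stored, after_compact)
```

The point of the sketch is only that a delete is a write of a marker, reads filter on that marker, and the space is reclaimed later by a separate cleanup step.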


> Also, I was reading some documentation to figure out if there is a way to
> store and get column values in a sorted manner.
> I understand It is possible to do range queries on key (as the key is sorted
> and stored) but it looks like its not straight forward to do the same on the
> columns values. For example I have a set of column values with a name and a
> score and for a given key and i want to retrieve the column names for a
> given key sorted by the score. From my understanding so far, this has to be
> handled at the application end. Please let me know if I am missing something
> here.

You're not missing something.  HBase tables are sorted by row, each row is sorted by columns,
each column is sorted by versions.  There is no sorting on values.
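
To make that ordering concrete, here is a small Python sketch that sorts made-up cells the same way. The cell tuples and the `storage_order` helper are assumptions for illustration only. The sketch assumes plain ASCII keys (so Python string order matches byte order) and that versions come back newest first.

```python
# Each cell is (row key, column name, timestamp); the values play no part in ordering.
cells = [
    ("row2", "name", 1),
    ("row10", "name", 1),
    ("row1", "score", 5),
    ("row1", "name", 3),
    ("row1", "name", 7),
]


def storage_order(cell):
    # Row ascending, then column ascending, then timestamp descending.
    row, column, timestamp = cell
    return (row, column, -timestamp)


ordered = sorted(cells, key=storage_order)
for cell in ordered:
    print(cell)
```

Note that "row10" sorts before "row2" because the comparison is lexicographic, not numeric. That matters if you put numbers in a key: they need a fixed-width encoding to sort as expected.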

You would either have to read all the values and sort them in the client (sometimes this
makes sense, but if you have 1M columns it probably doesn't), or create more tables.
An additional table can be used as a different index on your data (for example, the value
would now be the row key, so that table would be sorted by value).
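
Here is a minimal sketch of the index-table idea, again as a plain Python model rather than HBase client code. The key layout, the `|` separator, the `MAX_SCORE` bound, and the helper names `index_row_key` and `top_n` are all assumptions chosen for this example. A dict stands in for the index table, and sorting its keys stands in for a scan.

```python
MAX_SCORE = 999_999  # assumed upper bound on scores, needed for fixed-width keys


def index_row_key(key, score, name):
    # Invert the score so that higher scores sort first, and zero-pad it so
    # that lexicographic order matches numeric order.
    inverted = MAX_SCORE - score
    return f"{key}|{inverted:06d}|{name}"


def top_n(index, key, n):
    # Stand-in for a prefix scan: walk keys in sorted order, stop after n hits.
    prefix = f"{key}|"
    hits = [index[k] for k in sorted(index) if k.startswith(prefix)]
    return hits[:n]


# The "index table": row key -> column name. The application writes to it
# alongside the main table.
index = {}
for name, score in [("alice", 42), ("bob", 1700), ("carol", 300)]:
    index[index_row_key("user1", score, name)] = name

top_two = top_n(index, "user1", 2)
print(top_two)
```

One consequence of this layout is that the application has to keep the index in step with the main table: when a score changes, the old index row must be deleted and a new one written, since the score is part of the row key.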

Hope that helps.

JG




> 
> Thanks,
> Gayatri
> 
> On Mon, Nov 15, 2010 at 10:29 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> 
> > That is a static snapshot of a particular version of HBase with a
> > particular version of their code (each with various flaws, mistakes,
> > etc, etc).
> >
> > At this moment, Stumbleupon uses HBase behind parts of its website,
> > doing reads, writes, updates, and so on.  Performance is quite good,
> > and we are very happy with HBase.
> >
> > -ryan
> >
> > On Sun, Nov 14, 2010 at 8:54 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
> > > Hi,
> > >   I read the comparison from this pdf:
> > >   http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
> > >
> > > hari
> > >
> > >
> > > On Mon, Nov 15, 2010 at 4:20 AM, Jonathan Gray <jgray@facebook.com> wrote:
> > >
> > >> HBase is well-suited for a high-write workload.
> > >>
> > >> Hari, I'm not sure what would be different in a database like Cassandra
> > >> with respect to updates and deletes?  In this regard HBase and Cassandra
> > >> are nearly identical (updates are really just insertions of new versions,
> > >> deletions are actually tombstone markers... ie data is immutable once
> > >> written).
> > >>
> > >> JG
> > >>
> > >> > -----Original Message-----
> > >> > From: Hari Sreekumar [mailto:hsreekumar@clickable.com]
> > >> > Sent: Friday, November 12, 2010 6:21 AM
> > >> > To: user@hbase.apache.org
> > >> > Subject: Re: Newbie question
> > >> >
> > >> > Hi Gayatri,
> > >> >
> > >> > I am myself quite new to hbase but from my little experience and from
> > >> > whatever I have read, HBase is more suitable for environments with
> > >> > high read and write, but very few updates and no real deletions. It is
> > >> > more of a write once and forget kind of database. Cassandra or MongoDB
> > >> > might be more suitable for your requirement imo. My advice would be to
> > >> > consider those as well before making any decision.
> > >> >
> > >> > thanks,
> > >> > hari
> > >> >
> > >> > On Fri, Nov 12, 2010 at 7:00 PM, Gayatri Rao <rgayatri1@gmail.com> wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > I am new to hbase. I have been reading up documentation and studying
> > >> > > how hbase suits to our requirement.
> > >> > >
> > >> > > We want to be able to store a key and corresponding values. However,
> > >> > > while reading, i want to read values in sorted order something like
> > >> > > the topN. Its a web facing environment and our requirement is write
> > >> > > heavy infact they are updates of the already existing records (about
> > >> > > 270K updates in an hour though actual data that needs to be stored in
> > >> > > it might be much much more). Deletes would be in the order of a few
> > >> > > thousands every day.
> > >> > >
> > >> > > I wanted to find out know your opinions on how good is hbase for this
> > >> > > kind of scenario.
> > >> > >
> > >> > > Thanks,
> > >> > > Gayatri
> > >> > >
> > >>
> > >
> >
