hbase-user mailing list archives

From Bryan Beaudreault <bbeaudrea...@hubspot.com>
Subject Re: Practical Upper Limit on Number of Version Stored?
Date Fri, 06 Dec 2013 02:28:09 GMT
I generally agree with Michael and avoid using versions for anything other
than versioning, but mostly out of personal preference.  That said, I also
agree with JM that 50-200 should be no problem at all.

We did do this in our early days of HBase, and eventually moved away from
it for a few reasons:

1) You get more control without it (we didn't in the beginning but
eventually wanted to do deletes, updates, and other things)

2) Once we had a case where someone got spammed for events, creating
thousands upon thousands of versions.  The excess versions only get cleaned
up on major compaction.  This row eventually grew to the size of a region
and became an operational nightmare, because a region can't be split inside
a row, and there were other bugs around this at the time (~2 years ago).

3) What do you do if, for some reason, a user has 2 events in the same
millisecond?  This generally doesn't happen, or is an error when it does,
but it's nice to expose it or otherwise be able to handle it.  (Note we
initially got around this by munging the version timestamp a bit so we
could include a hashCode in it ... this was an ugly hack.)

4) As we wrote more and more HBase code, and since most HBase code does not
work in this manner, it became nicer to unify around more normal access
patterns (this is probably mostly preference).
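Points 2 and 3 both follow from how a versioned cell behaves. The Python sketch below is a hypothetical in-memory model, not HBase code: reads return only the newest max_versions versions, the excess stays in storage until a simulated major compaction, and a second write at the same timestamp replaces the first. The munged_version helper illustrates the kind of timestamp hack described in point 3; the 12-bit split and the use of CRC-32 as the hash are assumptions made for illustration.

```python
import zlib


class VersionedCell:
    """Toy in-memory model of one HBase cell (row, family, qualifier).

    Hypothetical simplification for illustration only; this is not how
    HBase stores data internally.
    """

    def __init__(self, max_versions):
        self.max_versions = max_versions
        # Every written version, keyed by timestamp. Writes never drop
        # old versions, so storage keeps growing between compactions.
        self.stored = {}

    def put(self, ts, value):
        # In this model a second write at the same timestamp overwrites
        # the first, so two events in the same millisecond collapse.
        self.stored[ts] = value

    def get(self):
        # Readers only see the newest max_versions versions ...
        newest = sorted(self.stored, reverse=True)[: self.max_versions]
        return [(ts, self.stored[ts]) for ts in newest]

    def major_compact(self):
        # ... but the excess is only physically removed here.
        self.stored = dict(self.get())


# Hypothetical layout for the "munged timestamp" workaround: shift the
# millisecond clock left and keep HASH_BITS low bits for an event hash.
HASH_BITS = 12


def munged_version(millis, event_id):
    """Pack a stable hash of event_id into the low bits of the version."""
    mask = (1 << HASH_BITS) - 1
    return (millis << HASH_BITS) | (zlib.crc32(event_id.encode()) & mask)


if __name__ == "__main__":
    cell = VersionedCell(max_versions=3)
    for ts in range(1, 11):  # e.g. a burst of spam events
        cell.put(ts, f"event-{ts}")
    print(len(cell.stored), [ts for ts, _ in cell.get()])  # 10 [10, 9, 8]
    cell.major_compact()
    print(len(cell.stored))  # 3

    same_ms = VersionedCell(max_versions=5)
    same_ms.put(1000, "a")
    same_ms.put(1000, "b")
    print(same_ms.get())  # [(1000, 'b')]
```

The munged value still sorts by time, because the millisecond part dominates the low hash bits, but two different events can still collide when their hash bits match. HBase itself interprets a version as epoch milliseconds, so a munged value looks like a timestamp far in the future to TTL expiry and time-range reads, which is one concrete reason such a hack is ugly.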


On Thu, Dec 5, 2013 at 8:29 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> And the response is no.
>
> You don't have that many versions. Up to 200 is not critical.
>
> Also, you can easily give that a try.
>
> JM
> On 2013-12-05 20:27, "Shawn Hermans" <shawnhermans@gmail.com> wrote:
>
> > I guess I don't really understand why I wouldn't want to do this.  For our
> > use case we only really care about the user's last 50 to 200 events.  We
> > don't really care about deleting events explicitly.  More than likely we
> > would enable a TTL to get rid of events older than a certain time.
> >
> > I guess my question is whether or not there is an issue with storing this
> > many versions.  Are there any measurable drawbacks?
> >
> >
> > On Thu, Dec 5, 2013 at 7:11 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> >
> > > You really don't want to do this.
> > > It's not what versioning was meant for, and it has a couple of serious
> > > flaws.
> > > The biggest flaw... what happens when you want to delete a version? ...
> > > There are other options, depending on your use case and how you use
> > > the events.
> > > Using versioning for anything beyond versions of the same data is not a
> > > good idea.
> > > On Dec 5, 2013, at 4:47 PM, Shawn Hermans <shawnhermans@gmail.com> wrote:
> > >> All,
> > >> I am working on an HBase application where we store user events in an
> > >> HBase table.  The row key is the user identifier and each column is an
> > >> event identifier.  Most users only have a handful of events (10 or
> > >> less), but some users have a few hundred thousand events or more, and
> > >> this causes issues when an HBase client tries to retrieve all those
> > >> events.
> > >>
> > >> We are looking at different ways of limiting the number of events
> > >> returned.  One idea is not to store each event under its own column
> > >> qualifier, but instead to use HBase's versioning capability to store
> > >> the last 100 to 200 events.  It doesn't seem like we would run into
> > >> issues with this approach, but I want to see if anyone has had any
> > >> practical experience in this area.  The advice given in
> > >> http://hbase.apache.org/book/schema.versions.html is a little
> > >> ambiguous.
> > >>
> > >> Thanks,
> > >> Shawn
> > > The opinions expressed here are mine, while they may reflect a cognitive
> > > thought, that is purely accidental.
> > > Use at your own risk.
> > > Michael Segel
> > > michael_segel (AT) hotmail.com
>
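Shawn's plan in the thread above (read back only the last 50 to 200 events per user and let a TTL expire older ones) corresponds to setting the VERSIONS and TTL attributes on the column family. The following is a minimal Python sketch of the read-side effect of those two limits together, assuming a simplified model where expiry is judged against a single "now" and a cell exactly at the cutoff is still live; the function name and parameters are hypothetical, not HBase API.

```python
def visible_events(versions, max_versions, ttl_ms, now_ms):
    """Toy model of a read under a keep-last-N-plus-TTL policy.

    versions: iterable of (timestamp_ms, value) pairs for one cell.
    Returns the newest max_versions pairs that have not yet expired,
    newest first. The >= boundary is an assumption of this sketch.
    """
    cutoff = now_ms - ttl_ms
    live = [(ts, v) for ts, v in versions if ts >= cutoff]  # TTL filter
    live.sort(key=lambda pair: pair[0], reverse=True)       # newest first
    return live[:max_versions]                              # version cap


if __name__ == "__main__":
    # Hypothetical user with one event every 10 ms from t=0 to t=990.
    history = [(ts, f"event-{ts}") for ts in range(0, 1000, 10)]
    recent = visible_events(history, max_versions=5, ttl_ms=100, now_ms=1000)
    print([ts for ts, _ in recent])  # [990, 980, 970, 960, 950]
```

The two limits only bound what a read returns; as Bryan describes in point 2, versions beyond the cap still occupy storage until a major compaction removes them, so a burst of writes to one row can still grow that row without bound in the meantime.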
