hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: many tables vs long rows
Date Tue, 03 Jan 2012 21:37:51 GMT
On Tue, Jan 3, 2012 at 12:38 PM, Joe Stein
<charmalloc@allthingshadoop.com> wrote:
> when the event happened so if we see something from November 3rd today then
> we will only keep it for 4 more months (and for events that we see today
> those stay for 6 months) .  so it sounds like this might be a viable option
> and when we set the timestamp in our checkAndPut we make the timestamp be
> the value that represents it as November 3rd, right? cool
>

This should be fine.

You might want to protect against future dates.
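
A minimal sketch of such a guard, in plain Java with no HBase calls. A cell
stamped in the future outlives the TTL you intended and, since the newest
timestamp wins, shadows later writes made with a current timestamp. The
class name, method name, and skew allowance are made up for illustration;
the idea is to pass the event time through something like this before
setting it on the Put.

```java
public final class TimestampGuard {
    private TimestampGuard() {}

    /**
     * Returns eventMillis unchanged unless it lies more than
     * allowedSkewMillis ahead of nowMillis, in which case nowMillis
     * is returned instead. (Hypothetical helper, not part of HBase.)
     */
    public static long clampToNow(long eventMillis, long nowMillis, long allowedSkewMillis) {
        if (eventMillis > nowMillis + allowedSkewMillis) {
            return nowMillis; // future-dated event: fall back to the current time
        }
        return eventMillis;   // past or near-present event: keep its own timestamp
    }
}
```

Whether to clamp, as here, or to reject the event outright is a policy call
for your application.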

> well what i was thinking is that my client code would know to use the
> november table and put the data in the november table (it is all just
> strings) but I am leaning now towards the TTL option (need to futz with it
> all more though).  One issue/concern with TTL is when all of a sudden we
> want to keep things for only 4 months or maybe 8 months and then having to
> re-TTL trillions of rows =8^( (which is nagging thought in the back of my
> head about ttls, requirements change)....

This schema attribute is kept in the table schema (on the column
family), not per row, so a new retention period is a schema change
rather than a rewrite of the rows.  You'll have to change the table
schema, which in 0.90.x hbase means offlining the table (in 0.92 hbase
there is an online schema edit, but it needs to be enabled and can be
problematic in the face of splitting.... more on this later).
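
To make the arithmetic from your example concrete, here is a rough sketch
in plain Java. It is not HBase's implementation; the class and method names
are invented, and it assumes "6 months" is configured as a flat 180 days.
The point is that expiry is worked out from the cell timestamp, so an event
stamped November 3rd and seen on January 3rd has about four months left.

```java
import java.time.Duration;
import java.time.Instant;

public final class TtlMath {
    private TtlMath() {}

    /** A cell counts as expired once its age is greater than the TTL. */
    public static boolean isExpired(Instant cellTimestamp, Instant now, Duration ttl) {
        Duration age = Duration.between(cellTimestamp, now);
        return age.compareTo(ttl) > 0;
    }

    /** Time left before expiry, never negative. */
    public static Duration remaining(Instant cellTimestamp, Instant now, Duration ttl) {
        Duration left = ttl.minus(Duration.between(cellTimestamp, now));
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        Instant event = Instant.parse("2011-11-03T00:00:00Z"); // when the event happened
        Instant seen = Instant.parse("2012-01-03T00:00:00Z");  // when it is written
        Duration ttl = Duration.ofDays(180);                   // assumed stand-in for "6 months"
        System.out.println(remaining(event, seen, ttl).toDays() + " days left");
    }
}
```

If the requirement later moves to 4 or 8 months, only the TTL value
changes; the timestamps already stored on the cells stay as they are.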

> That makes sense why a narrow long schema works well then, got it (I am use
> to Cassandra and do lots of wide column range slices on those columns this
> is like inverting everything up on myself but the row locks and checkAndPut
> (and co-processors) hit so many of my uses cases (as Cassandra still does
> also)
>

Be careful using hbase row locks.  They are (unofficially -- we need
to make it official) deprecated.  You can lock yourself out of a
regionserver if all incoming handlers end up waiting on a particular
row lock to clear.  Search back through this mailing list for other
row lock downsides.

You can do column range slices in hbase using filters (if you need to).

checkAndPut shouldn't care whether the row is wide or not?


> right now I am on 0.90.4 but right now I am going back and forth in
> changing our hadoop cluster, HBase is the primary driver for that so I am
> currently wrestling on the decision with upgrading from existing cluster
> CDH2 to CDH3 or going with MapR ...

Go to CDH3 if you are on CDH2.  Does CDH2 have a working sync?
(CDH3u3, when it arrives, has some nice perf improvements.)

> my preference is to run my own version
> of HBase (like I do with Kafka and Cassandra) I feel I can do this though I
> am not comfortable with running my own Hadoop build (already overloaded
> with things).  0.92 is exciting for co-processors too and it is cool system
> to hack on, maybe I will pull from trunk build and test it out some too.
>

Don't do hbase trunk.  Do the tip of the 0.92 branch if you want to hack.
HBase trunk is different from 0.92 already and will get even more
"differenter"; it'll be hard to help you if you are pulling from trunk.

St.Ack
