hbase-user mailing list archives

From Erik Holstad <erikhols...@gmail.com>
Subject Re: Uses cases for checkAndSave?
Date Wed, 03 Jun 2009 17:08:55 GMT
On Tue, Jun 2, 2009 at 4:51 PM, Guilherme Germoglio <germoglio@gmail.com> wrote:

> Hello!
>
> On Tue, Jun 2, 2009 at 3:58 PM, Erik Holstad <erikholstad@gmail.com>
> wrote:
>
> > Hi!
> >
> > On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio <germoglio@gmail.com>
> > wrote:
> >
> > > Hi Erik,
> > >
> > > For now, I'm using checkAndSave in order to make sure that a row is only
> > > created but not overwritten by multiple threads. So, checkAndSave is mostly
> > > invoked with a new structure created on the client. Actually, I'm checking
> > > if a specific "deleted" column is empty. If the "deleted" column is not
> > > empty, then the row creation cannot be performed. There are a few other
> > > tricky cases where I'm using it, but I'm sure that making that Result object
> > > more difficult to create than putting values on a map would be bad for me. :-)
> >
> > So you have a row with a family and qualifier that you check to see if it is
> > empty, and if it is you insert a new row? So basically you use it as an atomic
> > rowExist checker? Are you usually batching these checks, or would it be ok with
> > something like:
> >
> > public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
> > byte[] value, Put put){}
> > or
> > public boolean checkAndPut(KeyValue checkKv, Put put){}
> > for now?
> >
>
> Yes. It is ok for me to use the methods above for now.


Sweet, will make a version today, so you can test it out and maybe after
that we can work on it together to make things work for you.
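
To make the semantics of the first proposed signature concrete, here is a minimal in-memory sketch. It is not the HBase client code: the class name CheckAndPutSketch is hypothetical, rows are String keys instead of byte[], a Put is modeled as a plain map of "family:qualifier" to value, and treating a null expected value as "the cell must be empty" is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a table. Hypothetical; NOT the HBase client API. */
class CheckAndPutSketch {
    // row key -> ("family:qualifier" -> value)
    private final Map<String, Map<String, byte[]>> rows = new HashMap<>();

    /**
     * Applies the put only if the current value of family:qualifier equals
     * the expected value. Assumption: expected == null means "cell is empty".
     * The method is synchronized so the check and the write are atomic.
     */
    public synchronized boolean checkAndPut(String row, String family, String qualifier,
                                            byte[] expected, Map<String, byte[]> put) {
        Map<String, byte[]> cells = rows.get(row);
        byte[] current = (cells == null) ? null : cells.get(family + ":" + qualifier);
        boolean matches = (expected == null)
                ? current == null
                : Arrays.equals(expected, current);
        if (!matches) {
            return false; // someone else got there first, nothing is written
        }
        rows.computeIfAbsent(row, r -> new HashMap<>()).putAll(put);
        return true;
    }
}
```

With this model, the "atomic rowExist checker" case is a checkAndPut on attributes:deleted with a null expected value: the first caller creates the row and gets true, and any later caller gets false and leaves the row untouched.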

>
>
> Just in case you are curious about how I'll be using them, there are two cases
> where I'm using checkAndSave:
>
> The first is like the atomic rowExist checker and it represents 90% of the
> use of checkAndSave. Exactly as you said, I've got a column
> attributes:deleted for every row. When creating a new row, the creation only
> happens if this column is empty. When the row creation happens, this column
> is assigned a 'false' value. When this column receives a 'true' value, that
> is, the row is to be deleted, the 'hard' removal (an HTable Delete) of the
> row will be performed asynchronously. Until the 'hard' removal happens, a
> software layer that uses HTable will prevent the use of any 'soft' deleted
> row by checking the attributes:deleted column.
>
> The second case of using checkAndSave is to trigger some actions when a
> specific column is updated. So, I don't check for emptiness, but whether a
> previous value is still the same when I'm updating the row. For example,
> let's say I have a users table where I will serialize a User object and put
> it into a row. Among other things, the User object contains an e-mail
> attribute, and a change to it must trigger verification actions, changes on
> other tables, whatever. I realized that performing a get for every User
> update just to check whether their e-mail changed might not be the best
> approach, since changing e-mail is not a very common operation. So, I
> thought it is better to checkAndSave a user expecting that their current
> e-mail value is the same as the one already in the table, since this will
> occur many, many times more often than the opposite. However, if the current
> e-mail value is different from the one in the table, triggers are fired and
> then a new update is performed.
>
>
>
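
The second case reads like an optimistic update, so here is a rough sketch of how I understand it, again against an in-memory stand-in rather than HBase itself. Everything named here is hypothetical: the class EmailTriggerSketch, the column names info:email and info:data, and the counter that stands in for the real trigger actions. Note that in this sketch the very first save of a user also counts as a mismatch, because the e-mail cell is still empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** In-memory model of the optimistic e-mail update. Hypothetical; NOT HBase code. */
class EmailTriggerSketch {
    // row key -> (column -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    private int triggersFired = 0;

    /** Writes the cells only if the stored value of column equals expected. */
    synchronized boolean checkAndPut(String row, String column, String expected,
                                     Map<String, String> cells) {
        if (!Objects.equals(expected, get(row, column))) {
            return false;
        }
        put(row, cells);
        return true;
    }

    /** Unconditional write. */
    synchronized void put(String row, Map<String, String> cells) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).putAll(cells);
    }

    synchronized String get(String row, String column) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    /** Saves a user, assuming the e-mail is unchanged; falls back if it is not. */
    synchronized void saveUser(String row, String email, String serializedUser) {
        Map<String, String> cells = new HashMap<>();
        cells.put("info:email", email);
        cells.put("info:data", serializedUser);
        if (checkAndPut(row, "info:email", email, cells)) {
            return; // common case: e-mail unchanged, no extra get was needed
        }
        triggersFired++;  // stand-in for verification actions, other tables, etc.
        put(row, cells);  // the "new update" after the triggers have fired
    }

    synchronized int getTriggersFired() {
        return triggersFired;
    }
}
```

The point of the design is that the common path costs one conditional write and no separate get; only the rare e-mail change pays for the trigger and the second write.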
> >
> > >
> > > However, here's an idea. What if Put and Delete objects have a field
> > > "condition" (maybe, "onlyIf" would be a better name) which is exactly the
> > > map with columns and expected values. So, a given Put or Delete of an
> > > updates list will only happen if those expected values match.
> > >
> >
> > Puts and deletes are pretty much just List<KeyValue> which is basically a
> > List<byte[]>.
> > I don't think that we want to add complexity for puts and deletes now that
> > we have worked so hard to make them faster and more bare-bones.
> >
>
> no problem. (sorry!)
>
You don't have to be sorry, just happy that we are going to have a faster
HBase soon :)


>
>
> >
> >
> > > Also, maybe it should be possible to indicate common expected values for
> > > all updates of a list too, so a client won't have to put in all updates the
> > > same values if needed. But we must remember to solve the conflicts of
> > > expected values.
> > >
> > Not really sure if you mean that we would check the value of a key before
> > inserting the new value? That would mean that you would have to do a get
> > for every put/delete, which is not something we want in the general case.
> >
> >
> > >
> > > (By the way, I haven't seen the guts of new Puts and Deletes, so I don't
> > > know how difficult it would be to implement it -- but I can help, if
> > > necessary)
> > >
> > > Thanks,
> > >
> > > On Tue, Jun 2, 2009 at 2:34 PM, Erik Holstad <erikholstad@gmail.com>
> > > wrote:
> > >
> > > > Hi!
> > > > I'm working on putting checkAndSave back into 0.20 and just want to check
> > > > with the people that are using it how they are using it
> > > > so that I can make it as good as possible for these users.
> > > >
> > > > Since the API has changed from earlier versions there are some things that
> > > > one needs to think about.
> > > > For now in the new API there are no Updates, just Put and Delete, so for
> > > > now I need to know if users used to delete in the old batchUpdate
> > > > or just put?
> > > >
> > > > The new return format Result might seem like a good way to send in the data
> > > > to be used as "actual", but there is no super easy way to build that
> > > > on the client side for now, so it would be good to know how you are doing
> > > > this.
> > > > Do you do a get, save the result and then use it for the check, or do
> > > > you just create new structures on the client?
> > > >
> > > > Regards Erik
> > > >
> > >
> > >
> > >
> > > --
> > > Guilherme
> > >
> > > msn: guigermoglio@hotmail.com
> > > homepage: http://germoglio.googlepages.com
> > >
> >
> > Regards Erik
> >
>
>
>
> --
> Guilherme
>
> msn: guigermoglio@hotmail.com
> homepage: http://germoglio.googlepages.com
>
