hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: question on Filtering and checkAndPut()
Date Tue, 01 Jun 2010 17:53:46 GMT
To implement a set, you would need to do a check.

For your application, can you explain more specifically what the behavior is when you attempt
to insert a duplicate into the set?
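
A minimal sketch of what such a check involves, assuming the schema described in the quoted message below (one row per set, each element stored as the value of its own qualifier). It is an in-memory stand-in, not the HBase client API: `RowSetSketch`, `add`, and the `q0, q1, ...` qualifier names are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for a single row: qualifier -> value.
// This is NOT the HBase client API; it only illustrates the read-before-write check.
class RowSetSketch {
    private final Map<String, String> columns = new LinkedHashMap<>();
    private int nextQualifier = 0;

    // Inserts the element under a fresh qualifier unless some column already holds it.
    // Returns true if the element was inserted, false if it was a duplicate.
    synchronized boolean add(String element) {
        // The "read" step: look for the value across all qualifiers of the row.
        if (columns.containsValue(element)) {
            return false;
        }
        // The "write" step: a new value goes under a new (hypothetical) qualifier name.
        columns.put("q" + nextQualifier++, element);
        return true;
    }

    synchronized int size() {
        return columns.size();
    }

    public static void main(String[] args) {
        RowSetSketch row = new RowSetSketch();
        System.out.println(row.add("a"));
        System.out.println(row.add("b"));
        System.out.println(row.add("a")); // same value again: rejected by the check
        System.out.println(row.size());
    }
}
```

The lookup inside `add` is the read that precedes every write; against a real table that read is a round trip to the region server, which is the cost discussed further down the thread.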

> -----Original Message-----
> From: Raghava Mutharaju [mailto:m.vijayaraghava@gmail.com]
> Sent: Tuesday, June 01, 2010 10:49 AM
> To: user@hbase.apache.org
> Subject: Re: question on Filtering and checkAndPut()
> 
> Hi JG,
> 
>    There cannot be duplicate insertions because in my case, a row represents
> a set and the qualifier values represent each element of the set. So whenever
> I insert a value, I have to check whether the value already exists. A new
> value goes under a new qualifier. Do you think this is an appropriate schema
> design?
> 
> Regards,
> Raghava.
> 
> On Tue, Jun 1, 2010 at 1:12 PM, Jonathan Gray <jgray@facebook.com> wrote:
> 
> > Do you expect a very high percentage to be duplicates or just some?
> >
> > An alternate approach is to just perform the insertions.  Writes are faster
> > than reads, so sometimes it's best to just insert.  This will create an
> > additional version, but if you aren't relying on versions then it will have
> > little impact.
> >
> > If a majority of stuff will be duplicate, then maybe consider something
> > different.  Just remember that requiring reads before each write is going to
> > significantly slow everything down.
> >
> > > -----Original Message-----
> > > From: Raghava Mutharaju [mailto:m.vijayaraghava@gmail.com]
> > > Sent: Tuesday, June 01, 2010 9:49 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: question on Filtering and checkAndPut()
> > >
> > > Thank you JG.
> > >
> > > >>> is checking if the values there are the same as the ones you are
> > > trying to insert?
> > >         Yes, that is right. I am doing this because there could be
> > > duplicate values generated. In the current iteration of MR, I could
> > > generate a value which was already present in that row/qualifier
> > > combination (it is sufficient if the value is in any column).
> > >
> > > Regards,
> > > Raghava.
> > >
> > > On Tue, Jun 1, 2010 at 12:32 PM, Jonathan Gray <jgray@facebook.com> wrote:
> > >
> > > > And for checkAndPut, from the javadoc:
> > > >
> > > > "Atomically checks if a row/family/qualifier value match the
> > > > expectedValue. If it does, it adds the put."
> > > >
> > > > This can be used in a number of ways.  It sounds like what you're
> > > > describing is checking if the values there are the same as the ones you
> > > > are trying to insert?  This wouldn't make much sense; why would you
> > > > re-insert the same value?  You specify a row, family, qualifier, and
> > > > value.  You also specify a Put.
> > > >
> > > > checkAndPut is an example of an atomic operation.  I may want to only
> > > > insert certain data if the value I expect is there at the time I am
> > > > inserting.  Think about updating account balances, state transitions,
> > > > data processing, etc.  You may read some data at an earlier point in
> > > > time, do some processing, and then insert.  When you do the insert, you
> > > > only want it to happen if something else hasn't gone in during your
> > > > process time and modified the data that was there.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: Raghava Mutharaju [mailto:m.vijayaraghava@gmail.com]
> > > > > Sent: Tuesday, June 01, 2010 1:47 AM
> > > > > To: user@hbase.apache.org
> > > > > Cc: hbase-user@hadoop.apache.org
> > > > > Subject: question on Filtering and checkAndPut()
> > > > >
> > > > > Hi all,
> > > > >
> > > > >      Is the following type of value filter possible -- within a row,
> > > > > irrespective of the columns (qualifiers), the presence of a value
> > > > > should be checked. If that value is present then the row along with
> > > > > all the columns should be fetched.
> > > > >
> > > > > SingleColumnValueFilter requires that we specify the name of the
> > > > > qualifier, but here I would like to check the value across all the
> > > > > qualifiers of the row. ValueFilter can be used, but it does not return
> > > > > all the columns if there is a match - it only returns the matched
> > > > > column along with the row. So I want something which is a mix of both.
> > > > > Is this possible?
> > > > >
> > > > > Can someone please explain the functionality of the checkAndPut()
> > > > > method in HTable? I couldn't get it from the API doc. When I came
> > > > > across this method, my guess was that it would check for duplicate
> > > > > values -- for the given (row, family, qualifier) combination, whether
> > > > > the given value is the same as the value mentioned in the put (for the
> > > > > same combination).
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Regards,
> > > > > Raghava.
> > > >
> >
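
The check-then-write behaviour described in the quoted javadoc can be illustrated with a small in-memory sketch of the account-balance case mentioned in the thread. This is not the HBase client: `CheckAndPutSketch` and its string-based `checkAndPut` are hypothetical stand-ins, and the sketch simplifies by writing the new value to the same cell that is checked, whereas the real call takes a separate Put. Treating a null expected value as "the cell must be absent" is also an assumption of this sketch; check the javadoc for your HBase version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// In-memory illustration of compare-then-write semantics.
// NOT the HBase API: names and the String-based signature are hypothetical.
class CheckAndPutSketch {
    private final Map<String, String> cells = new HashMap<>();

    private static String key(String row, String family, String qualifier) {
        return row + "/" + family + ":" + qualifier;
    }

    // Atomically (via synchronized) writes newValue only if the cell currently
    // holds expected. Assumption: a null expected matches a cell that does not exist.
    synchronized boolean checkAndPut(String row, String family, String qualifier,
                                     String expected, String newValue) {
        String k = key(row, family, qualifier);
        if (!Objects.equals(cells.get(k), expected)) {
            return false; // someone else changed the cell since it was read
        }
        cells.put(k, newValue);
        return true;
    }

    synchronized String get(String row, String family, String qualifier) {
        return cells.get(key(row, family, qualifier));
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        // Create the balance only if it does not exist yet.
        table.checkAndPut("acct1", "f", "balance", null, "100");

        // Two writers both read "100" earlier and now try to update.
        boolean first = table.checkAndPut("acct1", "f", "balance", "100", "150");
        boolean second = table.checkAndPut("acct1", "f", "balance", "100", "70");

        System.out.println(first);  // the first writer's expectation still held
        System.out.println(second); // the second writer's expectation was stale
        System.out.println(table.get("acct1", "f", "balance"));
    }
}
```

A writer whose call returns false would typically re-read the current value, redo its processing, and try again, rather than overwrite a change it never saw.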
