hbase-user mailing list archives

From Raghava Mutharaju <m.vijayaragh...@gmail.com>
Subject Re: question on Filtering and checkAndPut()
Date Tue, 01 Jun 2010 17:49:26 GMT
Hi JG,

   There cannot be duplicate insertions because, in my case, a row represents
a set and the qualifier values represent the elements of that set. So
whenever I insert a value, I have to check whether the value already exists.
A new value goes under a new qualifier. Do you think this is an appropriate
schema design?

Regards,
Raghava.
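
The trade-off discussed in this thread (read before every write versus
just writing) can be pictured with a small in-memory model. This is only a
sketch and not HBase client code: the class SetRowSketch and its methods are
hypothetical names, and a row is modeled as a map from qualifier to a list
of versions. It also assumes, for the blind-write case, that the qualifier
is derived from the element itself; the thread does not say how qualifiers
are chosen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of one row; hypothetical names, not the HBase API.
class SetRowSketch {
    // qualifier -> versions of that cell's value, oldest first
    private final Map<String, List<String>> cells = new HashMap<>();

    // Blind write: no read first. Writing to an existing qualifier
    // only appends another version of the same cell.
    void put(String qualifier, String value) {
        cells.computeIfAbsent(qualifier, q -> new ArrayList<>()).add(value);
    }

    // Read-before-write: scan the latest value of every cell and
    // write under the new qualifier only if the value is absent.
    boolean putIfValueAbsent(String newQualifier, String value) {
        for (List<String> versions : cells.values()) {
            if (versions.get(versions.size() - 1).equals(value)) {
                return false; // value already present somewhere in the row
            }
        }
        put(newQualifier, value);
        return true;
    }

    int cellCount() {
        return cells.size();
    }

    int versionCount(String qualifier) {
        List<String> versions = cells.get(qualifier);
        return versions == null ? 0 : versions.size();
    }
}
```

In this model, putIfValueAbsent has to look at every cell of the row before
each write, while two blind put calls with the same qualifier leave one cell
with two versions.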

On Tue, Jun 1, 2010 at 1:12 PM, Jonathan Gray <jgray@facebook.com> wrote:

> Do you expect a very high percentage to be duplicates or just some?
>
> An alternate approach is to just perform the insertions.  Writes are faster
> than reads, so sometimes it's best to just insert.  This will create an
> additional version, but if you aren't relying on versions then it will have
> little impact.
>
> If a majority of stuff will be duplicate, then maybe consider something
> different.  Just remember that requiring reads before each write is going to
> significantly slow everything down.
>
> > -----Original Message-----
> > From: Raghava Mutharaju [mailto:m.vijayaraghava@gmail.com]
> > Sent: Tuesday, June 01, 2010 9:49 AM
> > To: user@hbase.apache.org
> > Subject: Re: question on Filtering and checkAndPut()
> >
> > Thank you JG.
> >
> > >>> is checking if the values there are the same as the ones you are
> > >>> trying to insert?
> >         Yes, that is right. I am doing this because there could be
> > duplicate values generated. In the current iteration of MR, I could
> > generate a value which was already present in that row/qualifier
> > combination (it is sufficient if the value is in any column).
> >
> > Regards,
> > Raghava.
> >
> > On Tue, Jun 1, 2010 at 12:32 PM, Jonathan Gray <jgray@facebook.com>
> > wrote:
> >
> > > And for checkAndPut, from the javadoc:
> > >
> > > "Atomically checks if a row/family/qualifier value match the
> > > expectedValue. If it does, it adds the put."
> > >
> > > This can be used in a number of ways.  It sounds like what you're
> > > describing is checking if the values there are the same as the ones
> > > you are trying to insert?  This wouldn't make much sense; why would
> > > you re-insert the same value?  You specify a row, family, qualifier,
> > > and value.  You also specify a Put.
> > >
> > > checkAndPut is an example of an atomic operation.  I may want to only
> > > insert certain data if the value I expect is there at the time I am
> > > inserting.  Think about updating account balances, state transitions,
> > > data processing, etc.  You may read some data at an earlier point in
> > > time, do some processing, and then insert.  When you do the insert, you
> > > only want it to happen if something else hasn't gone in during your
> > > process time and modified the data that was there.
> > >
> > > JG
> > >
> > > > -----Original Message-----
> > > > From: Raghava Mutharaju [mailto:m.vijayaraghava@gmail.com]
> > > > Sent: Tuesday, June 01, 2010 1:47 AM
> > > > To: user@hbase.apache.org
> > > > Cc: hbase-user@hadoop.apache.org
> > > > Subject: question on Filtering and checkAndPut()
> > > >
> > > > Hi all,
> > > >
> > > > >      Is the following type of value filter possible? Within a row,
> > > > > irrespective of the columns (qualifiers), the presence of a value
> > > > > should be checked. If that value is present, then the row along with
> > > > > all its columns should be fetched.
> > > >
> > > > > SingleColumnValueFilter requires that we specify the name of the
> > > > > qualifier, but here I would like to check the value across all the
> > > > > qualifiers of the row. ValueFilter can be used, but it does not return
> > > > > all the columns if there is a match; it only returns the matched
> > > > > column along with the row. So I want something which is a mix of
> > > > > both. Is this possible?
> > > >
> > > > > Can someone please explain the functionality of the checkAndPut()
> > > > > method in HTable? I couldn't get it from the API doc. When I came
> > > > > across this method, my guess was that it would check for duplicate
> > > > > values: for the given (row, family, qualifier) combination, whether
> > > > > the given value is the same as the value mentioned in the put (for
> > > > > the same combination).
> > > >
> > > > Thank you.
> > > >
> > > > Regards,
> > > > Raghava.
> > >
>
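
The filter behaviour asked about in the original message can be stated
precisely with a small in-memory model. This is a sketch of the desired
semantics only, not an HBase filter: the class RowValueMatchSketch and both
method names are hypothetical, and a table is modeled as a map from row key
to a map of qualifier to value. The second method mirrors the behaviour the
message attributes to ValueFilter (only the matching cells come back), for
contrast.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory illustration; hypothetical names, not an HBase filter.
class RowValueMatchSketch {

    // Desired behaviour: if any column of a row holds `wanted`,
    // return that row together with all of its columns.
    static Map<String, Map<String, String>> rowsContaining(
            Map<String, Map<String, String>> table, String wanted) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (row.getValue().containsValue(wanted)) {
                out.put(row.getKey(), row.getValue());
            }
        }
        return out;
    }

    // Contrast: keep only the cells whose value matches, dropping
    // the other columns of the row.
    static Map<String, Map<String, String>> matchingCellsOnly(
            Map<String, Map<String, String>> table, String wanted) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            Map<String, String> hits = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getValue().equals(wanted)) {
                    hits.put(cell.getKey(), cell.getValue());
                }
            }
            if (!hits.isEmpty()) {
                out.put(row.getKey(), hits);
            }
        }
        return out;
    }
}
```

For a row with two columns where only one holds the wanted value,
rowsContaining returns both columns and matchingCellsOnly returns one.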

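The checkAndPut behaviour described in this thread (write only if the
stored value is still the one you expect) is a compare-and-set. The sketch
below models it in memory under stated assumptions: it is not HBase client
code, the class CheckAndPutSketch is a hypothetical name, a single row is
modeled as a map from qualifier to value, and treating a null expected value
as "cell must be absent" is a convention of this model only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// In-memory model of one row; hypothetical names, not the HBase API.
class CheckAndPutSketch {
    private final Map<String, String> cells = new HashMap<>();

    // The method is synchronized so the check and the write happen as
    // one step; no other writer can slip in between them.
    synchronized boolean checkAndPut(String qualifier, String expected, String newValue) {
        if (!Objects.equals(cells.get(qualifier), expected)) {
            return false; // someone changed the cell since we read it
        }
        cells.put(qualifier, newValue);
        return true;
    }

    synchronized String get(String qualifier) {
        return cells.get(qualifier);
    }
}
```

With this model, two writers that both read a balance of "100" and then
try to update it will not overwrite each other: the first checkAndPut
succeeds, and the second returns false because the stored value is no longer
"100".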