hbase-user mailing list archives

From George Stathis <gstat...@gmail.com>
Subject Re: IndexedTable puts removing index rows for updated timestamped values?
Date Mon, 29 Mar 2010 17:57:52 GMT
That sounds about right. I'm assuming the delete/put index timestamp issue
lies within the IndexedTable put call and is not related to how the client
makes the call, right? I'm asking because we suspected timestamp issues and
we tried to introduce a delay between the initial put and the second one but
we had the same results.

-GS

On Mon, Mar 29, 2010 at 12:25 PM, Clint Morgan <clint.morgan@troove.net> wrote:

> Definitely not the expected behavior, and it does not sound like user error.
> From a quick skim, it looks like it's
> https://issues.apache.org/jira/browse/HBASE-2286.
>
> HBase does not gracefully handle the case where a delete and a
> subsequent put both have the same millisecond timestamp. The indexed
> table contrib was using this pattern to maintain indexes. The above
> JIRA works around it.
>
> NOTE: Current patch has a bug in it where if you delete only an
> "additionalColumn" in the base table, then it does not get deleted in
> the index. I'll put a fix for that up shortly.
>
> On Mon, Mar 29, 2010 at 7:52 AM, George Stathis <gstathis@gmail.com> wrote:
> > Hey folks,
> >
> > I hope this is just user error but I wanted to see if folks have
> > encountered this scenario using IndexedTable. We followed the by now
> > well-known article on how to set up secondary indexes
> > (http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html).
> > It works OK on the first test inserts but we are noticing an unexpected
> > behavior that I'll try to illustrate with the following example:
> >
> > - Assume a table 'foo' with a column family 'bar', a generic
> > qualifier 'bar:myColumn' and an indexed qualifier 'bar:myIndex'
> > - We thus have two actual tables, 'foo' and 'foo-myIndex'
> > - Assume a single put on table 'foo' that produces one row in 'foo' and
> > one index row in 'foo-myIndex'.
> > - Assume a second put on table 'foo' for the same row as above that
> > updates the qualifier 'bar:myColumn' but leaves 'bar:myIndex' as is.
> > Both values get updated timestamps.
> > - We are noticing that around 50% of the time this scenario is executed,
> > the index row in 'foo-myIndex' disappears even though the 'bar:myIndex'
> > value was not changed. Again, this behavior is not reproduced reliably;
> > it takes several attempts to see it.
> > - We are noticing that if we submit the second put without the
> > 'foo-myIndex' cell, the 'foo-myIndex' will be left alone.
> >
> > We are seeing this happening even if we extract 'bar:myIndex' to a new
> > column family 'bar2:myIndex' that only allows one version per cell. So
> > basically, if the timestamp changes, there is a risk of losing the index
> > entry, regardless of whether the cell value was changed. Is this expected
> > behavior?
> >
> > -GS
> >
>
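
To make the same-millisecond delete/put problem described above concrete,
here is a minimal sketch in plain Java. It is a toy model, not HBase code:
the class ToyCell and its methods are hypothetical names made up for this
illustration, and the only rule it models is the assumption that a delete
marker written at timestamp T hides every version whose timestamp is less
than or equal to T, regardless of the order in which the calls arrive.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of the version history of one index cell. NOT HBase code.
 * Assumption: a delete marker at timestamp T masks all puts with ts <= T.
 */
class ToyCell {
    // Versions keyed by timestamp in milliseconds.
    private final TreeMap<Long, String> puts = new TreeMap<>();
    // Timestamp of the newest delete marker seen so far.
    private long newestDeleteTs = Long.MIN_VALUE;

    void put(long ts, String value) {
        puts.put(ts, value);
    }

    void delete(long ts) {
        newestDeleteTs = Math.max(newestDeleteTs, ts);
    }

    /** Returns the newest value not masked by a delete marker, or null. */
    String get() {
        Map.Entry<Long, String> newest = puts.lastEntry();
        if (newest == null || newest.getKey() <= newestDeleteTs) {
            return null;
        }
        return newest.getValue();
    }

    public static void main(String[] args) {
        // Index maintenance as "delete old entry, then put new entry",
        // with both operations landing in the same millisecond.
        ToyCell sameMs = new ToyCell();
        sameMs.put(1000L, "row1");    // index entry from the first put
        sameMs.delete(1001L);         // remove the old index entry
        sameMs.put(1001L, "row1");    // re-add it, same millisecond
        System.out.println("same millisecond: " + sameMs.get());  // null

        // Same sequence, but the put lands one millisecond later.
        ToyCell laterMs = new ToyCell();
        laterMs.put(1000L, "row1");
        laterMs.delete(1001L);
        laterMs.put(1002L, "row1");
        System.out.println("later millisecond: " + laterMs.get()); // row1
    }
}
```

In this model the index entry survives only when the re-put gets a strictly
later timestamp than the delete. If the index's delete and put are both
stamped inside the IndexedTable put call, that would also be consistent
with a client-side delay between the two base-table puts making no
difference, though this sketch does not confirm where the timestamps are
assigned.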
