hbase-user mailing list archives

From Clint Morgan <clint.mor...@troove.net>
Subject Re: IndexedTable puts removing index rows for updated timestamped values?
Date Mon, 29 Mar 2010 16:25:39 GMT
Definitely not the expected behavior, and it does not sound like user error.
A quick skim looks like it's

HBase does not gracefully handle the case where a delete and a subsequent
put both have the same millisecond timestamp. The indexed table contrib was
using this pattern to maintain indexes. The above JIRA works around it.

NOTE: Current patch has a bug in it where if you delete only an
"additionalColumn" in the base table, then it does not get deleted in
the index. I'll put a fix for that up shortly.

On Mon, Mar 29, 2010 at 7:52 AM, George Stathis <gstathis@gmail.com> wrote:
> Hey folks,
> I hope this is just user error, but I wanted to see if folks have encountered
> this scenario using IndexedTable. We followed the by now well-known article
> on how to set up secondary indexes (
> http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html).
> It works OK on the first test inserts, but we are noticing unexpected
> behavior that I'll try to illustrate with the following example:
> - Assume a table 'foo' with a column family 'bar', a generic
> qualifier 'bar:myColumn' and an indexed qualifier 'bar:myIndex'
> - We thus have two actual tables, 'foo' and 'foo-myIndex'
> - Assume a single put on table 'foo' that produces one row in 'foo' and one
> index row in 'foo-myIndex'.
> - Assume a second put on table 'foo' for the same row as above that updates
> the qualifier 'bar:myColumn' but leaves 'bar:myIndex' as is. Both values get
> updated timestamps.
> - We are noticing that around 50% of the time this scenario is executed, the
> index row in 'foo-myIndex' disappears even though the 'bar:myIndex' value was
> not changed. Again, this behavior is not reliably reproducible; it takes
> several attempts to see it.
> - We are noticing that if we submit the second put without the 'bar:myIndex'
> cell, the 'foo-myIndex' row will be left alone.
> We are seeing this happening even if we extract 'bar:myIndex' to a new
> column family 'bar2:myIndex' that only allows one version per cell. So
> basically, if the timestamp changes, there is a risk of losing the index
> entry, regardless of whether the cell value was changed. Is this expected
> behavior?
> -GS
