hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject HBase scanner semantics and inconsistencies.
Date Mon, 10 Oct 2011 19:52:41 GMT
I've been working on a problem related to the ACID rules on scans as defined here

In both scenarios no new rowkeys are written -- all writes are
"overwrites" of already existing rows.  So let's say the table has 3 rows,
A1, B1, C1, and a writer overwrites them with values A2, B2, C2 respectively.
I'd expect every scan to return the same number of elements (3), but with
potentially any combination of A1|A2, B1|B2, C1|C2 (since there are no ACID
guarantees across rows).
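
To make that expectation concrete, here is a minimal sketch of it as a check
over one scan's output.  The class and method names (ScanInvariant, holds) are
hypothetical and not part of HBase; it assumes the scan results have already
been flattened into (rowkey, value) string pairs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of HBase: encodes the expectation that a scan
// returns every row exactly once, each with either its old or its new value.
class ScanInvariant {
    // scanned: (rowkey, value) pairs in the order the scan returned them.
    // allowed: for each expected rowkey, the set of acceptable values.
    static boolean holds(List<Map.Entry<String, String>> scanned,
                         Map<String, Set<String>> allowed) {
        Set<String> seen = new HashSet<>();
        for (Map.Entry<String, String> cell : scanned) {
            Set<String> ok = allowed.get(cell.getKey());
            if (ok == null || !ok.contains(cell.getValue())) {
                return false; // unknown rowkey, or a value that is neither old nor new
            }
            if (!seen.add(cell.getKey())) {
                return false; // the same rowkey was returned twice
            }
        }
        return seen.size() == allowed.size(); // false if any row is missing
    }

    public static void main(String[] args) {
        Map<String, Set<String>> allowed = Map.of(
            "A", Set.of("A1", "A2"),
            "B", Set.of("B1", "B2"),
            "C", Set.of("C1", "C2"));
        // A mix of old and new versions across rows is acceptable.
        System.out.println(holds(
            List.of(Map.entry("A", "A2"), Map.entry("B", "B1"), Map.entry("C", "C2")),
            allowed));
    }
}
```

Any mix of old and new values passes; a missing row or a repeated rowkey
fails, which are the two symptoms described in the scenarios below.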

Scenario 1: I have an MR job that does a filtered scan (I've confirmed the
problem happens without the filter as well) of a table, takes each row, and
writes back to the same row in the same table.  I then run the job 2-3x
concurrently on the same table with the same filter.  I believe every run
should report the same number of elements read.  With a
multiple-column-family table this is not the case -- in particular, the MR
counters sometimes report *fewer* than the expected number of input records
(but never more).  This seems wrong.  Agree?

Scenario 2: When trying to reproduce the problem without the MR portions,
I wrote two programs -- one that does a filtered scan and overwrites the
existing rows, and one that does the same filtered scan and just
counts the number of rows read.  I've also used the
TableRecordReaderImpl code that the MR TableInputFormat uses.  In this case,
the scan/counting job sometimes returns *too many* entries -- for
some rowkeys it returns two records (but never fewer).  This should probably
not happen either.  Agree?
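
A sketch of how the counting program could report which rowkeys are affected,
rather than just a total.  RowKeyAudit and its methods are hypothetical names,
not HBase APIs; it assumes the rowkeys have already been collected as strings
in scan order (for example via Bytes.toString(result.getRow()) on each Result)
and that the expected set of rowkeys is known up front.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of HBase: classifies a scan's rowkeys into
// "returned more than once" (scenario 2) and "never returned" (scenario 1).
class RowKeyAudit {
    // Rowkeys that appear more than once in the scan output, in sorted order.
    static List<String> duplicates(List<String> scannedKeys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : scannedKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Expected rowkeys that the scan never returned, in sorted order.
    static List<String> missing(List<String> scannedKeys, Collection<String> expectedKeys) {
        Set<String> seen = new HashSet<>(scannedKeys);
        List<String> out = new ArrayList<>();
        for (String key : new TreeSet<>(expectedKeys)) {
            if (!seen.contains(key)) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> scanned = List.of("A", "B", "B", "C");
        System.out.println("duplicates: " + duplicates(scanned));
        System.out.println("missing: " + missing(scanned, List.of("A", "B", "C")));
    }
}
```

Logging the affected rowkeys per run would show whether the duplicates and the
missing rows cluster around particular rows or regions.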

I've observed this on 0.90.1+patches and 0.90.3+patches.  It was also
claimed that this issue was not seen in a 0.89.x-based HBase.  Thus far I've
been able to reproduce this on multiple-column-family tables.

I believe scenario 1's problem is related to these.  Concur?

I think scenario 2 is probably related but not sure if it is the same issue.

Are there other related JIRAs?

Any hints for where to hunt this down?  (I'm starting to go into the scan and
write paths on the RegionServer, but this is a fairly large area...)


// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
