hbase-dev mailing list archives

From: Jonathan Hsieh <...@cloudera.com>
Subject: Re: HBase scanner semantics and inconsistencies.
Date: Mon, 17 Oct 2011 17:06:02 GMT
Just to wrap this up -- HBASE-4570 was filed and more details are there.
There is also a small patch that addresses the symptom described in this
scenario.  It ended up being a race with rowname caching in the KV store.

This isn't related to flushes, which have become the focus of HBASE-4485 and
HBASE-2856's subtasks. It should, however, be part of making HBASE-2856
(TestAcidGuarantees) pass consistently.

I've tested it on trunk and a version of 0.90.x.

Jon.

On Mon, Oct 10, 2011 at 5:03 PM, Stack <stack@duboce.net> wrote:

> Keep digging Jon.
> St.Ack
>
> On Mon, Oct 10, 2011 at 3:55 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > Yup, I'm pretty convinced this is a consistency and isolation problem.
> >  Here's the output of my program (expected 59 rows) and some output from
> > problem rows.
> >
> > The row has 10 column families ("f0", "f1" ... "f9") with two columns
> > each.  There are actually two 'row0000021332' rows -- the 48 and the 49
> > versions.  Each came from a separate call to next().  Notice how they are
> > split between column families!  Over many runs this happens many times,
> > although the split falls between different column families each time.
> >
> > scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
> > scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
> > scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
> > scan items counted: 60 with 0 null ColumnFamilyMaps and 0 nullRows
> > Row row0000021332 had time stamps: [48:
> > keyvalues={row0000021332/f0:data/1318200437341/Put/vlen=1000,
> > row0000021332/f0:qual/1318200437341/Put/vlen=10,
> > row0000021332/f1:data/1318200437341/Put/vlen=1000,
> > row0000021332/f1:qual/1318200437341/Put/vlen=10,
> > row0000021332/f2:data/1318200437341/Put/vlen=1000,
> > row0000021332/f2:qual/1318200437341/Put/vlen=10,
> > row0000021332/f3:data/1318200437341/Put/vlen=1000,
> > row0000021332/f3:qual/1318200437341/Put/vlen=10,
> > row0000021332/f4:data/1318200437341/Put/vlen=1000,
> > row0000021332/f4:qual/1318200437341/Put/vlen=10,
> > row0000021332/f5:data/1318200437341/Put/vlen=1000,
> > row0000021332/f5:qual/1318200437341/Put/vlen=10}, 49:
> > keyvalues={row0000021332/f6:data/1318200437341/Put/vlen=1000,
> > row0000021332/f6:qual/1318200437341/Put/vlen=10,
> > row0000021332/f7:data/1318200437341/Put/vlen=1000,
> > row0000021332/f7:qual/1318200437341/Put/vlen=10,
> > row0000021332/f8:data/1318200437341/Put/vlen=1000,
> > row0000021332/f8:qual/1318200437341/Put/vlen=10,
> > row0000021332/f9:data/1318200437341/Put/vlen=1000,
> > row0000021332/f9:qual/1318200437341/Put/vlen=10}]
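> >
> > (For reference, the counting side is roughly the loop below -- a sketch
> > only, with made-up class and table names, not the exact test code.)
> >
> > ----
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.client.ResultScanner;
> > import org.apache.hadoop.hbase.client.Scan;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class ScanCounter {
> >   public static void main(String[] args) throws Exception {
> >     Configuration conf = HBaseConfiguration.create();
> >     HTable table = new HTable(conf, "testtable");  // placeholder table name
> >     ResultScanner scanner = table.getScanner(new Scan());
> >     int count = 0, nullCfMaps = 0, nullRows = 0;
> >     String prevRow = null;
> >     for (Result r : scanner) {
> >       count++;
> >       if (r.getRow() == null) { nullRows++; continue; }
> >       if (r.getNoVersionMap() == null) nullCfMaps++;
> >       String row = Bytes.toString(r.getRow());
> >       // Row-level ACID says each row key should come back at most once.
> >       if (row.equals(prevRow)) {
> >         System.out.println("row " + row + " returned by two next() calls");
> >       }
> >       prevRow = row;
> >     }
> >     scanner.close();
> >     System.out.println("scan items counted: " + count + " with " + nullCfMaps
> >         + " null ColumnFamilyMaps and " + nullRows + " nullRows");
> >   }
> > }
> > ----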
> >
> > Jon.
> >
> > On Mon, Oct 10, 2011 at 1:50 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >
> >> I've modified scenario 2's filter to setFilterIfMissing(true), which I
> >> had missed from scenario 1, and now I get an undercount like in that
> >> scenario.  In another experiment, I found that getting the latest version
> >> via Result.getColumnLatest(..) would return null.
> >>
> >> This seems like a consistency + isolation violation of the expected
> >> row-ACID properties.
> >>
> >> ----
> >> Scan s = new Scan();
> >> ...
> >> SingleColumnValueFilter filter = new SingleColumnValueFilter(
> >>     Bytes.toBytes("f1"), Bytes.toBytes("qual"), CompareOp.EQUAL, value);
> >> filter.setFilterIfMissing(true); // this is what drops events.
> >> s.setFilter(filter);
> >> ----
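> >>
> >> (The check that surfaces the null getColumnLatest(..) result looks
> >> roughly like this; it continues the snippet above and assumes a `table`
> >> handle -- a sketch, not the exact code.)
> >>
> >> ----
> >> ResultScanner scanner = table.getScanner(s);
> >> for (Result r : scanner) {
> >>   // Every row that survived the f1:qual filter should have a latest
> >>   // f1:qual cell, yet sometimes this comes back null.
> >>   KeyValue latest = r.getColumnLatest(Bytes.toBytes("f1"), Bytes.toBytes("qual"));
> >>   if (latest == null) {
> >>     System.out.println("null f1:qual for row " + Bytes.toString(r.getRow()));
> >>   }
> >> }
> >> scanner.close();
> >> ----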
> >>
> >> Jon.
> >>
> >> On Mon, Oct 10, 2011 at 12:52 PM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >>
> >>> I've been working on a problem related to the ACID rules for scans, as defined here:
> >>> http://hbase.apache.org/acid-semantics.html.
> >>>
> >>> In the two scenarios no new row keys are written -- all writes are
> >>> "overwrites" of already existing rows.  So let's say the table has 3
> >>> rows A1, B1, C1, and the writer overwrites those rows with values A2,
> >>> B2, C2 respectively.  I'd expect all the scans to return the same
> >>> number of elements (3), but with potentially any combination of A1|A2,
> >>> B1|B2, C1|C2 (since there are no ACID guarantees across rows).
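> >>>
> >>> (The writer side is shaped roughly like this -- a sketch with
> >>> placeholder names, assuming a Configuration `conf`; every Put targets a
> >>> row key that already exists.)
> >>>
> >>> ----
> >>> HTable table = new HTable(conf, "testtable");   // placeholder table name
> >>> byte[] data = new byte[1000];                    // the 1000-byte "data" value
> >>> byte[] marker = Bytes.toBytes("val-000001");     // the 10-byte "qual" value
> >>> for (long i = 0; i < numRows; i++) {
> >>>   // Row keys like row0000021332 -- all of them have been written once already.
> >>>   Put p = new Put(Bytes.toBytes(String.format("row%010d", i)));
> >>>   for (int f = 0; f < 10; f++) {
> >>>     p.add(Bytes.toBytes("f" + f), Bytes.toBytes("data"), data);
> >>>     p.add(Bytes.toBytes("f" + f), Bytes.toBytes("qual"), marker);
> >>>   }
> >>>   table.put(p);
> >>> }
> >>> table.close();
> >>> ----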
> >>>
> >>> Scenario 1: I have an MR job that does a filtered scan (I've confirmed
> >>> the problem happens without the filter as well) from a table, takes
> >>> each row, and writes it back to the same row of the same table.  I then
> >>> run the job 2-3x concurrently on the same table with the same filter.
> >>> I believe every run should report the same number of elements read.  It
> >>> seems that in the case of a multiple-column-family table this is not so
> >>> -- in particular, the MR counters sometimes report *fewer* than the
> >>> expected number of input records (but never more).  This seems wrong.
> >>> Agree?
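> >>>
> >>> (The job's mapper is shaped roughly like the sketch below -- class
> >>> names are made up, not the exact code.)
> >>>
> >>> ----
> >>> // Read each row from the filtered scan and write it straight back to
> >>> // the same row of the same table; the map-input-records counter is
> >>> // what sometimes under-counts.
> >>> public class RewriteMapper extends TableMapper<ImmutableBytesWritable, Put> {
> >>>   @Override
> >>>   protected void map(ImmutableBytesWritable row, Result value, Context context)
> >>>       throws IOException, InterruptedException {
> >>>     Put put = new Put(row.get());
> >>>     for (KeyValue kv : value.raw()) {
> >>>       put.add(kv);           // re-put every cell of the row unchanged
> >>>     }
> >>>     context.write(row, put);
> >>>   }
> >>> }
> >>> ----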
> >>>
> >>> Scenario 2: To duplicate the problem without the MR portions, I wrote
> >>> two programs -- one that does a filtered scan and overwrites the
> >>> existing rows, and one that does the same filtered scan and just counts
> >>> the rows read.  I've also used the TableRecordReaderImpl code that the
> >>> MR TableInputFormat uses.  In this case, the scanning/counting job
> >>> sometimes returns *too many* entries -- for some row keys it returns
> >>> two records (but never fewer).  This probably should not happen either.
> >>> Agree?
> >>>
> >>> I've observed this on 0.90.1+patches and 0.90.3+patches.  It was also
> >>> claimed that this issue was not seen in a 0.89.x-based HBase.  Thus far
> >>> I've been able to reproduce this on multiple-column-family tables.
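> >>>
> >>> (For what it's worth, the tables I'm testing against have ten column
> >>> families, created roughly like this -- names are placeholders:)
> >>>
> >>> ----
> >>> HBaseAdmin admin = new HBaseAdmin(conf);
> >>> HTableDescriptor desc = new HTableDescriptor("testtable");  // placeholder
> >>> for (int f = 0; f < 10; f++) {
> >>>   desc.addFamily(new HColumnDescriptor("f" + f));           // f0 .. f9
> >>> }
> >>> admin.createTable(desc);
> >>> ----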
> >>>
> >>> I believe scenario 1's problem is related to these.  Concur?
> >>> https://issues.apache.org/jira/browse/HBASE-2856
> >>> https://issues.apache.org/jira/browse/HBASE-3498
> >>>
> >>> I think scenario 2 is probably related, but I'm not sure whether it is
> >>> the same issue.
> >>>
> >>> Are there other related JIRAs?
> >>>
> >>> Any hints on where to hunt this down?  (I'm starting to dig into the
> >>> scan and write paths on the RegionServer, but this is a fairly large
> >>> area...)
> >>>
> >>> Thanks
> >>> Jon.
> >>>
> >>> --
> >>> // Jonathan Hsieh (shay)
> >>> // Software Engineer, Cloudera
> >>> // jon@cloudera.com
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> // Jonathan Hsieh (shay)
> >> // Software Engineer, Cloudera
> >> // jon@cloudera.com
> >>
> >>
> >>
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com
> >
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
