hbase-dev mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: scanner is returning everything in parent region plus one of the daughters?
Date Sun, 14 Jun 2009 16:59:26 GMT
Andrew,

+1 I think it's a great idea.

Building on that, I think we should have system-level tests to make
sure we don't break performance and reliability. For example, an
intensive test with simultaneous reads and writes over a couple of
million rows. We could even kill a region server or two during that
test (and a master, of course). I don't think that's easily doable on
Hudson right now, so someone would have to host it on a small
cluster.
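One possible shape for such a test, sketched under assumptions: several writer threads insert disjoint row ranges while reader threads keep counting rows, and the run fails if a reader ever sees more rows than could have been written, or if the final count is not exact. `RowStore` and `InMemoryStore` are hypothetical names; a real harness would wrap an HBase table client and run against a small cluster, where region servers (and the master) could be killed mid-run. The in-memory map only lets the sketch run standalone.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentLoadCheck {

    /** Hypothetical stand-in for a table client; a real test would wrap an HBase table. */
    interface RowStore {
        void put(String row, String value);
        int countRows();
    }

    /** In-memory RowStore so the sketch runs without a cluster. */
    static final class InMemoryStore implements RowStore {
        private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();
        public void put(String row, String value) { rows.put(row, value); }
        public int countRows() { return rows.size(); }
    }

    /**
     * Runs `writers` threads, each inserting its own disjoint range of rows, while
     * `readers` threads repeatedly count rows. Returns true if no reader ever saw
     * more rows than the total to be written and the final count is exact.
     */
    static boolean run(RowStore store, int writers, int rowsPerWriter, int readers)
            throws InterruptedException {
        final int total = writers * rowsPerWriter;
        AtomicBoolean writing = new AtomicBoolean(true);
        AtomicInteger violations = new AtomicInteger();
        CountDownLatch writersDone = new CountDownLatch(writers);
        ExecutorService pool = Executors.newFixedThreadPool(writers + readers);

        for (int r = 0; r < readers; r++) {
            pool.submit(() -> {
                while (writing.get()) {
                    // A count above the total means some rows were returned twice.
                    if (store.countRows() > total) violations.incrementAndGet();
                }
            });
        }
        for (int w = 0; w < writers; w++) {
            final int id = w;
            pool.submit(() -> {
                try {
                    for (int i = 0; i < rowsPerWriter; i++) {
                        store.put(String.format("w%03d-r%08d", id, i), "v");
                    }
                } finally {
                    writersDone.countDown();
                }
            });
        }

        writersDone.await();   // wait for all writers to finish
        writing.set(false);    // tell readers to stop
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return violations.get() == 0 && store.countRows() == total;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean ok = run(new InMemoryStore(), 4, 10_000, 2);
        System.out.println(ok ? "PASS" : "FAIL");
    }
}
```

The reader check only catches over-counting, such as rows returned twice around a split; detecting rows lost mid-run would need per-writer progress tracking.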

J-D

On Sun, Jun 14, 2009 at 12:52 PM, Andrew Purtell <apurtell@apache.org> wrote:
> This possibly belongs in one of the new or open issues filed over the
> past few days:
>
> Insert 1000 rows with random row keys and induce a split (see test.rb
> attached to HBASE-1500). I would expect a row count to return no more
> than 1000 rows. However, here are the row counts from five runs of the
> test, with a full reinitialization between runs:
>
>    1516
>    1492
>    1497
>    1509
>    1501
>
> Also the shell provides an additional clue:
>
>    Current count: 1000, row: ffdcee2a75742697b375edef62fa4b75
>
>    1516 row(s) in 2.9530 seconds
>
> It looks like the parent region is fully iterated first, and then one of
> the daughters as well?
>
> Also, as these issues come up, please consider adding test cases to the
> suite to catch these regressions. The current scanner coverage seems to
> be letting big issues slip through unnoticed.
>
> One thing we could do right away is reimplement my 'test.rb' as a
> Java/JUnit test and commit it to the suite, with some additional logic
> to check that the scanners return exactly the number of unique row
> keys inserted. If there are no -1 votes, I will go ahead and do that.
>
>  - Andy
>
>
>
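A rough sketch of the proposed unique-row-key check, combined with a toy model of the suspected split bug. The five reported counts average 1503 (7515 / 5), roughly 500 above 1000, which fits "whole parent plus one daughter" if the split divides the rows about evenly. Everything below is an assumption for illustration: regions are modelled as sorted sets of keys, the split point is the median key, the 32-hex-digit key format merely mirrors the row shown in the shell output, and `randomRowKeys`, `scan`, and `assertScanMatches` are hypothetical helpers, not HBase APIs. Because the keys are random, the expected count is the number of distinct keys generated rather than a fixed 1000.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class SplitScanCheck {

    /** Hypothetical helper: n random 32-hex-digit row keys; collisions collapse in the Set. */
    static Set<String> randomRowKeys(int n, Random rnd) {
        Set<String> keys = new LinkedHashSet<>();
        byte[] buf = new byte[16];
        for (int i = 0; i < n; i++) {
            rnd.nextBytes(buf);
            StringBuilder sb = new StringBuilder(32);
            for (byte b : buf) sb.append(String.format("%02x", b));
            keys.add(sb.toString());
        }
        return keys;
    }

    /** Toy scanner: returns every row of each region, in the order the regions are visited. */
    static List<String> scan(List<? extends Set<String>> regionsVisited) {
        List<String> out = new ArrayList<>();
        for (Set<String> region : regionsVisited) out.addAll(region);
        return out;
    }

    /** Throws if the scan returns a duplicate row, an unknown row, or misses a row. */
    static void assertScanMatches(Set<String> inserted, List<String> scanned) {
        Set<String> seen = new HashSet<>();
        for (String row : scanned) {
            if (!seen.add(row)) throw new AssertionError("duplicate row: " + row);
            if (!inserted.contains(row)) throw new AssertionError("unexpected row: " + row);
        }
        if (seen.size() != inserted.size()) {
            throw new AssertionError("expected " + inserted.size() + " rows, got " + seen.size());
        }
    }

    public static void main(String[] args) {
        Set<String> inserted = randomRowKeys(1000, new Random(42));

        // Model a split at the median key: daughter A holds [start, mid), B holds [mid, end).
        TreeSet<String> parent = new TreeSet<>(inserted);
        String mid = new ArrayList<>(parent).get(parent.size() / 2);
        TreeSet<String> daughterA = new TreeSet<>(parent.headSet(mid));
        TreeSet<String> daughterB = new TreeSet<>(parent.tailSet(mid));

        // Expected behaviour after the split: only the daughters are scanned.
        assertScanMatches(inserted, scan(List.of(daughterA, daughterB)));
        System.out.println("daughters only: OK");

        // Suspected bug: the whole parent is scanned, then one daughter again.
        List<String> buggy = scan(List.of(parent, daughterB));
        System.out.println("parent + daughter B: " + buggy.size() + " rows");
        try {
            assertScanMatches(inserted, buggy);
        } catch (AssertionError e) {
            System.out.println("check caught it: " + e.getMessage());
        }
    }
}
```

With 1000 distinct keys, the buggy path in this model yields 1000 + 500 = 1500 rows, in the same range as the observed 1492 to 1516. In a real JUnit test the list passed to `assertScanMatches` would come from iterating an HBase scanner over the table instead of the toy `scan`.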
