hbase-dev mailing list archives

From Peter Somogyi <psomo...@cloudera.com>
Subject Re: Problem with IntegrationTestRegionReplicaReplication
Date Thu, 15 Jun 2017 11:44:20 GMT
Thanks Josh and Devaraj!

I will try increasing the timeouts. Devaraj, could you share the
parameters that worked for you with this test?

On Thu, Jun 15, 2017 at 6:44 AM, Devaraj Das <ddas@hortonworks.com> wrote:

> That sounds about right, Josh. Peter, in our internal testing we have seen
> this test fail, and increasing the timeouts (look at the test code's
> options for increasing them) helped quite a bit.
> ________________________________________
> From: Josh Elser <josh.elser@gmail.com>
> Sent: Wednesday, June 14, 2017 3:17 PM
> To: dev@hbase.apache.org
> Subject: Re: Problem with IntegrationTestRegionReplicaReplication
>
> On 6/14/17 3:53 AM, Peter Somogyi wrote:
> > Hi,
> >
> > As one of my first tasks with HBase, I started looking into why
> > IntegrationTestRegionReplicaReplication fails. I would like to get some
> > suggestions from you.
> >
> > I noticed that whether I run the test on a normal cluster or a
> > minicluster, I get the same error message: "Error checking data for key
> > [null], no data returned". I looked into the code and here are my
> > conclusions.
> >
> > There are multiple threads writing data in parallel, and multiple reader
> > threads read that data simultaneously. Each writer gets a portion of the
> > keys to write (e.g. 0-2000), and these keys are added to a
> > ConstantDelayQueue. The reader threads take elements (e.g. key=1000) from
> > the queue and assume that all keys up to that one are already in the
> > database. Since we're using multiple writers, it can happen that another
> > thread has not yet written key=500, and verifying such keys will cause
> > the test failure.
> >
> > Do you think my assumption is correct?
>
> Hi Peter,
>
> No, as my memory serves, this is not correct. Readers are not made aware
> of keys to verify until the write occurs, plus some delay. The delay is
> used to provide enough time for the internal region replication to take
> effect.
>
> So: primary-write, pause, [region replication happens in background],
> add updated key to read queue, reader gets key from queue, verifies the
> value on a replica.
>
> The primary should always have seen the new value for a key. If the test
> is showing that a replica does not see the result, it's either a timing
> issue (you need to give a larger delay for HBase to perform the region
> replication) or a bug in the region replication framework itself. That
> said, if you can show that you are seeing what you describe, that sounds
> like the test framework itself is broken :)
>
>
>
>
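
The ordering Josh describes (write first, then queue the key with a delay, then verify) can be illustrated with a minimal Java sketch. This is not the HBase test code: it uses the JDK's java.util.concurrent.DelayQueue as a stand-in for the ConstantDelayQueue mentioned above, a ConcurrentHashMap as a stand-in for the table, and it does not model replicas at all. The names DelayedVerifySketch, WrittenKey, and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class DelayedVerifySketch {

    /** A key that was already written; readers may take it only after the delay expires. */
    static final class WrittenKey implements Delayed {
        final long key;
        final long readyAtNanos;

        WrittenKey(long key, long delayMillis) {
            this.key = key;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    /** Writes numKeys keys, verifies each after delayMillis, and returns keys missing at read time. */
    static List<Long> run(int numKeys, long delayMillis) throws InterruptedException {
        Map<Long, String> store = new ConcurrentHashMap<>(); // stand-in for the table (assumption)
        DelayQueue<WrittenKey> readQueue = new DelayQueue<>();

        Thread writer = new Thread(() -> {
            for (long k = 0; k < numKeys; k++) {
                store.put(k, "value-" + k);                    // 1. the write completes first
                readQueue.put(new WrittenKey(k, delayMillis)); // 2. only then is the key queued
            }
        });
        writer.start();

        List<Long> missing = new ArrayList<>();
        for (int i = 0; i < numKeys; i++) {
            WrittenKey wk = readQueue.take(); // 3. blocks until the key's delay has expired
            if (!store.containsKey(wk.key)) { // 4. verify only the key that was handed over
                missing.add(wk.key);
            }
        }
        writer.join();
        return missing;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("missing keys: " + run(100, 50));
    }
}
```

In this sketch a reader never learns about a key before its write has finished, so a verification failure could only come from the store not yet reflecting the write when the delay expires. In the real test that would correspond to a replica lagging behind the primary, which is why a larger delay is the first thing to try.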
