hbase-dev mailing list archives

From Peter Somogyi <psomo...@cloudera.com>
Subject Re: Problem with IntegrationTestRegionReplicaReplication
Date Sun, 18 Jun 2017 15:26:29 GMT
I'm using HBase based on version 1.2.

On Sat, Jun 17, 2017 at 4:00 PM, Devaraj Das <ddas@hortonworks.com> wrote:

> Peter, which version of HBase are you testing with?
>
>
>
>
> On Thu, Jun 15, 2017 at 11:57 PM -0700, "Peter Somogyi"
> <psomogyi@cloudera.com> wrote:
>
>
> I tried with those parameters but the test still failed.
> I noticed that some of the rows were replicated to the replicas only
> after I called flush manually. I think memstore replication is not working
> on my system even though it is enabled in the configuration.
> I will look into it today.
>
> On Fri, Jun 16, 2017 at 7:09 AM, Devaraj Das wrote:
>
> > Peter, do have a look at IntegrationTestRegionReplicaReplication.java ..
> > At the top of the file, the ways to specify the options are documented ..
> > You need to add something like
> > -DIntegrationTestRegionReplicaReplication.read_delay_ms ..
> > ________________________________________
> > From: Josh Elser
> > Sent: Thursday, June 15, 2017 10:40 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Problem with IntegrationTestRegionReplicaReplication
> >
> > I'd start by trying read_delay_ms=60000, region_replication=2,
> > num_keys_per_server=5000, num_regions_per_server=5, with maybe tens of
> > reader and writer threads.
> >
> > Again, this can be quite dependent on the kind of hardware you have.
> > You'll definitely have to tweak ;)
> >
> > On 6/15/17 4:44 AM, Peter Somogyi wrote:
> > > Thanks Josh and Devaraj!
> > >
> > > I will try to increase the timeouts. Devaraj, could you share the
> > > parameters you used for this test which worked?
> > >
> > > On Thu, Jun 15, 2017 at 6:44 AM, Devaraj Das wrote:
> > >
> > >> That sounds about right, Josh. Peter, in our internal testing we have seen
> > >> this test failing, and increasing the timeouts (look at the test code for
> > >> the options that increase the timeout) helped quite a bit.
> > >> ________________________________________
> > >> From: Josh Elser
> > >> Sent: Wednesday, June 14, 2017 3:17 PM
> > >> To: dev@hbase.apache.org
> > >> Subject: Re: Problem with IntegrationTestRegionReplicaReplication
> > >>
> > >> On 6/14/17 3:53 AM, Peter Somogyi wrote:
> > >>> Hi,
> > >>>
> > >>> As one of my first tasks with HBase I started to look into
> > >>> why IntegrationTestRegionReplicaReplication fails. I would like to get
> > >>> some suggestions from you.
> > >>>
> > >>> I noticed that when I run the test using a normal cluster or a
> > >>> minicluster I get the same error messages: "Error checking data for key
> > >>> [null], no data returned". I looked into the code and here are my
> > >>> conclusions.
> > >>>
> > >>> There are multiple threads writing data in parallel, and that data is read
> > >>> by multiple reader threads simultaneously. Each writer gets a portion of
> > >>> the keys to write (e.g. 0-2000) and these keys are added to a
> > >>> ConstantDelayQueue. The reader threads get the elements (e.g. key=1000)
> > >>> from the queue and these reader threads assume that all the keys up to
> > >>> this one are already in the database. Since we're using multiple writers
> > >>> it can happen that another thread has not yet written key=500, and
> > >>> verifying these keys will cause the test failure.
> > >>>
> > >>> Do you think my assumption is correct?
> > >>
> > >> Hi Peter,
> > >>
> > >> No, as my memory serves, this is not correct. Readers are not made aware
> > >> of keys to verify until the write occurs, plus some delay. The delay is
> > >> used to provide enough time for the internal region replication to take
> > >> effect.
> > >>
> > >> So: primary-write, pause, [region replication happens in background],
> > >> add updated key to read queue, reader gets key from queue and verifies
> > >> the value on a replica.
> > >>
> > >> The primary should always have seen the new value for a key. If the test
> > >> is showing that a replica does not see the result, it's either a timing
> > >> issue (you need to give a larger delay for HBase to perform the region
> > >> replication) or a bug in the region replication framework itself. That
> > >> said, if you can show that you are seeing what you describe, that sounds
> > >> like the test framework itself is broken :)
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
> >
>
>
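
The flow described in the quoted thread (write to the primary, wait, then hand the key to the readers) can be illustrated with a small Java sketch. This is not the ConstantDelayQueue from the HBase test code; it is a hypothetical stand-in built on the JDK's java.util.concurrent.DelayQueue, and the names DelayedKeyQueueSketch, DelayedKey, keyWritten, pollReadyKey and awaitReadyKey are all made up for illustration. The readDelayMs field plays the role that read_delay_ms plays in the discussion.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a queue that releases each written key only after a fixed delay. */
class DelayedKeyQueueSketch {

    /** A written key plus the earliest time at which a reader may verify it. */
    static final class DelayedKey implements Delayed {
        final long key;
        final long readyAtNanos;

        DelayedKey(long key, long delayMs) {
            this.key = key;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            // Remaining time until the key becomes visible to readers.
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                                other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final DelayQueue<DelayedKey> queue = new DelayQueue<>();
    private final long readDelayMs; // assumed analogue of read_delay_ms

    DelayedKeyQueueSketch(long readDelayMs) {
        this.readDelayMs = readDelayMs;
    }

    /** Writer side: call only after the write to the primary has completed. */
    void keyWritten(long key) {
        queue.put(new DelayedKey(key, readDelayMs));
    }

    /** Reader side: returns a key whose delay has elapsed, or null if none is ready yet. */
    Long pollReadyKey() {
        DelayedKey ready = queue.poll();
        return ready == null ? null : ready.key;
    }

    /** Reader side: waits up to timeoutMs for a key to become ready; null on timeout. */
    Long awaitReadyKey(long timeoutMs) {
        try {
            DelayedKey ready = queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            return ready == null ? null : ready.key;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

In this sketch a reader can never see a key before the writer has finished the primary write and the delay has passed, so a verification failure on a replica would point at either too short a delay or a replication problem, which matches the two possibilities raised in the thread.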
