hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Latency related configs for 0.90
Date Wed, 20 Apr 2011 16:32:01 GMT
I guess George's case has something to do with pseudo-clustered mode.

On Wed, Apr 20, 2011 at 9:27 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Hey George,
>
> Sorry for the late answer, there's nothing that comes to mind when
> reading your email.
>
> HBASE_SLAVE_SLEEP is only used by the bash scripts: when you run
> hbase-daemons.sh, it waits that long between each machine.
>
> Would you be able to come up with a test that shows the issues you are
> seeing? Like stripping out everything that's related to your stuff and
> leave only the parts that play with hbase?
>
> Have you inspected the logs of those tests for any weird-looking
> exceptions? Maybe the logs are screaming about something that needs to
> be taken care of? (just guesses)
>
> Our own experience migrating to 0.90 has been pretty good, we found a
> couple of issues with the new master but not one performance-related
> issue. We ran 0.90.1 for some weeks and now we are on 0.90.2
>
> J-D
>
> On Wed, Apr 20, 2011 at 6:15 AM, George P. Stathis <gstathis@traackr.com>
> wrote:
> > Sorry to bump this, but we could really use a hand here. Right now, we have
> > a very hard time seeing repeatable read/write consistency. Any suggestions
> > are welcome.
> >
> > -GS
> >
> > On Tue, Apr 19, 2011 at 3:08 PM, George P. Stathis <gstathis@traackr.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> In this chapter of our 0.89 to 0.90 migration saga, we are seeing what we
> >> suspect might be latency-related artifacts.
> >>
> >> The setting:
> >>
> >>    - Our EC2 dev environment running our CI builds
> >>    - CDH3 U0 (both hadoop and hbase) setup in pseudo-clustered mode
> >>
> >> We have several unit tests that have started mysteriously failing in random
> >> ways as soon as we migrated our EC2 CI build to the new 0.90 CDH3. Those
> >> tests used to run against 0.89 and never failed before. They also run OK on
> >> our local macbooks. On EC2, we are seeing lots of issues where the setup
> >> data is not being persisted in time for the tests to assert against them.
> >> They are also not always being torn down properly.
> >>
> >> We first suspected our new code around secondary indexes; we do have
> >> extensive unit tests around it that provide us with a solid level of
> >> confidence that it works properly in our CRUD scenarios. We also performance
> >> tested against the old hbase-trx contrib code and our new secondary indexes
> >> seem to be running slightly faster as well (of course, that could be due to
> >> the bump from 0.89 to 0.90).
> >>
> >> We first started seeing issues running our hudson build on the same machine
> >> as the hbase pseudo-cluster. We figured that was putting too much load on
> >> the box, so we created a separate large instance on EC2 to host just the
> >> 0.90 stack. This migration nearly quadrupled the number of unit tests
> >> failing at times. The only difference between the first and second CI setup
> >> is the network in between.
> >>
> >> Before we start tearing down our code line by line, I'd like to see if
> >> there are latency-related configuration tweaks we could try to make the
> >> setup more resilient to network lag. Are there any hbase/zookeeper settings
> >> that might help? For instance, we see things such as HBASE_SLAVE_SLEEP
> >> in hbase-env.sh. Can that help?
> >>
> >> Any suggestions are more than welcome. Also, the overview above may not be
> >> enough to go on, so please let me know if I could provide more details.
> >>
> >> Thank you in advance for any help.
> >>
> >> -GS
> >>
> >>
> >>
> >
>
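
One way to tell whether the failing assertions come from setup data arriving
late, rather than never arriving, is to poll for the expected row with a
deadline instead of asserting immediately. The sketch below is in Python for
brevity and is not taken from George's tests: wait_until and DelayedStore are
hypothetical names, and DelayedStore only simulates a read that becomes
visible after a delay. It does not talk to HBase.

```python
import time


def wait_until(condition, timeout_s=10.0, interval_s=0.25):
    """Poll condition() until it is truthy or timeout_s elapses.

    Returns the number of seconds waited on success and raises
    AssertionError if the deadline passes first.
    """
    start = time.monotonic()
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise AssertionError("condition not met within %.1fs" % timeout_s)
        time.sleep(interval_s)


class DelayedStore:
    """Hypothetical stand-in for a table whose rows show up after a delay."""

    def __init__(self, visible_after_s):
        self._ready_at = time.monotonic() + visible_after_s

    def get(self, row_key):
        # Return the row only once the simulated delay has passed.
        if time.monotonic() >= self._ready_at:
            return {"row": row_key}
        return None


# Simulate setup data that becomes readable roughly 0.3s after the write.
store = DelayedStore(visible_after_s=0.3)
waited = wait_until(lambda: store.get("row-1") is not None,
                    timeout_s=2.0, interval_s=0.05)
print("row visible after %.2fs" % waited)
```

If a test only passes with a wait like this, the time it reports is a rough
measure of the lag worth investigating. It is a diagnostic aid under the
assumptions above, not a fix for the underlying issue.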
