Subject: Re: optimizing block cache requests + eviction
From: Ted Yu <yuzhihong@gmail.com>
To: user@hbase.apache.org
Date: Mon, 8 Jul 2013 06:48:28 -0700

For suggestion #3 below, take a look at:

HBASE-7509 Enable RS to query a secondary datanode in parallel, if the
primary takes too long

Cheers

On Mon, Jul 8, 2013 at 3:04 AM, Viral Bajaria wrote:

> Hi,
>
> TL;DR;
> Trying to make a case for making the block eviction strategy smarter: do
> not evict remote blocks more often than local ones, and make the block
> requests themselves smarter.
>
> The question comes after I debugged an issue I was having with random
> regionservers hitting high load averages. I initially thought the problem
> was hardware related, i.e. a bad disk or network, since the I/O wait was
> too high, but it turned out to be a combination of things.
>
> I figured that with SCR (short-circuit read) ON, the datanode should
> almost never see a high number of block requests from the local
> regionserver. So my starting point for debugging was the datanode, since
> it was doing a ton of I/O. The clienttrace logs helped me figure out
> which RS nodes were making block requests. I hacked up a script to report
> which blocks are being requested and how many times per minute (the idea
> is sketched below).
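The script itself didn't make it into the thread; a minimal sketch of the
same tally, assuming the stock Hadoop 1.x datanode clienttrace line format
(timestamp first, then "op:" and "blockid:" fields), could look like this:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Tally HDFS_READ requests per (block, minute) from a datanode
    // clienttrace log and flag blocks fetched more than 10 times within
    // a single minute. The line format assumed here is stock Hadoop 1.x
    // datanode output; adjust the pattern for your version.
    public class BlockRequestTally {
        private static final Pattern LINE = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}).*op: HDFS_READ.*?(blk_-?\\d+)");

        public static void main(String[] args) throws Exception {
            Map<String, Integer> counts = new HashMap<String, Integer>();
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = LINE.matcher(line);
                if (m.find()) {
                    // key = block id + the minute it was requested in
                    String key = m.group(2) + " @ " + m.group(1);
                    Integer c = counts.get(key);
                    counts.put(key, c == null ? 1 : c + 1);
                }
            }
            in.close();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() > 10) {
                    System.out.println(e.getValue() + "x  " + e.getKey());
                }
            }
        }
    }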
> I found that some blocks were being requested 10+ times in a minute, and
> over 2000 times in an hour, from the same regionserver. This was causing
> the server to do 40+ MB/s on reads alone. That was on the higher side;
> the average was closer to 100 requests or so per hour.
>
> Now, why did I end up in this situation? It happened because I added
> servers to the cluster and rebalanced it. At the same time I added some
> drives and also removed the offending server in my setup. This caused
> some of the data to no longer be co-located with the regionservers. Given
> that major_compaction was disabled and would not run for a while (at
> least on some tables), these block requests would not go away. One of my
> regionservers was totally overwhelmed. I made the situation worse when I
> removed the server that was under heavy load, on the assumption that it
> was a hardware problem with the box, without doing a deep dive (doh!).
> Given that regionservers will be added in the future, I expect block
> locality to go down until major_compaction runs. Nodes can also go down
> and cause this problem. So I started thinking of possible solutions, but
> first some observations.
>
> *Observations/Comments*
> - The surprising part was that the regionserver was making so many
> requests for the same block in the same minute (let alone hour). Could
> this happen because the original request took a few seconds, so the
> regionserver re-requested? I didn't see any block fetch errors in the
> regionserver logs.
> - Even stranger: my heap size was 11G, and while this was happening the
> used heap was at 2-4G. I would have expected the heap to grow larger than
> that, since the block cache should be using at least 40% of the available
> heap space.
> - Another strange thing I observed: the block was being requested from
> the same datanode every single time.
>
> *Possible Solutions/Changes*
> - Would it make sense to give remote blocks higher priority over local
> blocks that can be read via SCR, and not let them get evicted when there
> is a tie over which block to evict?
> - Should we throttle the number of outgoing requests for a block? I am
> not sure if my firewall caused some issue, but I wouldn't expect multiple
> fetch requests for the same block in the same minute. I did see a few RST
> packets getting dropped at the firewall, but I wasn't able to trace the
> problem to them.
> - We have 3 replicas available; shouldn't we request the block from
> another datanode if the first one might take a long time? The time it
> took to read a block went up when the box was under heavy load, yet the
> re-requests kept going to the same datanode. Is this something that is
> available in the DFSClient, and can we exploit it?
> - Is it possible to migrate a region to a server which has a higher
> number of its blocks available locally? This doesn't need to be
> automatic; we could provide a command that is invoked manually to assign
> a region to a specific regionserver (a sketch follows the quoted
> message). Thoughts?
>
> Thanks,
> Viral
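For the last suggestion, the client API already exposes a manual region
move (the shell's "move" command wraps it). A minimal sketch against the
0.94-era HBaseAdmin, with hypothetical region and server names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    // Pin a region to a chosen regionserver. Both names below are
    // hypothetical; read the real ones off the master web UI.
    public class MoveRegion {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // move(encodedRegionName, destServerName); the server name has
            // the form "host,port,startcode". A null destination lets the
            // master pick a server itself.
            admin.move(Bytes.toBytes("1f4035bb2c1df1a5ba2aa49fb0f0e3a7"),
                       Bytes.toBytes("rs1.example.com,60020,1373291308282"));
            admin.close();
        }
    }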