hadoop-common-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2731) [hbase] Under extreme load, regions become extremely large and eventually cause region servers to become unresponsive
Date Wed, 30 Jan 2008 02:57:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563811#action_12563811 ]

Bryan Duxbury commented on HADOOP-2731:
---------------------------------------

I've been looking at the RegionServer side of things to try to understand why a split
wouldn't occur. Some code:

{code}
if (e.getRegion().compactIfNeeded()) {
  splitter.splitRequested(e);
}
{code}

We only queue a region to be split if it has just been compacted. I assume the rationale here
is that unless a compaction occurred, there'd be no reason to split in the first place. I'm
not convinced that's true, however. A store will only compact if it has more mapfiles than
the compaction threshold, and some of my regions never got there: the individual mapfiles
were 1.5GiB each, but there were only 2 of them. As a result, compaction, and thus splitting,
was skipped. Shouldn't we be testing whether the overall size of the mapfiles makes splitting
necessary, rather than letting the compaction determine whether we do anything?
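
To make the gap concrete, here is a minimal sketch of the two tests side by side. It is not the actual HBase code: the class SplitCheck, both method names, and the threshold and size-limit values used in the usage note are hypothetical, chosen only to illustrate the case above.

```java
// Hypothetical sketch: these names are illustrative and are not HBase's API.
final class SplitCheck {
  private SplitCheck() {}

  /** Count-based test: true only when there are more mapfiles than the threshold. */
  static boolean needsCompaction(int mapfileCount, int compactionThreshold) {
    return mapfileCount > compactionThreshold;
  }

  /** Size-based test: true when the combined mapfile size exceeds the limit. */
  static boolean needsSplit(long[] mapfileSizesBytes, long maxFileSizeBytes) {
    long total = 0;
    for (long size : mapfileSizesBytes) {
      total += size;
    }
    return total > maxFileSizeBytes;
  }
}
```

With two 1.5GiB mapfiles, an assumed compaction threshold of 3, and an assumed size limit of 256MB, needsCompaction returns false while needsSplit returns true, which is the situation where the region keeps growing without ever being queued for a split.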

Perhaps we should add an optional compaction. Rather than relying only on HStore.needsCompaction,
which only checks whether the store is above the compaction threshold, maybe we should also have
an isCompactable, which just checks whether there is more than one mapfile. The optional compactions
could be queued behind the mandatory, threshold-based compactions. Then we could always put an HStore
on the compact queue whenever an event changes the number of mapfiles, with the constraint
that if the store is already on the compact queue, we don't re-add it.
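
A rough sketch of what such a queue could look like follows. Everything in it is an assumption made for illustration: the CompactionQueue class, its request and poll methods, and the use of store names as keys do not exist in the code base, and only the needsCompaction and isCompactable ideas come from the proposal above.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

// Hypothetical sketch of the proposed queue; not existing HBase code.
final class CompactionQueue {
  private final int compactionThreshold;
  // Insertion-ordered sets give FIFO order plus cheap duplicate detection.
  private final LinkedHashSet<String> mandatory = new LinkedHashSet<>();
  private final LinkedHashSet<String> optional = new LinkedHashSet<>();

  CompactionQueue(int compactionThreshold) {
    this.compactionThreshold = compactionThreshold;
  }

  /** Threshold-based test: more mapfiles than the compaction threshold. */
  boolean needsCompaction(int mapfileCount) {
    return mapfileCount > compactionThreshold;
  }

  /** Weaker test: anything with more than one mapfile can be compacted. */
  static boolean isCompactable(int mapfileCount) {
    return mapfileCount > 1;
  }

  /** Queues a store after its mapfile count changed; returns false if nothing was added. */
  synchronized boolean request(String storeName, int mapfileCount) {
    if (mandatory.contains(storeName) || optional.contains(storeName)) {
      return false; // already queued, so do not re-add
    }
    if (needsCompaction(mapfileCount)) {
      return mandatory.add(storeName);
    }
    if (isCompactable(mapfileCount)) {
      return optional.add(storeName);
    }
    return false; // a single mapfile leaves nothing to compact
  }

  /** Returns the next store to compact, mandatory requests first, or null if idle. */
  synchronized String poll() {
    LinkedHashSet<String> source = mandatory.isEmpty() ? optional : mandatory;
    Iterator<String> it = source.iterator();
    if (!it.hasNext()) {
      return null;
    }
    String next = it.next();
    it.remove();
    return next;
  }
}
```

Because poll drains the mandatory set first, optional compactions only run when no threshold-based work is waiting. This sketch does not promote a store from optional to mandatory if it crosses the threshold while queued; whether that matters would need to be decided.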

If we did all of that, the split thread could probably keep doing exactly what it does
right now, and splits would also happen during downtime.

> [hbase] Under extreme load, regions become extremely large and eventually cause region servers to become unresponsive
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2731
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>
> When I attempt to write to HBase as fast as possible, it accepts puts at a reasonably high rate for a while, and then the rate begins to drop off, ultimately culminating in exceptions reaching client code. In my testing, I was able to write about 370 10KB records a second to HBase until I reached around 1 million rows written. At that point, a moderate to large number of exceptions (NotServingRegionException, WrongRegionException, region offline, etc.) begin reaching the client code. This appears to be because the retry-and-wait logic in HTable runs out of retries and fails.
> Looking at mapfiles for the regions from the command line shows that some of the mapfiles are between 1 and 2 GB in size, much more than the stated file size limit. Talking with Stack, one possible explanation for this is that the RegionServer is not choosing to compact files often enough, leading to many small mapfiles, which in turn leads to a few overlarge mapfiles. Then, when the time comes to do a split or "major" compaction, it takes an unexpectedly long time to complete these operations. This translates into errors for the client application.
> If I back off the import process and give the cluster some quiet time, some splits and compactions clearly do take place, because the number of regions goes up and the number of mapfiles per region goes down. I can then begin writing again in earnest for a short period of time until the problem begins again.
> Both Marc Harris and I have seen this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

