hadoop-common-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2731) [hbase] Under load, regions become extremely large and eventually cause region servers to become unresponsive
Date Thu, 31 Jan 2008 21:53:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564518#action_12564518 ]

Bryan Duxbury commented on HADOOP-2731:
---------------------------------------

After about 45% of the 1 million 10KB rows had been imported, the import started to slow down markedly.
I did a little DFS digging to get a sense of the size of the mapfiles:

{code}
[rapleaf@tf1 hadoop]$ bin/hadoop dfs -lsr / | grep test_table | grep "mapfiles/[^/]*/data" \
    | grep -v compaction.dir | awk '{print $4}' | sort -n | awk '{print $1 / 1024 / 1024}'
0
0.589743
21.5422
29.4829
36.4409
36.834
54.6908
56.6071
60.0075
61.7568
64
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.2137
64.3218
65.3046
68.1251
68.9211
71.2503
73.2158
73.9037
77.5301
82.1786
83.0631
83.1417
88.94
92.9497
98.2762
111.76
112.399
116.162
119.337
127.572
128.496
657.9
760.569
1261.14
1564.22
{code}

(If you can't read awk: that's the size, in megabytes, of each mapfile in the DFS for my test table.)

There are only 7 regions, and the biggest mapfile is just over 1.5 GiB. I will report again when the job
has completed and the cluster has had a chance to cool down.
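
For reference, roughly the same measurement can be taken programmatically. The sketch below is a
minimal, illustrative equivalent of the shell pipeline above using the Hadoop FileSystem API; the
glob pattern is an assumption about where the per-mapfile data files live, not the exact paths from
this cluster.

{code}
// Minimal sketch (assumptions noted above): print each mapfile's data file size in MB.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MapfileSizes {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // Assumed layout: /<table>/<region>/<family>/mapfiles/<id>/data
    FileStatus[] files = fs.globStatus(new Path("/test_table/*/*/mapfiles/*/data"));
    if (files == null) {
      return; // nothing matched the assumed layout
    }
    for (FileStatus f : files) {
      if (f.getPath().toString().contains("compaction.dir")) {
        continue; // skip compaction working directories, as the grep -v above does
      }
      System.out.printf("%.3f MB\t%s%n", f.getLen() / 1024.0 / 1024.0, f.getPath());
    }
  }
}
{code}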

> [hbase] Under load, regions become extremely large and eventually cause region servers to become unresponsive
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2731
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>         Attachments: split-v8.patch, split-v9.patch, split.patch
>
>
> When attempting to write to HBase as fast as possible, HBase accepts puts at a reasonably
> high rate for a while, and then the rate begins to drop off, ultimately culminating in exceptions
> reaching client code. In my testing, I was able to write about 370 10KB records a second to
> HBase until I reached around 1 million rows written. At that point, a moderate to large number
> of exceptions - NotServingRegionException, WrongRegionException, region offline, etc. - begin
> reaching the client code. This appears to be because the retry-and-wait logic in HTable runs
> out of retries and fails.
> Looking at the mapfiles for the regions from the command line shows that some of the mapfiles
> are between 1 and 2 GB in size, much more than the stated file size limit. Talking with Stack,
> one possible explanation for this is that the RegionServer is not choosing to compact files
> often enough, leading to many small mapfiles which, when they are finally compacted together,
> produce a few overlarge mapfiles. Then, when the time comes to do a split or "major" compaction,
> it takes an unexpectedly long time to complete these operations. This translates into errors
> for the client application.
> If I back off the import process and give the cluster some quiet time, some splits and
> compactions clearly do take place, because the number of regions goes up and the number of
> mapfiles per region goes down. I can then begin writing again in earnest for a short period
> of time until the problem begins again.
> Both Marc Harris and I have seen this behavior.
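
To make the failure mode above concrete, here is a minimal sketch of the kind of bounded
retry-and-wait behavior described; it is not the actual HTable implementation, and doPut(),
MAX_RETRIES, and PAUSE_MS are placeholder names and values.

{code}
// Illustrative only: a bounded retry-and-wait loop of the sort described above.
public class RetrySketch {
  static final int MAX_RETRIES = 10;   // placeholder value
  static final long PAUSE_MS = 1000;   // placeholder value

  static void putWithRetries(Runnable doPut) throws InterruptedException {
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
      try {
        doPut.run();                    // stand-in for the actual client write
        return;                         // success
      } catch (RuntimeException e) {    // e.g. region offline / not serving
        if (attempt == MAX_RETRIES - 1) {
          throw e;                      // retries exhausted: the error reaches the caller
        }
        Thread.sleep(PAUSE_MS);         // wait out the split/compaction and try again
      }
    }
  }
}
{code}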

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

