From: git-site-role@apache.org
To: commits@hbase.apache.org
Date: Thu, 09 Nov 2017 15:17:10 -0000
Subject: [42/51] [partial] hbase-site git commit: Published site at .

http://git-wip-us.apache.org/repos/asf/hbase-site/blob/2b3f2bee/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html
----------------------------------------------------------------------
diff --git a/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html b/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html
deleted file mode 100644
index e324a9e..0000000
--- a/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html
+++ /dev/null
@@ -1,116 +0,0 @@
-001/**
-002 *
-003 * Licensed to the Apache Software Foundation (ASF) under one
-004 * or more contributor license agreements.  See the NOTICE file
-005 * distributed with this work for additional information
-006 * regarding copyright ownership.  The ASF licenses this file
-007 * to you under the Apache License, Version 2.0 (the
-008 * "License"); you may not use this file except in compliance
-009 * with the License.  You may obtain a copy of the License at
-010 *
-011 *     http://www.apache.org/licenses/LICENSE-2.0
-012 *
-013 * Unless required by applicable law or agreed to in writing, software
-014 * distributed under the License is distributed on an "AS IS" BASIS,
-015 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-016 * See the License for the specific language governing permissions and
-017 * limitations under the License.
-018 */
-019package org.apache.hadoop.hbase.exceptions;
-020
-021import org.apache.hadoop.hbase.NotServingRegionException;
-022import org.apache.yetus.audience.InterfaceAudience;
-023
-024/**
-025 * Thrown when a read request is issued against a region that is in recovering state.
-026 */
-027@InterfaceAudience.Public
-028public class RegionInRecoveryException extends NotServingRegionException {
-029  private static final long serialVersionUID = 327302071153799L;
-030
-031  /** default constructor */
-032  public RegionInRecoveryException() {
-033    super();
-034  }
-035
-036  /**
-037   * Constructor
-038   * @param s message
-039   */
-040  public RegionInRecoveryException(String s) {
-041    super(s);
-042  }
-043
-044}
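The class deleted above extends NotServingRegionException, so standard HBase clients already treat it as a retryable condition. A caller that wants to react to it explicitly could follow this minimal sketch; the connection setup is standard, but the table name and row key are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.exceptions.RegionInRecoveryException;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RecoveringRegionReadSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection connection = ConnectionFactory.createConnection(conf);
           Table table = connection.getTable(TableName.valueOf("demo_table"))) {
        Get get = new Get(Bytes.toBytes("row-1"));
        Result result = null;
        for (int attempt = 0; attempt < 5 && result == null; attempt++) {
          try {
            result = table.get(get);
          } catch (RegionInRecoveryException e) {
            // The region is still replaying WAL edits; reads are rejected until
            // recovery completes, so back off briefly and try again.
            Thread.sleep(1000L);
          }
        }
        System.out.println(result == null ? "region still recovering" : result.toString());
      }
    }
  }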
- - http://git-wip-us.apache.org/repos/asf/hbase-site/blob/2b3f2bee/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html ---------------------------------------------------------------------- diff --git a/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html b/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html index 898d59a..be26fad 100644 --- a/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html +++ b/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html @@ -276,391 +276,392 @@ 268 } 269 270 //The default value of "hbase.mapreduce.input.autobalance" is false. -271 if (context.getConfiguration().getBoolean(MAPREDUCE_INPUT_AUTOBALANCE, false) != false) { -272 long maxAveRegionSize = context.getConfiguration().getInt(MAX_AVERAGE_REGION_SIZE, 8*1073741824); -273 return calculateAutoBalancedSplits(splits, maxAveRegionSize); -274 } -275 -276 // return one mapper per region -277 return splits; -278 } finally { -279 if (closeOnFinish) { -280 closeTable(); -281 } -282 } -283 } -284 -285 /** -286 * Create one InputSplit per region -287 * -288 * @return The list of InputSplit for all the regions -289 * @throws IOException -290 */ -291 private List<InputSplit> oneInputSplitPerRegion() throws IOException { -292 RegionSizeCalculator sizeCalculator = -293 new RegionSizeCalculator(getRegionLocator(), getAdmin()); -294 -295 TableName tableName = getTable().getName(); -296 -297 Pair<byte[][], byte[][]> keys = getStartEndKeys(); -298 if (keys == null || keys.getFirst() == null || -299 keys.getFirst().length == 0) { -300 HRegionLocation regLoc = -301 getRegionLocator().getRegionLocation(HConstants.EMPTY_BYTE_ARRAY, false); -302 if (null == regLoc) { -303 throw new IOException("Expecting at least one region."); -304 } -305 List<InputSplit> splits = new ArrayList<>(1); -306 long regionSize = sizeCalculator.getRegionSize(regLoc.getRegionInfo().getRegionName()); -307 TableSplit split = new TableSplit(tableName, scan, -308 HConstants.EMPTY_BYTE_ARRAY, HConstants.EMPTY_BYTE_ARRAY, regLoc -309 .getHostnamePort().split(Addressing.HOSTNAME_PORT_SEPARATOR)[0], regionSize); -310 splits.add(split); -311 return splits; -312 } -313 List<InputSplit> splits = new ArrayList<>(keys.getFirst().length); -314 for (int i = 0; i < keys.getFirst().length; i++) { -315 if (!includeRegionInSplit(keys.getFirst()[i], keys.getSecond()[i])) { -316 continue; -317 } -318 -319 byte[] startRow = scan.getStartRow(); -320 byte[] stopRow = scan.getStopRow(); -321 // determine if the given start an stop key fall into the region -322 if ((startRow.length == 0 || keys.getSecond()[i].length == 0 || -323 Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) && -324 (stopRow.length == 0 || -325 Bytes.compareTo(stopRow, keys.getFirst()[i]) > 0)) { -326 byte[] splitStart = startRow.length == 0 || -327 Bytes.compareTo(keys.getFirst()[i], startRow) >= 0 ? -328 keys.getFirst()[i] : startRow; -329 byte[] splitStop = (stopRow.length == 0 || -330 Bytes.compareTo(keys.getSecond()[i], stopRow) <= 0) && -331 keys.getSecond()[i].length > 0 ? -332 keys.getSecond()[i] : stopRow; -333 -334 HRegionLocation location = getRegionLocator().getRegionLocation(keys.getFirst()[i], false); -335 // The below InetSocketAddress creation does a name resolution. 
-336 InetSocketAddress isa = new InetSocketAddress(location.getHostname(), location.getPort()); -337 if (isa.isUnresolved()) { -338 LOG.warn("Failed resolve " + isa); -339 } -340 InetAddress regionAddress = isa.getAddress(); -341 String regionLocation; -342 regionLocation = reverseDNS(regionAddress); -343 -344 byte[] regionName = location.getRegionInfo().getRegionName(); -345 String encodedRegionName = location.getRegionInfo().getEncodedName(); -346 long regionSize = sizeCalculator.getRegionSize(regionName); -347 TableSplit split = new TableSplit(tableName, scan, -348 splitStart, splitStop, regionLocation, encodedRegionName, regionSize); -349 splits.add(split); -350 if (LOG.isDebugEnabled()) { -351 LOG.debug("getSplits: split -> " + i + " -> " + split); -352 } -353 } -354 } -355 return splits; -356 } -357 -358 /** -359 * Create n splits for one InputSplit, For now only support uniform distribution -360 * @param split A TableSplit corresponding to a range of rowkeys -361 * @param n Number of ranges after splitting. Pass 1 means no split for the range -362 * Pass 2 if you want to split the range in two; -363 * @return A list of TableSplit, the size of the list is n -364 * @throws IllegalArgumentIOException -365 */ -366 protected List<InputSplit> createNInputSplitsUniform(InputSplit split, int n) -367 throws IllegalArgumentIOException { -368 if (split == null || !(split instanceof TableSplit)) { -369 throw new IllegalArgumentIOException( -370 "InputSplit for CreateNSplitsPerRegion can not be null + " -371 + "and should be instance of TableSplit"); -372 } -373 //if n < 1, then still continue using n = 1 -374 n = n < 1 ? 1 : n; -375 List<InputSplit> res = new ArrayList<>(n); -376 if (n == 1) { -377 res.add(split); -378 return res; -379 } -380 -381 // Collect Region related information -382 TableSplit ts = (TableSplit) split; -383 TableName tableName = ts.getTable(); -384 String regionLocation = ts.getRegionLocation(); -385 String encodedRegionName = ts.getEncodedRegionName(); -386 long regionSize = ts.getLength(); -387 byte[] startRow = ts.getStartRow(); -388 byte[] endRow = ts.getEndRow(); -389 -390 // For special case: startRow or endRow is empty -391 if (startRow.length == 0 && endRow.length == 0){ -392 startRow = new byte[1]; -393 endRow = new byte[1]; -394 startRow[0] = 0; -395 endRow[0] = -1; -396 } -397 if (startRow.length == 0 && endRow.length != 0){ -398 startRow = new byte[1]; -399 startRow[0] = 0; -400 } -401 if (startRow.length != 0 && endRow.length == 0){ -402 endRow =new byte[startRow.length]; -403 for (int k = 0; k < startRow.length; k++){ -404 endRow[k] = -1; -405 } -406 } -407 -408 // Split Region into n chunks evenly -409 byte[][] splitKeys = Bytes.split(startRow, endRow, true, n-1); -410 for (int i = 0; i < splitKeys.length - 1; i++) { -411 //notice that the regionSize parameter may be not very accurate -412 TableSplit tsplit = -413 new TableSplit(tableName, scan, splitKeys[i], splitKeys[i + 1], regionLocation, -414 encodedRegionName, regionSize / n); -415 res.add(tsplit); -416 } -417 return res; -418 } -419 /** -420 * Calculates the number of MapReduce input splits for the map tasks. The number of -421 * MapReduce input splits depends on the average region size. -422 * Make it 'public' for testing -423 * -424 * @param splits The list of input splits before balance. -425 * @param maxAverageRegionSize max Average region size for one mapper -426 * @return The list of input splits. -427 * @throws IOException When creating the list of splits fails. 
-428 * @see org.apache.hadoop.mapreduce.InputFormat#getSplits( -429 *org.apache.hadoop.mapreduce.JobContext) -430 */ -431 public List<InputSplit> calculateAutoBalancedSplits(List<InputSplit> splits, long maxAverageRegionSize) -432 throws IOException { -433 if (splits.size() == 0) { -434 return splits; -435 } -436 List<InputSplit> resultList = new ArrayList<>(); -437 long totalRegionSize = 0; -438 for (int i = 0; i < splits.size(); i++) { -439 TableSplit ts = (TableSplit) splits.get(i); -440 totalRegionSize += ts.getLength(); -441 } -442 long averageRegionSize = totalRegionSize / splits.size(); -443 // totalRegionSize might be overflow, and the averageRegionSize must be positive. -444 if (averageRegionSize <= 0) { -445 LOG.warn("The averageRegionSize is not positive: " + averageRegionSize + ", " + -446 "set it to Long.MAX_VALUE " + splits.size()); -447 averageRegionSize = Long.MAX_VALUE / splits.size(); -448 } -449 //if averageRegionSize is too big, change it to default as 1 GB, -450 if (averageRegionSize > maxAverageRegionSize) { -451 averageRegionSize = maxAverageRegionSize; -452 } -453 // if averageRegionSize is too small, we do not need to allocate more mappers for those 'large' region -454 // set default as 16M = (default hdfs block size) / 4; -455 if (averageRegionSize < 16 * 1048576) { -456 return splits; -457 } -458 for (int i = 0; i < splits.size(); i++) { -459 TableSplit ts = (TableSplit) splits.get(i); -460 TableName tableName = ts.getTable(); -461 String regionLocation = ts.getRegionLocation(); -462 String encodedRegionName = ts.getEncodedRegionName(); -463 long regionSize = ts.getLength(); -464 -465 if (regionSize >= averageRegionSize) { -466 // make this region as multiple MapReduce input split. -467 int n = (int) Math.round(Math.log(((double) regionSize) / ((double) averageRegionSize)) + 1.0); -468 List<InputSplit> temp = createNInputSplitsUniform(ts, n); -469 resultList.addAll(temp); -470 } else { -471 // if the total size of several small continuous regions less than the average region size, -472 // combine them into one MapReduce input split. -473 long totalSize = regionSize; -474 byte[] splitStartKey = ts.getStartRow(); -475 byte[] splitEndKey = ts.getEndRow(); -476 int j = i + 1; -477 while (j < splits.size()) { -478 TableSplit nextRegion = (TableSplit) splits.get(j); -479 long nextRegionSize = nextRegion.getLength(); -480 if (totalSize + nextRegionSize <= averageRegionSize) { -481 totalSize = totalSize + nextRegionSize; -482 splitEndKey = nextRegion.getEndRow(); -483 j++; -484 } else { -485 break; -486 } -487 } -488 i = j - 1; -489 TableSplit t = new TableSplit(tableName, scan, splitStartKey, splitEndKey, regionLocation, -490 encodedRegionName, totalSize); -491 resultList.add(t); -492 } -493 } -494 return resultList; -495 } -496 -497 String reverseDNS(InetAddress ipAddress) throws UnknownHostException { -498 String hostName = this.reverseDNSCacheMap.get(ipAddress); -499 if (hostName == null) { -500 String ipAddressString = null; -501 try { -502 ipAddressString = DNS.reverseDns(ipAddress, null); -503 } catch (Exception e) { -504 // We can use InetAddress in case the jndi failed to pull up the reverse DNS entry from the -505 // name service. Also, in case of ipv6, we need to use the InetAddress since resolving -506 // reverse DNS using jndi doesn't work well with ipv6 addresses. 
-507 ipAddressString = InetAddress.getByName(ipAddress.getHostAddress()).getHostName(); -508 } -509 if (ipAddressString == null) throw new UnknownHostException("No host found for " + ipAddress); -510 hostName = Strings.domainNamePointerToHostName(ipAddressString); -511 this.reverseDNSCacheMap.put(ipAddress, hostName); -512 } -513 return hostName; -514 } -515 -516 /** -517 * Test if the given region is to be included in the InputSplit while splitting -518 * the regions of a table. -519 * <p> -520 * This optimization is effective when there is a specific reasoning to exclude an entire region from the M-R job, -521 * (and hence, not contributing to the InputSplit), given the start and end keys of the same. <br> -522 * Useful when we need to remember the last-processed top record and revisit the [last, current) interval for M-R processing, -523 * continuously. In addition to reducing InputSplits, reduces the load on the region server as well, due to the ordering of the keys. -524 * <br> +271 if (context.getConfiguration().getBoolean(MAPREDUCE_INPUT_AUTOBALANCE, false)) { +272 long maxAveRegionSize = context.getConfiguration() +273 .getLong(MAX_AVERAGE_REGION_SIZE, 8L*1073741824); //8GB +274 return calculateAutoBalancedSplits(splits, maxAveRegionSize); +275 } +276 +277 // return one mapper per region +278 return splits; +279 } finally { +280 if (closeOnFinish) { +281 closeTable(); +282 } +283 } +284 } +285 +286 /** +287 * Create one InputSplit per region +288 * +289 * @return The list of InputSplit for all the regions +290 * @throws IOException +291 */ +292 private List<InputSplit> oneInputSplitPerRegion() throws IOException { +293 RegionSizeCalculator sizeCalculator = +294 new RegionSizeCalculator(getRegionLocator(), getAdmin()); +295 +296 TableName tableName = getTable().getName(); +297 +298 Pair<byte[][], byte[][]> keys = getStartEndKeys(); +299 if (keys == null || keys.getFirst() == null || +300 keys.getFirst().length == 0) { +301 HRegionLocation regLoc = +302 getRegionLocator().getRegionLocation(HConstants.EMPTY_BYTE_ARRAY, false); +303 if (null == regLoc) { +304 throw new IOException("Expecting at least one region."); +305 } +306 List<InputSplit> splits = new ArrayList<>(1); +307 long regionSize = sizeCalculator.getRegionSize(regLoc.getRegionInfo().getRegionName()); +308 TableSplit split = new TableSplit(tableName, scan, +309 HConstants.EMPTY_BYTE_ARRAY, HConstants.EMPTY_BYTE_ARRAY, regLoc +310 .getHostnamePort().split(Addressing.HOSTNAME_PORT_SEPARATOR)[0], regionSize); +311 splits.add(split); +312 return splits; +313 } +314 List<InputSplit> splits = new ArrayList<>(keys.getFirst().length); +315 for (int i = 0; i < keys.getFirst().length; i++) { +316 if (!includeRegionInSplit(keys.getFirst()[i], keys.getSecond()[i])) { +317 continue; +318 } +319 +320 byte[] startRow = scan.getStartRow(); +321 byte[] stopRow = scan.getStopRow(); +322 // determine if the given start an stop key fall into the region +323 if ((startRow.length == 0 || keys.getSecond()[i].length == 0 || +324 Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) && +325 (stopRow.length == 0 || +326 Bytes.compareTo(stopRow, keys.getFirst()[i]) > 0)) { +327 byte[] splitStart = startRow.length == 0 || +328 Bytes.compareTo(keys.getFirst()[i], startRow) >= 0 ? +329 keys.getFirst()[i] : startRow; +330 byte[] splitStop = (stopRow.length == 0 || +331 Bytes.compareTo(keys.getSecond()[i], stopRow) <= 0) && +332 keys.getSecond()[i].length > 0 ? 
+333 keys.getSecond()[i] : stopRow; +334 +335 HRegionLocation location = getRegionLocator().getRegionLocation(keys.getFirst()[i], false); +336 // The below InetSocketAddress creation does a name resolution. +337 InetSocketAddress isa = new InetSocketAddress(location.getHostname(), location.getPort()); +338 if (isa.isUnresolved()) { +339 LOG.warn("Failed resolve " + isa); +340 } +341 InetAddress regionAddress = isa.getAddress(); +342 String regionLocation; +343 regionLocation = reverseDNS(regionAddress); +344 +345 byte[] regionName = location.getRegionInfo().getRegionName(); +346 String encodedRegionName = location.getRegionInfo().getEncodedName(); +347 long regionSize = sizeCalculator.getRegionSize(regionName); +348 TableSplit split = new TableSplit(tableName, scan, +349 splitStart, splitStop, regionLocation, encodedRegionName, regionSize); +350 splits.add(split); +351 if (LOG.isDebugEnabled()) { +352 LOG.debug("getSplits: split -> " + i + " -> " + split); +353 } +354 } +355 } +356 return splits; +357 } +358 +359 /** +360 * Create n splits for one InputSplit, For now only support uniform distribution +361 * @param split A TableSplit corresponding to a range of rowkeys +362 * @param n Number of ranges after splitting. Pass 1 means no split for the range +363 * Pass 2 if you want to split the range in two; +364 * @return A list of TableSplit, the size of the list is n +365 * @throws IllegalArgumentIOException +366 */ +367 protected List<InputSplit> createNInputSplitsUniform(InputSplit split, int n) +368 throws IllegalArgumentIOException { +369 if (split == null || !(split instanceof TableSplit)) { +370 throw new IllegalArgumentIOException( +371 "InputSplit for CreateNSplitsPerRegion can not be null + " +372 + "and should be instance of TableSplit"); +373 } +374 //if n < 1, then still continue using n = 1 +375 n = n < 1 ? 1 : n; +376 List<InputSplit> res = new ArrayList<>(n); +377 if (n == 1) { +378 res.add(split); +379 return res; +380 } +381 +382 // Collect Region related information +383 TableSplit ts = (TableSplit) split; +384 TableName tableName = ts.getTable(); +385 String regionLocation = ts.getRegionLocation(); +386 String encodedRegionName = ts.getEncodedRegionName(); +387 long regionSize = ts.getLength(); +388 byte[] startRow = ts.getStartRow(); +389 byte[] endRow = ts.getEndRow(); +390 +391 // For special case: startRow or endRow is empty +392 if (startRow.length == 0 && endRow.length == 0){ +393 startRow = new byte[1]; +394 endRow = new byte[1]; +395 startRow[0] = 0; +396 endRow[0] = -1; +397 } +398 if (startRow.length == 0 && endRow.length != 0){ +399 startRow = new byte[1]; +400 startRow[0] = 0; +401 } +402 if (startRow.length != 0 && endRow.length == 0){ +403 endRow =new byte[startRow.length]; +404 for (int k = 0; k < startRow.length; k++){ +405 endRow[k] = -1; +406 } +407 } +408 +409 // Split Region into n chunks evenly +410 byte[][] splitKeys = Bytes.split(startRow, endRow, true, n-1); +411 for (int i = 0; i < splitKeys.length - 1; i++) { +412 //notice that the regionSize parameter may be not very accurate +413 TableSplit tsplit = +414 new TableSplit(tableName, scan, splitKeys[i], splitKeys[i + 1], regionLocation, +415 encodedRegionName, regionSize / n); +416 res.add(tsplit); +417 } +418 return res; +419 } +420 /** +421 * Calculates the number of MapReduce input splits for the map tasks. The number of +422 * MapReduce input splits depends on the average region size. +423 * Make it 'public' for testing +424 * +425 * @param splits The list of input splits before balance. 
+426 * @param maxAverageRegionSize max Average region size for one mapper +427 * @return The list of input splits. +428 * @throws IOException When creating the list of splits fails. +429 * @see org.apache.hadoop.mapreduce.InputFormat#getSplits( +430 *org.apache.hadoop.mapreduce.JobContext) +431 */ +432 public List<InputSplit> calculateAutoBalancedSplits(List<InputSplit> splits, long maxAverageRegionSize) +433 throws IOException { +434 if (splits.size() == 0) { +435 return splits; +436 } +437 List<InputSplit> resultList = new ArrayList<>(); +438 long totalRegionSize = 0; +439 for (int i = 0; i < splits.size(); i++) { +440 TableSplit ts = (TableSplit) splits.get(i); +441 totalRegionSize += ts.getLength(); +442 } +443 long averageRegionSize = totalRegionSize / splits.size(); +444 // totalRegionSize might be overflow, and the averageRegionSize must be positive. +445 if (averageRegionSize <= 0) { +446 LOG.warn("The averageRegionSize is not positive: " + averageRegionSize + ", " + +447 "set it to Long.MAX_VALUE " + splits.size()); +448 averageRegionSize = Long.MAX_VALUE / splits.size(); +449 } +450 //if averageRegionSize is too big, change it to default as 1 GB, +451 if (averageRegionSize > maxAverageRegionSize) { +452 averageRegionSize = maxAverageRegionSize; +453 } +454 // if averageRegionSize is too small, we do not need to allocate more mappers for those 'large' region +455 // set default as 16M = (default hdfs block size) / 4; +456 if (averageRegionSize < 16 * 1048576) { +457 return splits; +458 } +459 for (int i = 0; i < splits.size(); i++) { +460 TableSplit ts = (TableSplit) splits.get(i); +461 TableName tableName = ts.getTable(); +462 String regionLocation = ts.getRegionLocation(); +463 String encodedRegionName = ts.getEncodedRegionName(); +464 long regionSize = ts.getLength(); +465 +466 if (regionSize >= averageRegionSize) { +467 // make this region as multiple MapReduce input split. +468 int n = (int) Math.round(Math.log(((double) regionSize) / ((double) averageRegionSize)) + 1.0); +469 List<InputSplit> temp = createNInputSplitsUniform(ts, n); +470 resultList.addAll(temp); +471 } else { +472 // if the total size of several small continuous regions less than the average region size, +473 // combine them into one MapReduce input split. +474 long totalSize = regionSize; +475 byte[] splitStartKey = ts.getStartRow(); +476 byte[] splitEndKey = ts.getEndRow(); +477 int j = i + 1; +478 while (j < splits.size()) { +479 TableSplit nextRegion = (TableSplit) splits.get(j); +480 long nextRegionSize = nextRegion.getLength(); +481 if (totalSize + nextRegionSize <= averageRegionSize) { +482 totalSize = totalSize + nextRegionSize; +483 splitEndKey = nextRegion.getEndRow(); +484 j++; +485 } else { +486 break; +487 } +488 } +489 i = j - 1; +490 TableSplit t = new TableSplit(tableName, scan, splitStartKey, splitEndKey, regionLocation, +491 encodedRegionName, totalSize); +492 resultList.add(t); +493 } +494 } +495 return resultList; +496 } +497 +498 String reverseDNS(InetAddress ipAddress) throws UnknownHostException { +499 String hostName = this.reverseDNSCacheMap.get(ipAddress); +500 if (hostName == null) { +501 String ipAddressString = null; +502 try { +503 ipAddressString = DNS.reverseDns(ipAddress, null); +504 } catch (Exception e) { +505 // We can use InetAddress in case the jndi failed to pull up the reverse DNS entry from the +506 // name service. Also, in case of ipv6, we need to use the InetAddress since resolving +507 // reverse DNS using jndi doesn't work well with ipv6 addresses. 
+508 ipAddressString = InetAddress.getByName(ipAddress.getHostAddress()).getHostName(); +509 } +510 if (ipAddressString == null) throw new UnknownHostException("No host found for " + ipAddress); +511 hostName = Strings.domainNamePointerToHostName(ipAddressString); +512 this.reverseDNSCacheMap.put(ipAddress, hostName); +513 } +514 return hostName; +515 } +516 +517 /** +518 * Test if the given region is to be included in the InputSplit while splitting +519 * the regions of a table. +520 * <p> +521 * This optimization is effective when there is a specific reasoning to exclude an entire region from the M-R job, +522 * (and hence, not contributing to the InputSplit), given the start and end keys of the same. <br> +523 * Useful when we need to remember the last-processed top record and revisit the [last, current) interval for M-R processing, +524 * continuously. In addition to reducing InputSplits, reduces the load on the region server as well, due to the ordering of the keys. 525 * <br> -526 * Note: It is possible that <code>endKey.length() == 0 </code> , for the last (recent) region. -527 * <br> -528 * Override this method, if you want to bulk exclude regions altogether from M-R. By default, no region is excluded( i.e. all regions are included). -529 * +526 * <br> +527 * Note: It is possible that <code>endKey.length() == 0 </code> , for the last (recent) region. +528 * <br> +529 * Override this method, if you want to bulk exclude regions altogether from M-R. By default, no region is excluded( i.e. all regions are included). 530 * -531 * @param startKey Start key of the region -532 * @param endKey End key of the region -533 * @return true, if this region needs to be included as part of the input (default). -534 * -535 */ -536 protected boolean includeRegionInSplit(final byte[] startKey, final byte [] endKey) { -537 return true; -538 } -539 -540 /** -541 * Allows subclasses to get the {@link RegionLocator}. -542 */ -543 protected RegionLocator getRegionLocator() { -544 if (regionLocator == null) { -545 throw new IllegalStateException(NOT_INITIALIZED); -546 } -547 return regionLocator; -548 } -549 -550 /** -551 * Allows subclasses to get the {@link Table}. -552 */ -553 protected Table getTable() { -554 if (table == null) { -555 throw new IllegalStateException(NOT_INITIALIZED); -556 } -557 return table; -558 } -559 -560 /** -561 * Allows subclasses to get the {@link Admin}. -562 */ -563 protected Admin getAdmin() { -564 if (admin == null) { -565 throw new IllegalStateException(NOT_INITIALIZED); -566 } -567 return admin; -568 } -569 -570 /** -571 * Allows subclasses to initialize the table information. -572 * -573 * @param connection The Connection to the HBase cluster. MUST be unmanaged. We will close. -574 * @param tableName The {@link TableName} of the table to process. -575 * @throws IOException -576 */ -577 protected void initializeTable(Connection connection, TableName tableName) throws IOException { -578 if (this.table != null || this.connection != null) { -579 LOG.warn("initializeTable called multiple times. Overwriting connection and table " + -580 "reference; TableInputFormatBase will not close these old references when done."); -581 } -582 this.table = connection.getTable(tableName); -583 this.regionLocator = connection.getRegionLocator(tableName); -584 this.admin = connection.getAdmin(); -585 this.connection = connection; -586 } -587 -588 /** -589 * Gets the scan defining the actual details like columns etc. -590 * -591 * @return The internal scan instance. 
-592 */ -593 public Scan getScan() { -594 if (this.scan == null) this.scan = new Scan(); -595 return scan; -596 } -597 -598 /** -599 * Sets the scan defining the actual details like columns etc. -600 * -601 * @param scan The scan to set. -602 */ -603 public void setScan(Scan scan) { -604 this.scan = scan; -605 } -606 -607 /** -608 * Allows subclasses to set the {@link TableRecordReader}. -609 * -610 * @param tableRecordReader A different {@link TableRecordReader} -611 * implementation. -612 */ -613 protected void setTableRecordReader(TableRecordReader tableRecordReader) { -614 this.tableRecordReader = tableRecordReader; -615 } -616 -617 /** -618 * Handle subclass specific set up. -619 * Each of the entry points used by the MapReduce framework, -620 * {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)}, -621 * will call {@link #initialize(JobContext)} as a convenient centralized location to handle -622 * retrieving the necessary configuration information and calling -623 * {@link #initializeTable(Connection, TableName)}. -624 * -625 * Subclasses should implement their initialize call such that it is safe to call multiple times. -626 * The current TableInputFormatBase implementation relies on a non-null table reference to decide -627 * if an initialize call is needed, but this behavior may change in the future. In particular, -628 * it is critical that initializeTable not be called multiple times since this will leak -629 * Connection instances. -630 * -631 */ -632 protected void initialize(JobContext context) throws IOException { -633 } -634 -635 /** -636 * Close the Table and related objects that were initialized via -637 * {@link #initializeTable(Connection, TableName)}. -638 * -639 * @throws IOException -640 */ -641 protected void closeTable() throws IOException { -642 close(admin, table, regionLocator, connection); -643 admin = null; -644 table = null; -645 regionLocator = null; -646 connection = null; -647 } -648 -649 private void close(Closeable... closables) throws IOException { -650 for (Closeable c : closables) { -651 if(c != null) { c.close(); } -652 } -653 } -654 -655} +531 * +532 * @param startKey Start key of the region +533 * @param endKey End key of the region +534 * @return true, if this region needs to be included as part of the input (default). +535 * +536 */ +537 protected boolean includeRegionInSplit(final byte[] startKey, final byte [] endKey) { +538 return true; +539 } +540 +541 /** +542 * Allows subclasses to get the {@link RegionLocator}. +543 */ +544 protected RegionLocator getRegionLocator() { +545 if (regionLocator == null) { +546 throw new IllegalStateException(NOT_INITIALIZED); +547 } +548 return regionLocator; +549 } +550 +551 /** +552 * Allows subclasses to get the {@link Table}. +553 */ +554 protected Table getTable() { +555 if (table == null) { +556 throw new IllegalStateException(NOT_INITIALIZED); +557 } +558 return table; +559 } +560 +561 /** +562 * Allows subclasses to get the {@link Admin}. +563 */ +564 protected Admin getAdmin() { +565 if (admin == null) { +566 throw new IllegalStateException(NOT_INITIALIZED); +567 } +568 return admin; +569 } +570 +571 /** +572 * Allows subclasses to initialize the table information. +573 * +574 * @param connection The Connection to the HBase cluster. MUST be unmanaged. We will close. +575 * @param tableName The {@link TableName} of the table to process. 
+576 * @throws IOException
+577 */
+578 protected void initializeTable(Connection connection, TableName tableName) throws IOException {
+579 if (this.table != null || this.connection != null) {
+580 LOG.warn("initializeTable called multiple times. Overwriting connection and table " +
+581 "reference; TableInputFormatBase will not close these old references when done.");
+582 }
+583 this.table = connection.getTable(tableName);
+584 this.regionLocator = connection.getRegionLocator(tableName);
+585 this.admin = connection.getAdmin();
+586 this.connection = connection;
+587 }
+588
+589 /**
+590 * Gets the scan defining the actual details like columns etc.
+591 *
+592 * @return The internal scan instance.
+593 */
+594 public Scan getScan() {
+595 if (this.scan == null) this.scan = new Scan();
+596 return scan;
+597 }
+598
+599 /**
+600 * Sets the scan defining the actual details like columns etc.
+601 *
+602 * @param scan The scan to set.
+603 */
+604 public void setScan(Scan scan) {
+605 this.scan = scan;
+606 }
+607
+608 /**
+609 * Allows subclasses to set the {@link TableRecordReader}.
+610 *
+611 * @param tableRecordReader A different {@link TableRecordReader}
+612 * implementation.
+613 */
+614 protected void setTableRecordReader(TableRecordReader tableRecordReader) {
+615 this.tableRecordReader = tableRecordReader;
+616 }
+617
+618 /**
+619 * Handle subclass specific set up.
+620 * Each of the entry points used by the MapReduce framework,
+621 * {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)},
+622 * will call {@link #initialize(JobContext)} as a convenient centralized location to handle
+623 * retrieving the necessary configuration information and calling
+624 * {@link #initializeTable(Connection, TableName)}.
+625 *
+626 * Subclasses should implement their initialize call such that it is safe to call multiple times.
+627 * The current TableInputFormatBase implementation relies on a non-null table reference to decide
+628 * if an initialize call is needed, but this behavior may change in the future. In particular,
+629 * it is critical that initializeTable not be called multiple times since this will leak
+630 * Connection instances.
+631 *
+632 */
+633 protected void initialize(JobContext context) throws IOException {
+634 }
+635
+636 /**
+637 * Close the Table and related objects that were initialized via
+638 * {@link #initializeTable(Connection, TableName)}.
+639 *
+640 * @throws IOException
+641 */
+642 protected void closeTable() throws IOException {
+643 close(admin, table, regionLocator, connection);
+644 admin = null;
+645 table = null;
+646 regionLocator = null;
+647 connection = null;
+648 }
+649
+650 private void close(Closeable... closables) throws IOException {
+651 for (Closeable c : closables) {
+652 if(c != null) { c.close(); }
+653 }
+654 }
+655
+656}
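The change to this file is twofold: the auto-balance branch now tests the boolean directly instead of comparing it against false, and the MAX_AVERAGE_REGION_SIZE default is read with getLong instead of getInt (the old int constant 8*1073741824 wraps to 0 in int arithmetic, silently losing the intended 8 GB default). The sizing heuristic itself is unchanged: a region of regionSize bytes becomes round(ln(regionSize/averageRegionSize) + 1) input splits, and consecutive regions smaller than the average are merged while their sum stays at or below it. A rough standalone sketch of that arithmetic, with hypothetical region sizes and no HBase dependencies, might look like:

  import java.util.ArrayList;
  import java.util.List;

  public class AutoBalanceSketch {

    // Mirror of the heuristic in calculateAutoBalancedSplits: a region at the
    // average size gets one split, one at e times the average gets two, and so on.
    static int splitsForRegion(long regionSize, long averageRegionSize) {
      return (int) Math.round(Math.log((double) regionSize / (double) averageRegionSize) + 1.0);
    }

    public static void main(String[] args) {
      // Hypothetical region sizes in bytes; the real code reads them from TableSplit lengths.
      long[] regionSizes = {9L << 30, 2L << 30, 40L << 20, 24L << 20};

      long totalSize = 0;
      for (long size : regionSizes) {
        totalSize += size;
      }
      long average = totalSize / regionSizes.length;
      average = Math.min(average, 8L * 1073741824L); // cap at the 8 GB default maximum
      if (average < 16L * 1048576L) {
        System.out.println("Average under 16 MB: keep one mapper per region.");
        return;
      }

      List<String> plan = new ArrayList<>();
      for (long size : regionSizes) {
        if (size >= average) {
          plan.add(size + " bytes -> " + splitsForRegion(size, average) + " split(s)");
        } else {
          plan.add(size + " bytes -> merge candidate (smaller than average)");
        }
      }
      plan.forEach(System.out::println);
    }
  }

Note that when the real method merges small neighboring regions it also extends the split's end key, so the combined TableSplit still covers one contiguous row range.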
http://git-wip-us.apache.org/repos/asf/hbase-site/blob/2b3f2bee/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index ff26207..70fd530 100644
--- a/book.html
+++ b/book.html
@@ -3638,7 +3638,7 @@ Some configurations would only appear in source code; the only way to identify t

Description

-The HFile format version to use for new files. Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags). Distributed Log Replay requires that tags are enabled. Also see the configuration 'hbase.replication.rpc.codec'.
+The HFile format version to use for new files. Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags). Also see the configuration 'hbase.replication.rpc.codec'.

Default

@@ -4883,7 +4883,7 @@ Some configurations would only appear in source code; the only way to identify t

Description

-By default, in replication we can not make sure the order of operations in slave cluster is same as the order in master. If set REPLICATION_SCOPE to 2, we will push edits by the order of written. This configure is to set how long (in ms) we will wait before next checking if a log can not push right now because there are some logs written before it have not been pushed. A larger waiting will decrease the number of queries on hbase:meta but will enlarge the delay of replication. This feature relies on zk-less assignment, and conflicts with distributed log replay. So users must set hbase.assignment.usezk and hbase.master.distributed.log.replay to false to support it.
+By default, replication cannot guarantee that the order of operations in the slave cluster is the same as the order in the master. If REPLICATION_SCOPE is set to 2, edits are pushed in the order they were written. This configuration sets how long (in ms) to wait before re-checking a log that cannot be pushed yet because logs written before it have not been pushed. A longer wait decreases the number of queries on hbase:meta but increases the replication delay. This feature relies on zk-less assignment, so users must set hbase.assignment.usezk to false to support it.

Default

If you have your own customer filters.

See the release notes on the issue HBASE-12068 [Branch-1] Avoid need to always do KeyValueUtil#ensureKeyValue for Filter transformCell; be sure to follow the recommendations therein.

-
-
Distributed Log Replay
-

Distributed Log Replay is off by default in HBase 1.0.0. Enabling it can make a big difference improving HBase MTTR. Enable this feature if you are doing a clean stop/start when you are upgrading. You cannot rolling upgrade to this feature (caveat if you are running on a version of HBase in excess of HBase 0.98.4 — see HBASE-12577 Disable distributed log replay by default for more).

-
Mismatch Of hbase.client.scanner.max.result.size Between Client and Server

If either the client or server version is lower than 0.98.11/1.0.0 and the server @@ -14834,41 +14830,6 @@ If none are found, it throws an exception so that the log splitting can be retri

-Distributed Log Replay
-
-After a RegionServer fails, its failed regions are assigned to another RegionServer, which are marked as "recovering" in ZooKeeper.
-A split log worker directly replays edits from the WAL of the failed RegionServer to the regions at its new location.
-When a region is in "recovering" state, it can accept writes but no reads (including Append and Increment), region splits or merges.
-
-Distributed Log Replay extends the Enabling or Disabling Distributed Log Splitting framework.
-It works by directly replaying WAL edits to another RegionServer instead of creating recovered.edits files.
-It provides the following advantages over distributed log splitting alone:
-
-* It eliminates the overhead of writing and reading a large number of recovered.edits files.
-  It is not unusual for thousands of recovered.edits files to be created and written concurrently during a RegionServer recovery.
-  Many small random writes can degrade overall system performance.
-
-* It allows writes even when a region is in recovering state.
-  It only takes seconds for a recovering region to accept writes again.
-
-Enabling Distributed Log Replay
-
-To enable distributed log replay, set hbase.master.distributed.log.replay to true.
-This will be the default for HBase 0.99 (HBASE-10888).
-
-You must also enable HFile version 3 (which is the default HFile format starting in HBase 0.99. See HBASE-10855).
-Distributed log replay is unsafe for rolling upgrades.
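The removed passage boils down to two server-side settings on the HBase 1.x releases where the feature existed. Expressed with the plain Configuration API, and using only the keys quoted in the text above, the toggles would look like this sketch:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class DistributedLogReplaySettingsSketch {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      // Both keys come from the passage being removed above; they only apply
      // to the pre-2.0 clusters that still shipped Distributed Log Replay.
      conf.setBoolean("hbase.master.distributed.log.replay", true);
      // Distributed Log Replay required HFile version 3 (tag support).
      conf.setInt("hfile.format.version", 3);
      System.out.println("distributed log replay enabled: "
          + conf.getBoolean("hbase.master.distributed.log.replay", false));
    }
  }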
@@ -35484,7 +35445,7 @@ The server will return cellblocks compressed using this same compressor as long