From: git-site-role@apache.org
To: commits@hbase.apache.org
Date: Thu, 09 Nov 2017 15:17:10 -0000
Subject: [42/51] [partial] hbase-site git commit: Published site at .

http://git-wip-us.apache.org/repos/asf/hbase-site/blob/2b3f2bee/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html
----------------------------------------------------------------------
diff --git a/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html b/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html
deleted file mode 100644
index e324a9e..0000000
--- a/apidocs/src-html/org/apache/hadoop/hbase/exceptions/RegionInRecoveryException.html
+++ /dev/null
@@ -1,116 +0,0 @@
-001/**
-002 *
-003 * Licensed to the Apache Software Foundation (ASF) under one
-004 * or more contributor license agreements.  See the NOTICE file
-005 * distributed with this work for additional information
-006 * regarding copyright ownership.  The ASF licenses this file
-007 * to you under the Apache License, Version 2.0 (the
-008 * "License"); you may not use this file except in compliance
-009 * with the License.  You may obtain a copy of the License at
-010 *
-011 *     http://www.apache.org/licenses/LICENSE-2.0
-012 *
-013 * Unless required by applicable law or agreed to in writing, software
-014 * distributed under the License is distributed on an "AS IS" BASIS,
-015 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-016 * See the License for the specific language governing permissions and
-017 * limitations under the License.
-018 */
-019package org.apache.hadoop.hbase.exceptions;
-020
-021import org.apache.hadoop.hbase.NotServingRegionException;
-022import org.apache.yetus.audience.InterfaceAudience;
-023
-024/**
-025 * Thrown when a read request is issued against a region that is in recovering state.
-026 */
-027@InterfaceAudience.Public
-028public class RegionInRecoveryException extends NotServingRegionException {
-029  private static final long serialVersionUID = 327302071153799L;
-030
-031  /** default constructor */
-032  public RegionInRecoveryException() {
-033    super();
-034  }
-035
-036  /**
-037   * Constructor
-038   * @param s message
-039   */
-040  public RegionInRecoveryException(String s) {
-041    super(s);
-042  }
-043
-044}
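The class deleted above extends NotServingRegionException, so standard HBase clients already treat it as a retryable condition. A caller that wants to react to it explicitly could follow this minimal sketch; the connection setup is standard, but the table name and row key are hypothetical:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.exceptions.RegionInRecoveryException;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RecoveringRegionReadSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection connection = ConnectionFactory.createConnection(conf);
           Table table = connection.getTable(TableName.valueOf("demo_table"))) {
        Get get = new Get(Bytes.toBytes("row-1"));
        Result result = null;
        for (int attempt = 0; attempt < 5 && result == null; attempt++) {
          try {
            result = table.get(get);
          } catch (RegionInRecoveryException e) {
            // The region is still replaying WAL edits; reads are rejected until
            // recovery completes, so back off briefly and try again.
            Thread.sleep(1000L);
          }
        }
        System.out.println(result == null ? "region still recovering" : result.toString());
      }
    }
  }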
- - http://git-wip-us.apache.org/repos/asf/hbase-site/blob/2b3f2bee/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html ---------------------------------------------------------------------- diff --git a/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html b/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html index 898d59a..be26fad 100644 --- a/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html +++ b/apidocs/src-html/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html @@ -276,391 +276,392 @@ 268 } 269 270 //The default value of "hbase.mapreduce.input.autobalance" is false. -271 if (context.getConfiguration().getBoolean(MAPREDUCE_INPUT_AUTOBALANCE, false) != false) { -272 long maxAveRegionSize = context.getConfiguration().getInt(MAX_AVERAGE_REGION_SIZE, 8*1073741824); -273 return calculateAutoBalancedSplits(splits, maxAveRegionSize); -274 } -275 -276 // return one mapper per region -277 return splits; -278 } finally { -279 if (closeOnFinish) { -280 closeTable(); -281 } -282 } -283 } -284 -285 /** -286 * Create one InputSplit per region -287 * -288 * @return The list of InputSplit for all the regions -289 * @throws IOException -290 */ -291 private List<InputSplit> oneInputSplitPerRegion() throws IOException { -292 RegionSizeCalculator sizeCalculator = -293 new RegionSizeCalculator(getRegionLocator(), getAdmin()); -294 -295 TableName tableName = getTable().getName(); -296 -297 Pair<byte[][], byte[][]> keys = getStartEndKeys(); -298 if (keys == null || keys.getFirst() == null || -299 keys.getFirst().length == 0) { -300 HRegionLocation regLoc = -301 getRegionLocator().getRegionLocation(HConstants.EMPTY_BYTE_ARRAY, false); -302 if (null == regLoc) { -303 throw new IOException("Expecting at least one region."); -304 } -305 List<InputSplit> splits = new ArrayList<>(1); -306 long regionSize = sizeCalculator.getRegionSize(regLoc.getRegionInfo().getRegionName()); -307 TableSplit split = new TableSplit(tableName, scan, -308 HConstants.EMPTY_BYTE_ARRAY, HConstants.EMPTY_BYTE_ARRAY, regLoc -309 .getHostnamePort().split(Addressing.HOSTNAME_PORT_SEPARATOR)[0], regionSize); -310 splits.add(split); -311 return splits; -312 } -313 List<InputSplit> splits = new ArrayList<>(keys.getFirst().length); -314 for (int i = 0; i < keys.getFirst().length; i++) { -315 if (!includeRegionInSplit(keys.getFirst()[i], keys.getSecond()[i])) { -316 continue; -317 } -318 -319 byte[] startRow = scan.getStartRow(); -320 byte[] stopRow = scan.getStopRow(); -321 // determine if the given start an stop key fall into the region -322 if ((startRow.length == 0 || keys.getSecond()[i].length == 0 || -323 Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) && -324 (stopRow.length == 0 || -325 Bytes.compareTo(stopRow, keys.getFirst()[i]) > 0)) { -326 byte[] splitStart = startRow.length == 0 || -327 Bytes.compareTo(keys.getFirst()[i], startRow) >= 0 ? -328 keys.getFirst()[i] : startRow; -329 byte[] splitStop = (stopRow.length == 0 || -330 Bytes.compareTo(keys.getSecond()[i], stopRow) <= 0) && -331 keys.getSecond()[i].length > 0 ? -332 keys.getSecond()[i] : stopRow; -333 -334 HRegionLocation location = getRegionLocator().getRegionLocation(keys.getFirst()[i], false); -335 // The below InetSocketAddress creation does a name resolution. 
-336 InetSocketAddress isa = new InetSocketAddress(location.getHostname(), location.getPort()); -337 if (isa.isUnresolved()) { -338 LOG.warn("Failed resolve " + isa); -339 } -340 InetAddress regionAddress = isa.getAddress(); -341 String regionLocation; -342 regionLocation = reverseDNS(regionAddress); -343 -344 byte[] regionName = location.getRegionInfo().getRegionName(); -345 String encodedRegionName = location.getRegionInfo().getEncodedName(); -346 long regionSize = sizeCalculator.getRegionSize(regionName); -347 TableSplit split = new TableSplit(tableName, scan, -348 splitStart, splitStop, regionLocation, encodedRegionName, regionSize); -349 splits.add(split); -350 if (LOG.isDebugEnabled()) { -351 LOG.debug("getSplits: split -> " + i + " -> " + split); -352 } -353 } -354 } -355 return splits; -356 } -357 -358 /** -359 * Create n splits for one InputSplit, For now only support uniform distribution -360 * @param split A TableSplit corresponding to a range of rowkeys -361 * @param n Number of ranges after splitting. Pass 1 means no split for the range -362 * Pass 2 if you want to split the range in two; -363 * @return A list of TableSplit, the size of the list is n -364 * @throws IllegalArgumentIOException -365 */ -366 protected List<InputSplit> createNInputSplitsUniform(InputSplit split, int n) -367 throws IllegalArgumentIOException { -368 if (split == null || !(split instanceof TableSplit)) { -369 throw new IllegalArgumentIOException( -370 "InputSplit for CreateNSplitsPerRegion can not be null + " -371 + "and should be instance of TableSplit"); -372 } -373 //if n < 1, then still continue using n = 1 -374 n = n < 1 ? 1 : n; -375 List<InputSplit> res = new ArrayList<>(n); -376 if (n == 1) { -377 res.add(split); -378 return res; -379 } -380 -381 // Collect Region related information -382 TableSplit ts = (TableSplit) split; -383 TableName tableName = ts.getTable(); -384 String regionLocation = ts.getRegionLocation(); -385 String encodedRegionName = ts.getEncodedRegionName(); -386 long regionSize = ts.getLength(); -387 byte[] startRow = ts.getStartRow(); -388 byte[] endRow = ts.getEndRow(); -389 -390 // For special case: startRow or endRow is empty -391 if (startRow.length == 0 && endRow.length == 0){ -392 startRow = new byte[1]; -393 endRow = new byte[1]; -394 startRow[0] = 0; -395 endRow[0] = -1; -396 } -397 if (startRow.length == 0 && endRow.length != 0){ -398 startRow = new byte[1]; -399 startRow[0] = 0; -400 } -401 if (startRow.length != 0 && endRow.length == 0){ -402 endRow =new byte[startRow.length]; -403 for (int k = 0; k < startRow.length; k++){ -404 endRow[k] = -1; -405 } -406 } -407 -408 // Split Region into n chunks evenly -409 byte[][] splitKeys = Bytes.split(startRow, endRow, true, n-1); -410 for (int i = 0; i < splitKeys.length - 1; i++) { -411 //notice that the regionSize parameter may be not very accurate -412 TableSplit tsplit = -413 new TableSplit(tableName, scan, splitKeys[i], splitKeys[i + 1], regionLocation, -414 encodedRegionName, regionSize / n); -415 res.add(tsplit); -416 } -417 return res; -418 } -419 /** -420 * Calculates the number of MapReduce input splits for the map tasks. The number of -421 * MapReduce input splits depends on the average region size. -422 * Make it 'public' for testing -423 * -424 * @param splits The list of input splits before balance. -425 * @param maxAverageRegionSize max Average region size for one mapper -426 * @return The list of input splits. -427 * @throws IOException When creating the list of splits fails. 
-428 * @see org.apache.hadoop.mapreduce.InputFormat#getSplits( -429 *org.apache.hadoop.mapreduce.JobContext) -430 */ -431 public List<InputSplit> calculateAutoBalancedSplits(List<InputSplit> splits, long maxAverageRegionSize) -432 throws IOException { -433 if (splits.size() == 0) { -434 return splits; -435 } -436 List<InputSplit> resultList = new ArrayList<>(); -437 long totalRegionSize = 0; -438 for (int i = 0; i < splits.size(); i++) { -439 TableSplit ts = (TableSplit) splits.get(i); -440 totalRegionSize += ts.getLength(); -441 } -442 long averageRegionSize = totalRegionSize / splits.size(); -443 // totalRegionSize might be overflow, and the averageRegionSize must be positive. -444 if (averageRegionSize <= 0) { -445 LOG.warn("The averageRegionSize is not positive: " + averageRegionSize + ", " + -446 "set it to Long.MAX_VALUE " + splits.size()); -447 averageRegionSize = Long.MAX_VALUE / splits.size(); -448 } -449 //if averageRegionSize is too big, change it to default as 1 GB, -450 if (averageRegionSize > maxAverageRegionSize) { -451 averageRegionSize = maxAverageRegionSize; -452 } -453 // if averageRegionSize is too small, we do not need to allocate more mappers for those 'large' region -454 // set default as 16M = (default hdfs block size) / 4; -455 if (averageRegionSize < 16 * 1048576) { -456 return splits; -457 } -458 for (int i = 0; i < splits.size(); i++) { -459 TableSplit ts = (TableSplit) splits.get(i); -460 TableName tableName = ts.getTable(); -461 String regionLocation = ts.getRegionLocation(); -462 String encodedRegionName = ts.getEncodedRegionName(); -463 long regionSize = ts.getLength(); -464 -465 if (regionSize >= averageRegionSize) { -466 // make this region as multiple MapReduce input split. -467 int n = (int) Math.round(Math.log(((double) regionSize) / ((double) averageRegionSize)) + 1.0); -468 List<InputSplit> temp = createNInputSplitsUniform(ts, n); -469 resultList.addAll(temp); -470 } else { -471 // if the total size of several small continuous regions less than the average region size, -472 // combine them into one MapReduce input split. -473 long totalSize = regionSize; -474 byte[] splitStartKey = ts.getStartRow(); -475 byte[] splitEndKey = ts.getEndRow(); -476 int j = i + 1; -477 while (j < splits.size()) { -478 TableSplit nextRegion = (TableSplit) splits.get(j); -479 long nextRegionSize = nextRegion.getLength(); -480 if (totalSize + nextRegionSize <= averageRegionSize) { -481 totalSize = totalSize + nextRegionSize; -482 splitEndKey = nextRegion.getEndRow(); -483 j++; -484 } else { -485 break; -486 } -487 } -488 i = j - 1; -489 TableSplit t = new TableSplit(tableName, scan, splitStartKey, splitEndKey, regionLocation, -490 encodedRegionName, totalSize); -491 resultList.add(t); -492 } -493 } -494 return resultList; -495 } -496 -497 String reverseDNS(InetAddress ipAddress) throws UnknownHostException { -498 String hostName = this.reverseDNSCacheMap.get(ipAddress); -499 if (hostName == null) { -500 String ipAddressString = null; -501 try { -502 ipAddressString = DNS.reverseDns(ipAddress, null); -503 } catch (Exception e) { -504 // We can use InetAddress in case the jndi failed to pull up the reverse DNS entry from the -505 // name service. Also, in case of ipv6, we need to use the InetAddress since resolving -506 // reverse DNS using jndi doesn't work well with ipv6 addresses. 
-507 ipAddressString = InetAddress.getByName(ipAddress.getHostAddress()).getHostName(); -508 } -509 if (ipAddressString == null) throw new UnknownHostException("No host found for " + ipAddress); -510 hostName = Strings.domainNamePointerToHostName(ipAddressString); -511 this.reverseDNSCacheMap.put(ipAddress, hostName); -512 } -513 return hostName; -514 } -515 -516 /** -517 * Test if the given region is to be included in the InputSplit while splitting -518 * the regions of a table. -519 * <p> -520 * This optimization is effective when there is a specific reasoning to exclude an entire region from the M-R job, -521 * (and hence, not contributing to the InputSplit), given the start and end keys of the same. <br> -522 * Useful when we need to remember the last-processed top record and revisit the [last, current) interval for M-R processing, -523 * continuously. In addition to reducing InputSplits, reduces the load on the region server as well, due to the ordering of the keys. -524 * <br> +271 if (context.getConfiguration().getBoolean(MAPREDUCE_INPUT_AUTOBALANCE, false)) { +272 long maxAveRegionSize = context.getConfiguration() +273 .getLong(MAX_AVERAGE_REGION_SIZE, 8L*1073741824); //8GB +274 return calculateAutoBalancedSplits(splits, maxAveRegionSize); +275 } +276 +277 // return one mapper per region +278 return splits; +279 } finally { +280 if (closeOnFinish) { +281 closeTable(); +282 } +283 } +284 } +285 +286 /** +287 * Create one InputSplit per region +288 * +289 * @return The list of InputSplit for all the regions +290 * @throws IOException +291 */ +292 private List<InputSplit> oneInputSplitPerRegion() throws IOException { +293 RegionSizeCalculator sizeCalculator = +294 new RegionSizeCalculator(getRegionLocator(), getAdmin()); +295 +296 TableName tableName = getTable().getName(); +297 +298 Pair<byte[][], byte[][]> keys = getStartEndKeys(); +299 if (keys == null || keys.getFirst() == null || +300 keys.getFirst().length == 0) { +301 HRegionLocation regLoc = +302 getRegionLocator().getRegionLocation(HConstants.EMPTY_BYTE_ARRAY, false); +303 if (null == regLoc) { +304 throw new IOException("Expecting at least one region."); +305 } +306 List<InputSplit> splits = new ArrayList<>(1); +307 long regionSize = sizeCalculator.getRegionSize(regLoc.getRegionInfo().getRegionName()); +308 TableSplit split = new TableSplit(tableName, scan, +309 HConstants.EMPTY_BYTE_ARRAY, HConstants.EMPTY_BYTE_ARRAY, regLoc +310 .getHostnamePort().split(Addressing.HOSTNAME_PORT_SEPARATOR)[0], regionSize); +311 splits.add(split); +312 return splits; +313 } +314 List<InputSplit> splits = new ArrayList<>(keys.getFirst().length); +315 for (int i = 0; i < keys.getFirst().length; i++) { +316 if (!includeRegionInSplit(keys.getFirst()[i], keys.getSecond()[i])) { +317 continue; +318 } +319 +320 byte[] startRow = scan.getStartRow(); +321 byte[] stopRow = scan.getStopRow(); +322 // determine if the given start an stop key fall into the region +323 if ((startRow.length == 0 || keys.getSecond()[i].length == 0 || +324 Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) && +325 (stopRow.length == 0 || +326 Bytes.compareTo(stopRow, keys.getFirst()[i]) > 0)) { +327 byte[] splitStart = startRow.length == 0 || +328 Bytes.compareTo(keys.getFirst()[i], startRow) >= 0 ? +329 keys.getFirst()[i] : startRow; +330 byte[] splitStop = (stopRow.length == 0 || +331 Bytes.compareTo(keys.getSecond()[i], stopRow) <= 0) && +332 keys.getSecond()[i].length > 0 ? 
+333 keys.getSecond()[i] : stopRow; +334 +335 HRegionLocation location = getRegionLocator().getRegionLocation(keys.getFirst()[i], false); +336 // The below InetSocketAddress creation does a name resolution. +337 InetSocketAddress isa = new InetSocketAddress(location.getHostname(), location.getPort()); +338 if (isa.isUnresolved()) { +339 LOG.warn("Failed resolve " + isa); +340 } +341 InetAddress regionAddress = isa.getAddress(); +342 String regionLocation; +343 regionLocation = reverseDNS(regionAddress); +344 +345 byte[] regionName = location.getRegionInfo().getRegionName(); +346 String encodedRegionName = location.getRegionInfo().getEncodedName(); +347 long regionSize = sizeCalculator.getRegionSize(regionName); +348 TableSplit split = new TableSplit(tableName, scan, +349 splitStart, splitStop, regionLocation, encodedRegionName, regionSize); +350 splits.add(split); +351 if (LOG.isDebugEnabled()) { +352 LOG.debug("getSplits: split -> " + i + " -> " + split); +353 } +354 } +355 } +356 return splits; +357 } +358 +359 /** +360 * Create n splits for one InputSplit, For now only support uniform distribution +361 * @param split A TableSplit corresponding to a range of rowkeys +362 * @param n Number of ranges after splitting. Pass 1 means no split for the range +363 * Pass 2 if you want to split the range in two; +364 * @return A list of TableSplit, the size of the list is n +365 * @throws IllegalArgumentIOException +366 */ +367 protected List<InputSplit> createNInputSplitsUniform(InputSplit split, int n) +368 throws IllegalArgumentIOException { +369 if (split == null || !(split instanceof TableSplit)) { +370 throw new IllegalArgumentIOException( +371 "InputSplit for CreateNSplitsPerRegion can not be null + " +372 + "and should be instance of TableSplit"); +373 } +374 //if n < 1, then still continue using n = 1 +375 n = n < 1 ? 1 : n; +376 List<InputSplit> res = new ArrayList<>(n); +377 if (n == 1) { +378 res.add(split); +379 return res; +380 } +381 +382 // Collect Region related information +383 TableSplit ts = (TableSplit) split; +384 TableName tableName = ts.getTable(); +385 String regionLocation = ts.getRegionLocation(); +386 String encodedRegionName = ts.getEncodedRegionName(); +387 long regionSize = ts.getLength(); +388 byte[] startRow = ts.getStartRow(); +389 byte[] endRow = ts.getEndRow(); +390 +391 // For special case: startRow or endRow is empty +392 if (startRow.length == 0 && endRow.length == 0){ +393 startRow = new byte[1]; +394 endRow = new byte[1]; +395 startRow[0] = 0; +396 endRow[0] = -1; +397 } +398 if (startRow.length == 0 && endRow.length != 0){ +399 startRow = new byte[1]; +400 startRow[0] = 0; +401 } +402 if (startRow.length != 0 && endRow.length == 0){ +403 endRow =new byte[startRow.length]; +404 for (int k = 0; k < startRow.length; k++){ +405 endRow[k] = -1; +406 } +407 } +408 +409 // Split Region into n chunks evenly +410 byte[][] splitKeys = Bytes.split(startRow, endRow, true, n-1); +411 for (int i = 0; i < splitKeys.length - 1; i++) { +412 //notice that the regionSize parameter may be not very accurate +413 TableSplit tsplit = +414 new TableSplit(tableName, scan, splitKeys[i], splitKeys[i + 1], regionLocation, +415 encodedRegionName, regionSize / n); +416 res.add(tsplit); +417 } +418 return res; +419 } +420 /** +421 * Calculates the number of MapReduce input splits for the map tasks. The number of +422 * MapReduce input splits depends on the average region size. +423 * Make it 'public' for testing +424 * +425 * @param splits The list of input splits before balance. 
+426 * @param maxAverageRegionSize max Average region size for one mapper +427 * @return The list of input splits. +428 * @throws IOException When creating the list of splits fails. +429 * @see org.apache.hadoop.mapreduce.InputFormat#getSplits( +430 *org.apache.hadoop.mapreduce.JobContext) +431 */ +432 public List<InputSplit> calculateAutoBalancedSplits(List<InputSplit> splits, long maxAverageRegionSize) +433 throws IOException { +434 if (splits.size() == 0) { +435 return splits; +436 } +437 List<InputSplit> resultList = new ArrayList<>(); +438 long totalRegionSize = 0; +439 for (int i = 0; i < splits.size(); i++) { +440 TableSplit ts = (TableSplit) splits.get(i); +441 totalRegionSize += ts.getLength(); +442 } +443 long averageRegionSize = totalRegionSize / splits.size(); +444 // totalRegionSize might be overflow, and the averageRegionSize must be positive. +445 if (averageRegionSize <= 0) { +446 LOG.warn("The averageRegionSize is not positive: " + averageRegionSize + ", " + +447 "set it to Long.MAX_VALUE " + splits.size()); +448 averageRegionSize = Long.MAX_VALUE / splits.size(); +449 } +450 //if averageRegionSize is too big, change it to default as 1 GB, +451 if (averageRegionSize > maxAverageRegionSize) { +452 averageRegionSize = maxAverageRegionSize; +453 } +454 // if averageRegionSize is too small, we do not need to allocate more mappers for those 'large' region +455 // set default as 16M = (default hdfs block size) / 4; +456 if (averageRegionSize < 16 * 1048576) { +457 return splits; +458 } +459 for (int i = 0; i < splits.size(); i++) { +460 TableSplit ts = (TableSplit) splits.get(i); +461 TableName tableName = ts.getTable(); +462 String regionLocation = ts.getRegionLocation(); +463 String encodedRegionName = ts.getEncodedRegionName(); +464 long regionSize = ts.getLength(); +465 +466 if (regionSize >= averageRegionSize) { +467 // make this region as multiple MapReduce input split. +468 int n = (int) Math.round(Math.log(((double) regionSize) / ((double) averageRegionSize)) + 1.0); +469 List<InputSplit> temp = createNInputSplitsUniform(ts, n); +470 resultList.addAll(temp); +471 } else { +472 // if the total size of several small continuous regions less than the average region size, +473 // combine them into one MapReduce input split. +474 long totalSize = regionSize; +475 byte[] splitStartKey = ts.getStartRow(); +476 byte[] splitEndKey = ts.getEndRow(); +477 int j = i + 1; +478 while (j < splits.size()) { +479 TableSplit nextRegion = (TableSplit) splits.get(j); +480 long nextRegionSize = nextRegion.getLength(); +481 if (totalSize + nextRegionSize <= averageRegionSize) { +482 totalSize = totalSize + nextRegionSize; +483 splitEndKey = nextRegion.getEndRow(); +484 j++; +485 } else { +486 break; +487 } +488 } +489 i = j - 1; +490 TableSplit t = new TableSplit(tableName, scan, splitStartKey, splitEndKey, regionLocation, +491 encodedRegionName, totalSize); +492 resultList.add(t); +493 } +494 } +495 return resultList; +496 } +497 +498 String reverseDNS(InetAddress ipAddress) throws UnknownHostException { +499 String hostName = this.reverseDNSCacheMap.get(ipAddress); +500 if (hostName == null) { +501 String ipAddressString = null; +502 try { +503 ipAddressString = DNS.reverseDns(ipAddress, null); +504 } catch (Exception e) { +505 // We can use InetAddress in case the jndi failed to pull up the reverse DNS entry from the +506 // name service. Also, in case of ipv6, we need to use the InetAddress since resolving +507 // reverse DNS using jndi doesn't work well with ipv6 addresses. 
+508 ipAddressString = InetAddress.getByName(ipAddress.getHostAddress()).getHostName(); +509 } +510 if (ipAddressString == null) throw new UnknownHostException("No host found for " + ipAddress); +511 hostName = Strings.domainNamePointerToHostName(ipAddressString); +512 this.reverseDNSCacheMap.put(ipAddress, hostName); +513 } +514 return hostName; +515 } +516 +517 /** +518 * Test if the given region is to be included in the InputSplit while splitting +519 * the regions of a table. +520 * <p> +521 * This optimization is effective when there is a specific reasoning to exclude an entire region from the M-R job, +522 * (and hence, not contributing to the InputSplit), given the start and end keys of the same. <br> +523 * Useful when we need to remember the last-processed top record and revisit the [last, current) interval for M-R processing, +524 * continuously. In addition to reducing InputSplits, reduces the load on the region server as well, due to the ordering of the keys. 525 * <br> -526 * Note: It is possible that <code>endKey.length() == 0 </code> , for the last (recent) region. -527 * <br> -528 * Override this method, if you want to bulk exclude regions altogether from M-R. By default, no region is excluded( i.e. all regions are included). -529 * +526 * <br> +527 * Note: It is possible that <code>endKey.length() == 0 </code> , for the last (recent) region. +528 * <br> +529 * Override this method, if you want to bulk exclude regions altogether from M-R. By default, no region is excluded( i.e. all regions are included). 530 * -531 * @param startKey Start key of the region -532 * @param endKey End key of the region -533 * @return true, if this region needs to be included as part of the input (default). -534 * -535 */ -536 protected boolean includeRegionInSplit(final byte[] startKey, final byte [] endKey) { -537 return true; -538 } -539 -540 /** -541 * Allows subclasses to get the {@link RegionLocator}. -542 */ -543 protected RegionLocator getRegionLocator() { -544 if (regionLocator == null) { -545 throw new IllegalStateException(NOT_INITIALIZED); -546 } -547 return regionLocator; -548 } -549 -550 /** -551 * Allows subclasses to get the {@link Table}. -552 */ -553 protected Table getTable() { -554 if (table == null) { -555 throw new IllegalStateException(NOT_INITIALIZED); -556 } -557 return table; -558 } -559 -560 /** -561 * Allows subclasses to get the {@link Admin}. -562 */ -563 protected Admin getAdmin() { -564 if (admin == null) { -565 throw new IllegalStateException(NOT_INITIALIZED); -566 } -567 return admin; -568 } -569 -570 /** -571 * Allows subclasses to initialize the table information. -572 * -573 * @param connection The Connection to the HBase cluster. MUST be unmanaged. We will close. -574 * @param tableName The {@link TableName} of the table to process. -575 * @throws IOException -576 */ -577 protected void initializeTable(Connection connection, TableName tableName) throws IOException { -578 if (this.table != null || this.connection != null) { -579 LOG.warn("initializeTable called multiple times. Overwriting connection and table " + -580 "reference; TableInputFormatBase will not close these old references when done."); -581 } -582 this.table = connection.getTable(tableName); -583 this.regionLocator = connection.getRegionLocator(tableName); -584 this.admin = connection.getAdmin(); -585 this.connection = connection; -586 } -587 -588 /** -589 * Gets the scan defining the actual details like columns etc. -590 * -591 * @return The internal scan instance. 
-592 */ -593 public Scan getScan() { -594 if (this.scan == null) this.scan = new Scan(); -595 return scan; -596 } -597 -598 /** -599 * Sets the scan defining the actual details like columns etc. -600 * -601 * @param scan The scan to set. -602 */ -603 public void setScan(Scan scan) { -604 this.scan = scan; -605 } -606 -607 /** -608 * Allows subclasses to set the {@link TableRecordReader}. -609 * -610 * @param tableRecordReader A different {@link TableRecordReader} -611 * implementation. -612 */ -613 protected void setTableRecordReader(TableRecordReader tableRecordReader) { -614 this.tableRecordReader = tableRecordReader; -615 } -616 -617 /** -618 * Handle subclass specific set up. -619 * Each of the entry points used by the MapReduce framework, -620 * {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)}, -621 * will call {@link #initialize(JobContext)} as a convenient centralized location to handle -622 * retrieving the necessary configuration information and calling -623 * {@link #initializeTable(Connection, TableName)}. -624 * -625 * Subclasses should implement their initialize call such that it is safe to call multiple times. -626 * The current TableInputFormatBase implementation relies on a non-null table reference to decide -627 * if an initialize call is needed, but this behavior may change in the future. In particular, -628 * it is critical that initializeTable not be called multiple times since this will leak -629 * Connection instances. -630 * -631 */ -632 protected void initialize(JobContext context) throws IOException { -633 } -634 -635 /** -636 * Close the Table and related objects that were initialized via -637 * {@link #initializeTable(Connection, TableName)}. -638 * -639 * @throws IOException -640 */ -641 protected void closeTable() throws IOException { -642 close(admin, table, regionLocator, connection); -643 admin = null; -644 table = null; -645 regionLocator = null; -646 connection = null; -647 } -648 -649 private void close(Closeable... closables) throws IOException { -650 for (Closeable c : closables) { -651 if(c != null) { c.close(); } -652 } -653 } -654 -655} +531 * +532 * @param startKey Start key of the region +533 * @param endKey End key of the region +534 * @return true, if this region needs to be included as part of the input (default). +535 * +536 */ +537 protected boolean includeRegionInSplit(final byte[] startKey, final byte [] endKey) { +538 return true; +539 } +540 +541 /** +542 * Allows subclasses to get the {@link RegionLocator}. +543 */ +544 protected RegionLocator getRegionLocator() { +545 if (regionLocator == null) { +546 throw new IllegalStateException(NOT_INITIALIZED); +547 } +548 return regionLocator; +549 } +550 +551 /** +552 * Allows subclasses to get the {@link Table}. +553 */ +554 protected Table getTable() { +555 if (table == null) { +556 throw new IllegalStateException(NOT_INITIALIZED); +557 } +558 return table; +559 } +560 +561 /** +562 * Allows subclasses to get the {@link Admin}. +563 */ +564 protected Admin getAdmin() { +565 if (admin == null) { +566 throw new IllegalStateException(NOT_INITIALIZED); +567 } +568 return admin; +569 } +570 +571 /** +572 * Allows subclasses to initialize the table information. +573 * +574 * @param connection The Connection to the HBase cluster. MUST be unmanaged. We will close. +575 * @param tableName The {@link TableName} of the table to process. 
+576 * @throws IOException
+577 */
+578 protected void initializeTable(Connection connection, TableName tableName) throws IOException {
+579 if (this.table != null || this.connection != null) {
+580 LOG.warn("initializeTable called multiple times. Overwriting connection and table " +
+581 "reference; TableInputFormatBase will not close these old references when done.");
+582 }
+583 this.table = connection.getTable(tableName);
+584 this.regionLocator = connection.getRegionLocator(tableName);
+585 this.admin = connection.getAdmin();
+586 this.connection = connection;
+587 }
+588
+589 /**
+590 * Gets the scan defining the actual details like columns etc.
+591 *
+592 * @return The internal scan instance.
+593 */
+594 public Scan getScan() {
+595 if (this.scan == null) this.scan = new Scan();
+596 return scan;
+597 }
+598
+599 /**
+600 * Sets the scan defining the actual details like columns etc.
+601 *
+602 * @param scan The scan to set.
+603 */
+604 public void setScan(Scan scan) {
+605 this.scan = scan;
+606 }
+607
+608 /**
+609 * Allows subclasses to set the {@link TableRecordReader}.
+610 *
+611 * @param tableRecordReader A different {@link TableRecordReader}
+612 * implementation.
+613 */
+614 protected void setTableRecordReader(TableRecordReader tableRecordReader) {
+615 this.tableRecordReader = tableRecordReader;
+616 }
+617
+618 /**
+619 * Handle subclass specific set up.
+620 * Each of the entry points used by the MapReduce framework,
+621 * {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)},
+622 * will call {@link #initialize(JobContext)} as a convenient centralized location to handle
+623 * retrieving the necessary configuration information and calling
+624 * {@link #initializeTable(Connection, TableName)}.
+625 *
+626 * Subclasses should implement their initialize call such that it is safe to call multiple times.
+627 * The current TableInputFormatBase implementation relies on a non-null table reference to decide
+628 * if an initialize call is needed, but this behavior may change in the future. In particular,
+629 * it is critical that initializeTable not be called multiple times since this will leak
+630 * Connection instances.
+631 *
+632 */
+633 protected void initialize(JobContext context) throws IOException {
+634 }
+635
+636 /**
+637 * Close the Table and related objects that were initialized via
+638 * {@link #initializeTable(Connection, TableName)}.
+639 *
+640 * @throws IOException
+641 */
+642 protected void closeTable() throws IOException {
+643 close(admin, table, regionLocator, connection);
+644 admin = null;
+645 table = null;
+646 regionLocator = null;
+647 connection = null;
+648 }
+649
+650 private void close(Closeable... closables) throws IOException {
+651 for (Closeable c : closables) {
+652 if(c != null) { c.close(); }
+653 }
+654 }
+655
+656}
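The change to this file is twofold: the auto-balance branch now tests the boolean directly instead of comparing it against false, and the MAX_AVERAGE_REGION_SIZE default is read with getLong instead of getInt (the old int constant 8*1073741824 wraps to 0 in int arithmetic, silently losing the intended 8 GB default). The sizing heuristic itself is unchanged: a region of regionSize bytes becomes round(ln(regionSize/averageRegionSize) + 1) input splits, and consecutive regions smaller than the average are merged while their sum stays at or below it. A rough standalone sketch of that arithmetic, with hypothetical region sizes and no HBase dependencies, might look like:

  import java.util.ArrayList;
  import java.util.List;

  public class AutoBalanceSketch {

    // Mirror of the heuristic in calculateAutoBalancedSplits: a region at the
    // average size gets one split, one at e times the average gets two, and so on.
    static int splitsForRegion(long regionSize, long averageRegionSize) {
      return (int) Math.round(Math.log((double) regionSize / (double) averageRegionSize) + 1.0);
    }

    public static void main(String[] args) {
      // Hypothetical region sizes in bytes; the real code reads them from TableSplit lengths.
      long[] regionSizes = {9L << 30, 2L << 30, 40L << 20, 24L << 20};

      long totalSize = 0;
      for (long size : regionSizes) {
        totalSize += size;
      }
      long average = totalSize / regionSizes.length;
      average = Math.min(average, 8L * 1073741824L); // cap at the 8 GB default maximum
      if (average < 16L * 1048576L) {
        System.out.println("Average under 16 MB: keep one mapper per region.");
        return;
      }

      List<String> plan = new ArrayList<>();
      for (long size : regionSizes) {
        if (size >= average) {
          plan.add(size + " bytes -> " + splitsForRegion(size, average) + " split(s)");
        } else {
          plan.add(size + " bytes -> merge candidate (smaller than average)");
        }
      }
      plan.forEach(System.out::println);
    }
  }

Note that when the real method merges small neighboring regions it also extends the split's end key, so the combined TableSplit still covers one contiguous row range.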
http://git-wip-us.apache.org/repos/asf/hbase-site/blob/2b3f2bee/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index ff26207..70fd530 100644
--- a/book.html
+++ b/book.html
@@ -3638,7 +3638,7 @@ Some configurations would only appear in source code; the only way to identify t

Description

-The HFile format version to use for new files. Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags). Distributed Log Replay requires that tags are enabled. Also see the configuration 'hbase.replication.rpc.codec'.
+The HFile format version to use for new files. Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags). Also see the configuration 'hbase.replication.rpc.codec'.

Default

@@ -4883,7 +4883,7 @@ Some configurations would only appear in source code; the only way to identify t

Description

-By default, in replication we can not make sure the order of operations in slave cluster is same as the order in master. If set REPLICATION_SCOPE to 2, we will push edits by the order of written. This configure is to set how long (in ms) we will wait before next checking if a log can not push right now because there are some logs written before it have not been pushed. A larger waiting will decrease the number of queries on hbase:meta but will enlarge the delay of replication. This feature relies on zk-less assignment, and conflicts with distributed log replay. So users must set hbase.assignment.usezk and hbase.master.distributed.log.replay to false to support it.
+By default, replication cannot guarantee that the order of operations in the slave cluster is the same as the order in the master. If REPLICATION_SCOPE is set to 2, edits are pushed in the order they were written. This configuration sets how long (in ms) to wait before re-checking a log that cannot be pushed yet because logs written before it have not been pushed. A longer wait decreases the number of queries on hbase:meta but increases the replication delay. This feature relies on zk-less assignment, so users must set hbase.assignment.usezk to false to support it.

Default

If you have your own customer filters.

See the release notes on the issue HBASE-12068 [Branch-1] Avoid need to always do KeyValueUtil#ensureKeyValue for Filter transformCell; be sure to follow the recommendations therein.

-
-
Distributed Log Replay
-

Distributed Log Replay is off by default in HBase 1.0.0. Enabling it can make a big difference improving HBase MTTR. Enable this feature if you are doing a clean stop/start when you are upgrading. You cannot rolling upgrade to this feature (caveat if you are running on a version of HBase in excess of HBase 0.98.4 — see HBASE-12577 Disable distributed log replay by default for more).

-
Mismatch Of hbase.client.scanner.max.result.size Between Client and Server

If either the client or server version is lower than 0.98.11/1.0.0 and the server @@ -14834,41 +14830,6 @@ If none are found, it throws an exception so that the log splitting can be retri

-Distributed Log Replay
-
-After a RegionServer fails, its failed regions are assigned to another RegionServer, which are marked as "recovering" in ZooKeeper.
-A split log worker directly replays edits from the WAL of the failed RegionServer to the regions at its new location.
-When a region is in "recovering" state, it can accept writes but no reads (including Append and Increment), region splits or merges.
-
-Distributed Log Replay extends the Enabling or Disabling Distributed Log Splitting framework.
-It works by directly replaying WAL edits to another RegionServer instead of creating recovered.edits files.
-It provides the following advantages over distributed log splitting alone:
-
-* It eliminates the overhead of writing and reading a large number of recovered.edits files.
-  It is not unusual for thousands of recovered.edits files to be created and written concurrently during a RegionServer recovery.
-  Many small random writes can degrade overall system performance.
-
-* It allows writes even when a region is in recovering state.
-  It only takes seconds for a recovering region to accept writes again.
-
-Enabling Distributed Log Replay
-
-To enable distributed log replay, set hbase.master.distributed.log.replay to true.
-This will be the default for HBase 0.99 (HBASE-10888).
-
-You must also enable HFile version 3 (which is the default HFile format starting in HBase 0.99. See HBASE-10855).
-Distributed log replay is unsafe for rolling upgrades.
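The removed passage boils down to two server-side settings on the HBase 1.x releases where the feature existed. Expressed with the plain Configuration API, and using only the keys quoted in the text above, the toggles would look like this sketch:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class DistributedLogReplaySettingsSketch {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      // Both keys come from the passage being removed above; they only apply
      // to the pre-2.0 clusters that still shipped Distributed Log Replay.
      conf.setBoolean("hbase.master.distributed.log.replay", true);
      // Distributed Log Replay required HFile version 3 (tag support).
      conf.setInt("hfile.format.version", 3);
      System.out.println("distributed log replay enabled: "
          + conf.getBoolean("hbase.master.distributed.log.replay", false));
    }
  }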
@@ -35484,7 +35445,7 @@ The server will return cellblocks compressed using this same compressor as long