hbase-issues mailing list archives

From "pengyuanbo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12223) MultiTableInputFormatBase.getSplits is too slow
Date Fri, 10 Oct 2014 02:27:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166178#comment-14166178 ]

pengyuanbo commented on HBASE-12223:
------------------------------------

Using AdmasterMultiTableInputFormatBase.class significantly reduces the time spent in getSplits
when a job has one table with multiple scans, multiple tables with one scan each, or multiple
tables with multiple scans. The reason is that the table.getStartEndKeys() call inside getSplits
is very time-consuming. We made the following modification: for each table, getSplits executes
table.getStartEndKeys() only once and reuses the result for every scan of that table, which
greatly shortens the time.
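
The once-per-table idea can be illustrated without an HBase cluster. The following is only a minimal sketch: the class GroupedLookupSketch and the method fetchRegionBoundaries are hypothetical stand-ins (the latter plays the role of table.getStartEndKeys()), and scans are represented just by their table names. It counts how often the expensive lookup runs with and without grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedLookupSketch {
    // Counts how often the simulated expensive lookup is invoked.
    static int lookupCalls = 0;

    // Hypothetical stand-in for table.getStartEndKeys(); returns dummy boundaries.
    static String[] fetchRegionBoundaries(String tableName) {
        lookupCalls++;
        return new String[] {"", "m"};
    }

    // Groups scan indexes by the table each scan targets.
    static Map<String, List<Integer>> groupByTable(List<String> scanTables) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<String, List<Integer>>();
        for (int i = 0; i < scanTables.size(); i++) {
            String table = scanTables.get(i);
            List<Integer> scanIndexes = grouped.get(table);
            if (scanIndexes == null) {
                scanIndexes = new ArrayList<Integer>();
                grouped.put(table, scanIndexes);
            }
            scanIndexes.add(i);
        }
        return grouped;
    }

    // One lookup per scan: the behavior the issue describes as slow.
    static int countLookupsPerScan(List<String> scanTables) {
        lookupCalls = 0;
        for (String table : scanTables) {
            fetchRegionBoundaries(table);
        }
        return lookupCalls;
    }

    // One lookup per distinct table; every scan of that table reuses the result.
    static int countLookupsGrouped(List<String> scanTables) {
        lookupCalls = 0;
        for (Map.Entry<String, List<Integer>> entry : groupByTable(scanTables).entrySet()) {
            String[] boundaries = fetchRegionBoundaries(entry.getKey());
            for (Integer scanIndex : entry.getValue()) {
                // A real implementation would compute the splits for this scan
                // from 'boundaries' here.
            }
        }
        return lookupCalls;
    }

    public static void main(String[] args) {
        List<String> scanTables = Arrays.asList("t1", "t1", "t2", "t1", "t2");
        System.out.println("lookups, one per scan:  " + countLookupsPerScan(scanTables));
        System.out.println("lookups, one per table: " + countLookupsGrouped(scanTables));
    }
}
```

With grouping, the number of lookups depends on the number of distinct tables rather than on the number of scans, which is where the saving for many scans over few tables would come from.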

@Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
      if (scans.isEmpty()) {
          throw new IOException("No scans were provided.");
      }

      // Group the scans by table name so that each table is opened and queried only once.
      Map<String, List<Scan>> tableMaps = new HashMap<String, List<Scan>>();
      for (Scan scan : scans) {
          byte[] tableName = scan.getAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME);
          // Check for a missing table name here, before any table is opened.
          if (tableName == null)
              throw new IOException("A scan object did not have a table name");
          String tableNameStr = Bytes.toString(tableName);
          if (tableMaps.containsKey(tableNameStr)) {
              tableMaps.get(tableNameStr).add(scan);
          } else {
              ArrayList<Scan> scanList = new ArrayList<Scan>();
              scanList.add(scan);
              tableMaps.put(tableNameStr, scanList);
          }
      }

      List<InputSplit> splits = new ArrayList<InputSplit>();
      for (Map.Entry<String, List<Scan>> entry : tableMaps.entrySet()) {
          String tableNameStr = entry.getKey();
          List<Scan> scanList = entry.getValue();
          HTable table = new HTable(context.getConfiguration(), tableNameStr);
          try {
              // The expensive call: fetch the region boundaries once per table, not once per scan.
              Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
              if (keys == null || keys.getFirst() == null ||
                      keys.getFirst().length == 0) {
                  throw new IOException("Expecting at least one region for table : "
                          + tableNameStr);
              }
              for (Scan scan : scanList) {
                  int count = 0;
                  byte[] startRow = scan.getStartRow();
                  byte[] stopRow = scan.getStopRow();
                  for (int i = 0; i < keys.getFirst().length; i++) {
                      if (!includeRegionInSplit(keys.getFirst()[i], keys.getSecond()[i])) {
                          continue;
                      }

                      // determine if the given start and stop keys fall into the range
                      if ((startRow.length == 0 || keys.getSecond()[i].length == 0 ||
                              Bytes.compareTo(startRow, keys.getSecond()[i]) < 0) &&
                              (stopRow.length == 0 ||
                              Bytes.compareTo(stopRow, keys.getFirst()[i]) > 0)) {
                          // clip the scan range to the boundaries of this region
                          byte[] splitStart = startRow.length == 0 ||
                                  Bytes.compareTo(keys.getFirst()[i], startRow) >= 0
                                  ? keys.getFirst()[i] : startRow;
                          byte[] splitStop = (stopRow.length == 0 ||
                                  Bytes.compareTo(keys.getSecond()[i], stopRow) <= 0) &&
                                  keys.getSecond()[i].length > 0
                                  ? keys.getSecond()[i] : stopRow;
                          String regionLocation =
                                  table.getRegionLocation(keys.getFirst()[i], false).getHostname();
                          InputSplit split = new TableSplit(Bytes.toBytes(tableNameStr), scan,
                                  splitStart, splitStop, regionLocation);
                          splits.add(split);
                          if (LOG.isDebugEnabled())
                              LOG.debug("getSplits: split -> " + (count++) + " -> " + split);
                      }
                  }
              }
          } finally {
              // release the table even if split computation throws
              table.close();
          }
      }

      return splits;
  }
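
The range check and the splitStart/splitStop computation in the method above amount to intersecting the scan's row range with one region's key range, where an empty byte array means "unbounded". The following standalone sketch reproduces that logic without HBase. The class RegionClipSketch and its helpers are hypothetical, and compare is a hand-written unsigned lexicographic comparison standing in for Bytes.compareTo.

```java
import java.nio.charset.StandardCharsets;

public class RegionClipSketch {
    // Converts a string to bytes; used only to keep the example readable.
    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison (stand-in for Bytes.compareTo).
    static int compare(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int x = left[i] & 0xff;
            int y = right[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return left.length - right.length;
    }

    // Returns {splitStart, splitStop} for the overlap of the scan range with the
    // region, or null if they do not overlap. An empty array means "unbounded".
    static byte[][] clip(byte[] startRow, byte[] stopRow,
                         byte[] regionStart, byte[] regionEnd) {
        boolean startsBeforeRegionEnd = startRow.length == 0
                || regionEnd.length == 0
                || compare(startRow, regionEnd) < 0;
        boolean stopsAfterRegionStart = stopRow.length == 0
                || compare(stopRow, regionStart) > 0;
        if (!(startsBeforeRegionEnd && stopsAfterRegionStart)) {
            return null;
        }
        // Start at whichever of the two start keys is larger.
        byte[] splitStart = (startRow.length == 0 || compare(regionStart, startRow) >= 0)
                ? regionStart : startRow;
        // Stop at the region end if it is bounded and not past the scan's stop row.
        byte[] splitStop = ((stopRow.length == 0 || compare(regionEnd, stopRow) <= 0)
                && regionEnd.length > 0)
                ? regionEnd : stopRow;
        return new byte[][] {splitStart, splitStop};
    }

    static String show(byte[][] range) {
        if (range == null) {
            return "no overlap";
        }
        return "[" + new String(range[0], StandardCharsets.UTF_8) + ", "
                + new String(range[1], StandardCharsets.UTF_8) + ")";
    }

    public static void main(String[] args) {
        // Scan over rows [c, x) against three regions: [a, b), [a, m) and [m, end of table).
        System.out.println(show(clip(b("c"), b("x"), b("a"), b("b"))));
        System.out.println(show(clip(b("c"), b("x"), b("a"), b("m"))));
        System.out.println(show(clip(b("c"), b("x"), b("m"), b(""))));
    }
}
```

Because this computation only needs the region boundaries and the scan's own start and stop rows, the same boundaries can be reused for every scan of a table.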

> MultiTableInputFormatBase.getSplits is too slow
> -----------------------------------------------
>
>                 Key: HBASE-12223
>                 URL: https://issues.apache.org/jira/browse/HBASE-12223
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 0.94.15
>            Reporter: shanwen
>            Priority: Minor
>
> When multiple scans are used, getSplits is too slow: 800 scans take five minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
