Date: Fri, 17 May 2013 17:57:16 +0000 (UTC)
From: "Doug Meil (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8571) CopyTable and RowCounter don't seem to use setCaching setting

    [ https://issues.apache.org/jira/browse/HBASE-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660908#comment-13660908 ]

Doug Meil commented on HBASE-8571:
----------------------------------

Make that "every time" I look at it.

> CopyTable and RowCounter don't seem to use setCaching setting
> -------------------------------------------------------------
>
>                 Key: HBASE-8571
>                 URL: https://issues.apache.org/jira/browse/HBASE-8571
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Doug Meil
>
> Maybe it's just me, but I've been looking at trunk and I don't see where either the RowCounter or the CopyTable MapReduce job can adjust the setCaching setting on the Scan instance.
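For context on why the missing setting matters: Scan.setCaching controls how many rows a scanner returns per next() round trip to the region server, so a full-table MapReduce scan with a low caching value makes many more round trips. The following is a minimal, stdlib-only model of that arithmetic, not HBase code. The class and method names are invented for illustration, and the model ignores region boundaries, batching, and the final call that closes a scanner.

```java
// Hypothetical model (not HBase code) of how scanner caching affects
// the number of next() round trips needed to read a given number of rows.
public class ScanCachingModel {

    // Round trips needed to pull `rows` rows when each round trip returns
    // at most `caching` rows. Simplification: ignores region boundaries,
    // batch size, and the extra call that ends a scanner.
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

With one million rows this prints 1000000, 10000, and 1000 round trips for caching values of 1, 100, and 1000 respectively, which is why a tool that never sets caching can be much slower than it needs to be.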
> Example from RowCounter...
> {code}
>     Job job = new Job(conf, NAME + "_" + tableName);
>     job.setJarByClass(RowCounter.class);
>     Scan scan = new Scan();
>     scan.setCacheBlocks(false);
>     Set<byte[]> qualifiers = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
>     if (startKey != null && !startKey.equals("")) {
>       scan.setStartRow(Bytes.toBytes(startKey));
>     }
>     if (endKey != null && !endKey.equals("")) {
>       scan.setStopRow(Bytes.toBytes(endKey));
>     }
>     scan.setFilter(new FirstKeyOnlyFilter());
>     if (sb.length() > 0) {
>       for (String columnName : sb.toString().trim().split(" ")) {
>         String[] fields = columnName.split(":");
>         if (fields.length == 1) {
>           scan.addFamily(Bytes.toBytes(fields[0]));
>         } else {
>           byte[] qualifier = Bytes.toBytes(fields[1]);
>           qualifiers.add(qualifier);
>           scan.addColumn(Bytes.toBytes(fields[0]), qualifier);
>         }
>       }
>     }
>     // specified column may or may not be part of first key value for the row.
>     // Hence do not use FirstKeyOnlyFilter if scan has columns, instead use
>     // FirstKeyValueMatchingQualifiersFilter.
>     if (qualifiers.size() == 0) {
>       scan.setFilter(new FirstKeyOnlyFilter());
>     } else {
>       scan.setFilter(new FirstKeyValueMatchingQualifiersFilter(qualifiers));
>     }
>     job.setOutputFormatClass(NullOutputFormat.class);
>     TableMapReduceUtil.initTableMapperJob(tableName, scan,
>         RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
>     job.setNumReduceTasks(0);
>     return job;
> {code}
> TableMapReduceUtil only serializes the Scan into the job; it doesn't adjust any of the settings.
> Maybe I'm missing something, but this seems like a problem.
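Because TableMapReduceUtil only serializes the Scan, any caching value has to be set on the Scan before initTableMapperJob is called; in real HBase code that would be a call to scan.setCaching(int). The sketch below illustrates that ordering with stdlib-only stand-ins, under loudly stated assumptions: ToyScan is a hypothetical substitute for the HBase Scan class, its serialize() method stands in for TableMapReduceUtil's serialization, and the "--caching=" command-line flag is invented for illustration. Neither RowCounter nor CopyTable accepts that flag in the code quoted above.

```java
import java.util.OptionalInt;

// Hypothetical sketch, stdlib only. ToyScan stands in for the HBase Scan
// class, and the "--caching=" flag is invented for illustration.
public class CachingFlagSketch {

    // Minimal stand-in for the two Scan settings discussed in the issue.
    static final class ToyScan {
        boolean cacheBlocks = true;
        int caching = -1; // -1 means "not set on the scan"

        // Stands in for serializing the Scan into the job configuration.
        String serialize() {
            return "cacheBlocks=" + cacheBlocks + ";caching=" + caching;
        }
    }

    // Returns the value of a hypothetical --caching=N argument, if present.
    static OptionalInt parseCaching(String[] args) {
        final String key = "--caching=";
        for (String arg : args) {
            if (arg.startsWith(key)) {
                return OptionalInt.of(Integer.parseInt(arg.substring(key.length())));
            }
        }
        return OptionalInt.empty();
    }

    // Builds and "serializes" the scan. Caching is applied before
    // serialization because the mappers only see the serialized copy and
    // nothing downstream adjusts it.
    static String createJobScan(String[] args) {
        ToyScan scan = new ToyScan();
        scan.cacheBlocks = false; // mirrors scan.setCacheBlocks(false) in RowCounter
        parseCaching(args).ifPresent(c -> scan.caching = c);
        return scan.serialize();
    }

    public static void main(String[] args) {
        System.out.println(createJobScan(new String[] {"--caching=500", "mytable"}));
        System.out.println(createJobScan(new String[] {"mytable"}));
    }
}
```

This prints "cacheBlocks=false;caching=500" and then "cacheBlocks=false;caching=-1". The second line shows that when nothing sets caching before serialization, the serialized scan carries no caching value at all, which matches the concern raised in the issue.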