From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Date: Tue, 22 Jul 2014 16:20:39 +0000 (UTC)
Subject: [jira] [Commented] (HBASE-11558) Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+

    [ https://issues.apache.org/jira/browse/HBASE-11558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070449#comment-14070449 ]

Andrew Purtell commented on HBASE-11558:
----------------------------------------

bq. Add caching to Scan object, adding an extra int to the payload for the Scan object which is really not needed in the general case.

But with protobuf there's no overhead if not set.

Are you working on a patch [~ishanc], or would you like one of us to take it?

> Caching set on Scan object gets lost when using TableMapReduceUtil in 0.95+
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-11558
>                 URL: https://issues.apache.org/jira/browse/HBASE-11558
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce, Scanners
>    Affects Versions: 0.98.0, 0.95.0, 0.96.0
>            Reporter: Ishan Chhabra
>
> 0.94 and before, if one sets caching on the Scan object in the Job by calling scan.setCaching(int) and passes it to TableMapReduceUtil, it is correctly read and used by the mappers during a mapreduce job. This is because Scan.write respects and serializes caching, which is used internally by TableMapReduceUtil to serialize and transfer the scan object to the mappers.
> 0.95+, after the move to protobuf, ProtobufUtil.toScan does not respect caching anymore as ClientProtos.Scan does not have the field caching. Caching is passed via the ScanRequest object to the server and so is not needed in the Scan object. However, this breaks application code that relies on the earlier behavior. This will lead to sudden degradation in Scan performance 0.96+ for users relying on the old behavior.
> There are 2 options here:
> 1. Add caching to Scan object, adding an extra int to the payload for the Scan object which is really not needed in the general case.
> 2. Document and preach that TableMapReduceUtil.setScannerCaching must be called by the client.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
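
To illustrate the gap described in the issue and the remark that an unset optional field costs nothing, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions: ToyScan, UNSET, and the encode/decode helpers are hypothetical names invented for this example, not HBase or protobuf classes, and the byte layout is a simplification (a trailing optional int) rather than the real protobuf encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy model of the serialization gap; none of these types are HBase classes. */
final class ScanCachingSketch {
    /** Sentinel meaning "caching was never configured" (assumption of this sketch). */
    static final int UNSET = -1;

    /** Hypothetical stand-in for a scan description carried to the mappers. */
    static final class ToyScan {
        final String startRow;
        final int caching;

        ToyScan(String startRow, int caching) {
            this.startRow = startRow;
            this.caching = caching;
        }
    }

    /** Wire format with no caching field: the setting cannot survive the trip. */
    static byte[] encodeWithoutCaching(ToyScan scan) {
        byte[] row = scan.startRow.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + row.length);
        buf.putInt(row.length).put(row);
        return buf.array();
    }

    /** Wire format with an optional trailing caching field, written only when set. */
    static byte[] encodeWithOptionalCaching(ToyScan scan) {
        byte[] row = scan.startRow.getBytes(StandardCharsets.UTF_8);
        boolean cachingSet = scan.caching != UNSET;
        ByteBuffer buf = ByteBuffer.allocate(4 + row.length + (cachingSet ? 4 : 0));
        buf.putInt(row.length).put(row);
        if (cachingSet) {
            buf.putInt(scan.caching);
        }
        return buf.array();
    }

    /** Reads either format; a missing trailing field decodes as UNSET. */
    static ToyScan decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[] row = new byte[buf.getInt()];
        buf.get(row);
        int caching = buf.remaining() >= 4 ? buf.getInt() : UNSET;
        return new ToyScan(new String(row, StandardCharsets.UTF_8), caching);
    }

    public static void main(String[] args) {
        ToyScan scan = new ToyScan("row-000", 500);
        System.out.println("without field: " + decode(encodeWithoutCaching(scan)).caching);
        System.out.println("optional field: " + decode(encodeWithOptionalCaching(scan)).caching);
    }
}
```

In this model the first format loses the caching value on the way to the receiver, which mirrors the behaviour reported for 0.95+, while the second preserves it and produces the same number of bytes as the first whenever caching is left unset. Option 2 from the issue needs no format change: the job setup would call TableMapReduceUtil.setScannerCaching in addition to scan.setCaching(int).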