From: Yue Guan
To: user@hive.apache.org
Date: Wed, 01 Aug 2012 10:28:55 -0400
Subject: mapper is slower than hive' mapper
Message-ID: <50193D27.6060503@gmail.com>

Hi there,

I'm writing a MapReduce job to replace a Hive query, and I find that my mapper is slower than Hive's mapper.

The Hive query looks like this:

    select sum(column1) from table group by column2, column3;

My MapReduce program looks like this:

    public static class HiveTableMapper
            extends Mapper<BytesWritable, Text, MyKey, DoubleWritable> {
        @Override
        public void map(BytesWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the row on the Hive field delimiter, honoring escaped delimiters.
            String[] sLine = StringUtils.split(value.toString(),
                    StringUtils.ESCAPE_CHAR, HIVE_FIELD_DELIMITER_CHAR);
            // Emit (group-by key, value to sum) for the reducer.
            context.write(new MyKey(Integer.parseInt(sLine[0]), sLine[1]),
                    new DoubleWritable(Double.parseDouble(sLine[2])));
        }
    }

I assume Hive is doing something similar. Is there any trick in Hive to speed this up?

Thank you!

Best,
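
P.S. To make the intended result concrete, here is a minimal plain-Java sketch (no Hadoop dependencies) of the group-by sum that the query and the mapper/reducer pair above are both meant to compute. The class name GroupBySumSketch and the method groupBySum are hypothetical. The sketch also assumes that each row has exactly three fields in the order the mapper indexes them, and that the fields are separated by Ctrl-A, Hive's default field delimiter. It does no escape handling, so it is not a drop-in replacement for the real job.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain-Java equivalent of
//   select sum(column1) from table group by column2, column3;
final class GroupBySumSketch {
    // Assumption: fields are separated by Ctrl-A (0x01), Hive's default field delimiter.
    static final char FIELD_DELIMITER = '\u0001';

    // Assumption: every row has exactly three fields laid out as
    // <first group key> <second group key> <numeric value>,
    // matching sLine[0], sLine[1], sLine[2] in the mapper.
    static Map<String, Double> groupBySum(List<String> rows) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (String row : rows) {
            int first = row.indexOf(FIELD_DELIMITER);
            int second = row.indexOf(FIELD_DELIMITER, first + 1);
            // The first two fields together (delimiter included) form the group key.
            String groupKey = row.substring(0, second);
            double value = Double.parseDouble(row.substring(second + 1));
            // Add the value to the running total for this group.
            sums.merge(groupKey, value, Double::sum);
        }
        return sums;
    }
}
```

Calling groupBySum on a handful of sample rows and comparing the totals with the Hive query's output on the same rows would be one way to check that the hand-written job and the query agree before comparing their speed.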