From: Gourav Sengupta
To: dev@hive.apache.org
Date: Thu, 10 Oct 2013 09:16:14 +0100
Subject: Re: Single Mapper - HIVE 0.11

Hi,

The entire table of 34 million records is in a single ORC file, and it is
around 7 GB in size. The other ORC file is a dimension table of less than
40 MB, once again in a single ORC file. I do not remember setting the ORC
stripe size anywhere.

The problem I am facing is that the query triggers only a single mapper,
even though the cluster has three nodes. Unlike other posts here, I need
more mappers.

The other properties you mentioned are below, taken from the job XML file:

mapred.min.split.size.per.node = 1
mapred.max.split.size = 256000000

I am sure there is no issue with the Hadoop configuration, as some other
queries get more than 24 mappers.

Please accept my sincere regards for your kind help and insights.

Thanks,
Gourav Sengupta

On Wed, Oct 9, 2013 at 6:22 PM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> What is your ORC file stripe size? How many ORC files are there in each of
> the tables? It is possible that ORC compressed the data so much that the
> file is smaller than the HDFS block size. Can you please report the file
> sizes of the two ORC files?
>
> Another possibility is that there are many small files. In that case, by
> default Hive uses CombineHiveInputFormat, which combines many small files
> into a single large split. Hence you will see fewer mappers.
If you > are expecting one mapper per hdfs file, then try disabling > CombineHiveInputFormat by "set > hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;". Another > way to control the number of mappers is by adjusting the min and max split > size. > > Thanks > Prasanth Jayachandran > > On Oct 9, 2013, at 10:03 AM, Nitin Pawar wrote: > > > whats the size of the table? (in GBs? ) > > > > Whats the max and min split sizes have you provied? > > > > > > On Wed, Oct 9, 2013 at 10:28 PM, Gourav Sengupta < > gourav.hadoop@gmail.com>wrote: > > > >> Hi, > >> > >> I am trying to run a join using two tables stored in ORC file format. > >> > >> The first table has 34 million records and the second has around 300,000 > >> records. > >> > >> Setting "set hive.auto.convert.join=true" makes the entire query run > via a > >> single mapper. > >> In case I am setting "set hive.auto.convert.join=false" then there are > two > >> mappers first one reads the second table and then the entire large table > >> goes through the second mapper. > >> > >> Is there something that I am doing wrong because there are three nodes > in > >> the HADOOP cluster currently and I was expecting that at least 6 mappers > >> should have been used. > >> > >> Thanks and Regards, > >> Gourav > >> > > > > > > > > -- > > Nitin Pawar > > > -- > CONFIDENTIALITY NOTICE > NOTICE: This message is intended for the use of the individual or entity to > which it is addressed and may contain information that is confidential, > privileged and exempt from disclosure under applicable law. If the reader > of this message is not the intended recipient, you are hereby notified that > any printing, copying, dissemination, distribution, disclosure or > forwarding of this communication is strictly prohibited. If you have > received this communication in error, please contact the sender immediately > and delete it from your system. Thank You. > --089e011770cb8be8c904e85e9d70--