Date: Tue, 15 Oct 2013 15:32:45 +0000 (UTC)
From: "Yin Huai (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-5546) A change in ORCInputFormat made by HIVE-4113 was reverted by HIVE-5391

     [ https://issues.apache.org/jira/browse/HIVE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated HIVE-5546:
---------------------------
    Status: Patch Available  (was: Open)

[~sershe] [~ashutoshc] Can you take a look?

> A change in ORCInputFormat made by HIVE-4113 was reverted by HIVE-5391
> ----------------------------------------------------------------------
>
>                 Key: HIVE-5546
>                 URL: https://issues.apache.org/jira/browse/HIVE-5546
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: HIVE-5546.1.patch
>
>
> {code}
> 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included column ids =
> 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: included columns names =
> 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: No ORC pushdown predicate
> 2013-10-15 10:49:49,834 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://localhost:54310/user/hive/warehouse/web_sales_orc/000000_0
> 2013-10-15 10:49:49,834 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
> 2013-10-15 10:49:49,840 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
> 2013-10-15 10:49:49,968 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName yhuai for UID 1000 from the native implementation
> 2013-10-15 10:49:49,996 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> If includedColumnIds is an empty list, we do not need to read any column. But right now, in OrcInputFormat.findIncludedColumns, we have ...
> {code}
> if (ColumnProjectionUtils.isReadAllColumns(conf) ||
>     includedStr == null || includedStr.trim().length() == 0) {
>   return null;
> }
> {code}
> If includedStr is an empty string, the code assumes that we need all columns, which is not correct.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
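
For illustration, a minimal sketch of the distinction the description calls for: a null column list keeps the existing "read all columns" behavior, while an explicitly empty list reads no columns. This is not the attached HIVE-5546.1.patch; the parameters readAllColumns, includedStr, and numColumns are stand-ins for the values the real OrcInputFormat.findIncludedColumns derives from the job Configuration.

{code}
// Hypothetical sketch only; not the actual HIVE-5546 fix.
public class IncludedColumnsSketch {

  // Returns null to mean "read all columns", matching the existing convention;
  // otherwise returns a flag per column indicating whether it should be read.
  static boolean[] findIncludedColumns(boolean readAllColumns,
                                       String includedStr,
                                       int numColumns) {
    if (readAllColumns || includedStr == null) {
      // No projection was configured at all: keep the "read everything" behavior.
      return null;
    }
    boolean[] included = new boolean[numColumns];  // all false by default
    if (includedStr.trim().isEmpty()) {
      // An explicitly empty column list means "read no columns",
      // rather than falling through to "read all columns".
      return included;
    }
    for (String id : includedStr.split(",")) {
      included[Integer.parseInt(id.trim())] = true;
    }
    return included;
  }
}
{code}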