From: "Dave Lerman (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: hive-dev@hadoop.apache.org
Date: Tue, 22 Dec 2009 16:40:29 +0000 (UTC)
Message-ID: <712227005.1261500029517.JavaMail.jira@brutus>
In-Reply-To: <1636979212.1261494749414.JavaMail.jira@brutus>
Subject: [jira] Updated: (HIVE-1006) getPartitionDescFromPath failing from CombineHiveInputFormat

[ https://issues.apache.org/jira/browse/HIVE-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Lerman updated HIVE-1006:
------------------------------

    Attachment: hive.1006.2.patch

Sorry about that - I uploaded the wrong patch for this and 1007.

> getPartitionDescFromPath failing from CombineHiveInputFormat
> ------------------------------------------------------------
>
>                 Key: HIVE-1006
>                 URL: https://issues.apache.org/jira/browse/HIVE-1006
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.4.1
>            Reporter: Dave Lerman
>         Attachments: hive.1006.1.patch, hive.1006.2.patch
>
>
> When HiveInputFormat.getPartitionDescFromPath is called from CombineHiveInputFormat, it sometimes fails to return a matching partitionDesc, which then causes an Exception down the line because the split doesn't have an inputFormatClassName.
> The issue is that the path format used as the key in pathToPartitionInfo varies between stages. In the first stage it's the complete path as returned from the table definitions (e.g. hdfs://server/path); in subsequent stages it's the complete path, with port (e.g. hdfs://server:8020/path), of the previous stage's result. This isn't a problem in HiveInputFormat, since the directory you're looking up always uses the same format as the keys. In CombineHiveInputFormat, however, we take that path and look up its children in the file system to get all the block information, and then use one of the returned paths to get the partition info -- and that returned path does not include the port. So, in any stage after the first, we are looking for a path without the port, but all the keys in the map contain a port, so we don't find a match.
> The attached patch may not be ideal -- it doesn't fix the underlying problem of inconsistent path formats in pathToPartitionInfo -- it just works around it by walking through the map and looking for a matching path rather than doing a hash lookup.
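
To illustrate the workaround described above, here is a minimal sketch of a lookup that walks the map and compares only the path component, so that hdfs://server/path and hdfs://server:8020/path are treated as the same location. This is not the attached patch: the class name PartitionLookupSketch and the helpers pathOnly and findByPath are hypothetical, plain java.net.URI stands in for Hadoop's Path, and String values stand in for partitionDesc objects.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class PartitionLookupSketch {

    // Hypothetical helper: keep only the path component, dropping scheme,
    // host and port. Assumes the location is a syntactically valid URI.
    static String pathOnly(String location) {
        String path = URI.create(location).getPath();
        // Drop a trailing slash so "/a/b/" and "/a/b" compare equal.
        if (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }

    // Hypothetical lookup: try the normal hash lookup first, then fall back
    // to a linear scan that ignores scheme, host and port differences.
    static <V> V findByPath(Map<String, V> pathToPartitionInfo, String dir) {
        V exact = pathToPartitionInfo.get(dir);
        if (exact != null) {
            return exact;
        }
        String wanted = pathOnly(dir);
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            if (pathOnly(entry.getKey()).equals(wanted)) {
                return entry.getValue();
            }
        }
        return null; // no key refers to the same path
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new LinkedHashMap<>();
        // Key written by a previous stage, including the port.
        pathToPartitionInfo.put("hdfs://server:8020/tmp/stage1", "stage1-desc");

        // Path as returned from the file system listing, without the port.
        System.out.println(findByPath(pathToPartitionInfo, "hdfs://server/tmp/stage1"));
    }
}
```

The scan costs O(n) per lookup instead of O(1), and comparing only the path component would also match the same path on a different host. Normalizing the keys when pathToPartitionInfo is populated would avoid both, which is the underlying fix the description says the patch does not attempt.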