Return-Path:
X-Original-To: apmail-hive-dev-archive@www.apache.org
Delivered-To: apmail-hive-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9D671DF3D for ; Sat, 1 Sep 2012 16:27:08 +0000 (UTC)
Received: (qmail 4042 invoked by uid 500); 1 Sep 2012 16:27:08 -0000
Delivered-To: apmail-hive-dev-archive@hive.apache.org
Received: (qmail 3997 invoked by uid 500); 1 Sep 2012 16:27:08 -0000
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hive.apache.org
Delivered-To: mailing list dev@hive.apache.org
Received: (qmail 3757 invoked by uid 500); 1 Sep 2012 16:27:07 -0000
Delivered-To: apmail-hadoop-hive-dev@hadoop.apache.org
Received: (qmail 3678 invoked by uid 99); 1 Sep 2012 16:27:07 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 01 Sep 2012 16:27:07 +0000
Date: Sun, 2 Sep 2012 03:27:07 +1100 (NCT)
From: "Vitaliy Fuks (JIRA)"
To: hive-dev@hadoop.apache.org
Message-ID: <2012486746.27286.1346516827879.JavaMail.jiratomcat@arcas>
In-Reply-To: <115648728.52028.1313730507190.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Resolved] (HIVE-2395) Misleading "No LZO codec found, cannot run." exception when using external table and LZO / DeprecatedLzoTextInputFormat
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/HIVE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitaliy Fuks resolved HIVE-2395.
--------------------------------

    Resolution: Won't Fix

Latest hadoop-lzo libraries do not exhibit this behavior.

> Misleading "No LZO codec found, cannot run." exception when using external table and LZO / DeprecatedLzoTextInputFormat
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2395
>                 URL: https://issues.apache.org/jira/browse/HIVE-2395
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>   Affects Versions: 0.7.1
>         Environment: Cloudera 3u1 with https://github.com/kevinweil/hadoop-lzo or https://github.com/kevinweil/elephant-bird
>            Reporter: Vitaliy Fuks
>
> We have a {{/tables/}} directory containing .lzo files with our data, compressed using lzop.
> We {{CREATE EXTERNAL TABLE}} on top of this directory, using {{STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"}}.
> .lzo files require that an LzoIndexer is run on them. When this is done, .lzo.index file is created for every .lzo file, so we end up with:
> {noformat}
> /tables/ourdata_2011-08-19.lzo
> /tables/ourdata_2011-08-19.lzo.index
> /tables/ourdata_2011-08-18.lzo
> /tables/ourdata_2011-08-18.lzo.index
> ..etc
> {noformat}
> The issue is that org.apache.hadoop.hive.ql.io.CombineHiveRecordReader is attempting to getRecordReader() for .lzo.index files. This throws a pretty confusing exception:
> {noformat}
> Caused by: java.io.IOException: No LZO codec found, cannot run.
>         at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(DeprecatedLzoLineRecordReader.java:53)
>         at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(DeprecatedLzoTextInputFormat.java:128)
>         at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:68)
> {noformat}
> More precisely, it dies on second invocation of getRecordReader() - here is some System.out.println() output:
> {noformat}
> DeprecatedLzoTextInputFormat.getRecordReader(): split=/tables/ourdata_2011-08-19.lzo:0+616479
> DeprecatedLzoTextInputFormat.getRecordReader(): split=/tables/ourdata_2011-08-19.lzo.index:0+64
> {noformat}
> DeprecatedLzoTextInputFormat contains the following code which causes the ultimate exception and death of query, as it obviously doesn't have a codec to read .lzo.index files.
> {noformat}
> final CompressionCodec codec = codecFactory.getCodec(file);
> if (codec == null) {
>   throw new IOException("No LZO codec found, cannot run.");
> }
> {noformat}
> So I understand that the way things are right now is that Hive considers all files within a directory to be part of a table. There is an open patch HIVE-951 which would allow a quick workaround for this problem.
> Does it make sense to add some hooks so that CombineHiveRecordReader or its parents are more aware of what files should be considered instead of blindly trying to read everything?
> Any suggestions for a quick workaround to make it skip .index files?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
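
On the question raised in the report about skipping .index files: the check itself is small, and a minimal sketch may make the idea concrete. The class and method names below (LzoIndexSkipSketch, isIndexFile, isDataFile, dataFilesOnly) are hypothetical and are not part of Hive or hadoop-lzo. The sketch works on plain path strings so that it runs without Hadoop on the classpath; where such a predicate would be hooked into Hive's split generation is an open question here, and the resolution above indicates that newer hadoop-lzo libraries no longer need it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Hive or hadoop-lzo code.
final class LzoIndexSkipSketch {
    // Suffixes taken from the directory listing in the report.
    private static final String LZO_SUFFIX = ".lzo";
    private static final String INDEX_SUFFIX = ".lzo.index";

    private LzoIndexSkipSketch() {}

    // True for the side files written by the indexer (e.g. foo.lzo.index).
    static boolean isIndexFile(String path) {
        return path.endsWith(INDEX_SUFFIX);
    }

    // True only for files that an LZO codec could be expected to read.
    static boolean isDataFile(String path) {
        return path.endsWith(LZO_SUFFIX);
    }

    // Keeps the .lzo data files and drops everything else, including .lzo.index.
    static List<String> dataFilesOnly(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (isDataFile(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "/tables/ourdata_2011-08-19.lzo",
            "/tables/ourdata_2011-08-19.lzo.index",
            "/tables/ourdata_2011-08-18.lzo",
            "/tables/ourdata_2011-08-18.lzo.index");
        // Prints only the two .lzo paths, one per line.
        for (String p : dataFilesOnly(listing)) {
            System.out.println(p);
        }
    }
}
```

In a Hadoop job the same test could sit behind an org.apache.hadoop.fs.PathFilter, whose accept(Path) method returns a boolean; whether the Hive version in the report consults such a filter for external tables is not established by this thread.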