hive-dev mailing list archives

From "Vitaliy Fuks (JIRA)" <>
Subject [jira] [Resolved] (HIVE-2395) Misleading "No LZO codec found, cannot run." exception when using external table and LZO / DeprecatedLzoTextInputFormat
Date Sat, 01 Sep 2012 16:27:07 GMT


Vitaliy Fuks resolved HIVE-2395.

    Resolution: Won't Fix

Latest hadoop-lzo libraries do not exhibit this behavior.
> Misleading "No LZO codec found, cannot run." exception when using external table and LZO / DeprecatedLzoTextInputFormat
> -----------------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-2395
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.7.1
>         Environment: Cloudera 3u1 with or
>            Reporter: Vitaliy Fuks
> We have a {{/tables/}} directory containing .lzo files with our data, compressed using
> We {{CREATE EXTERNAL TABLE}} on top of this directory, using {{STORED AS INPUTFORMAT
> .lzo files require that an LzoIndexer is run on them. When this is done, .lzo.index file is created for every .lzo file, so we end up with:
> {noformat}
> /tables/ourdata_2011-08-19.lzo
> /tables/ourdata_2011-08-19.lzo.index
> /tables/ourdata_2011-08-18.lzo
> /tables/ourdata_2011-08-18.lzo.index
> ..etc
> {noformat}
> The issue is that is attempting to getRecordReader() for .lzo.index files. This throws a pretty confusing exception:
> {noformat}
> Caused by: No LZO codec found, cannot run.
>         at com.hadoop.mapred.DeprecatedLzoLineRecordReader.<init>(
>         at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getRecordReader(
>         at<init>(
> {noformat}
> More precisely, it dies on second invocation of getRecordReader() - here is some System.out.println()
> {noformat}
> DeprecatedLzoTextInputFormat.getRecordReader(): split=/tables/ourdata_2011-08-19.lzo:0+616479
> DeprecatedLzoTextInputFormat.getRecordReader(): split=/tables/ourdata_2011-08-19.lzo.index:0+64
> {noformat}
> DeprecatedLzoTextInputFormat contains the following code which causes the ultimate exception and death of query, as it obviously doesn't have a codec to read .lzo.index files.
> {noformat}
>     final CompressionCodec codec = codecFactory.getCodec(file);
>     if (codec == null) {
>       throw new IOException("No LZO codec found, cannot run.");
>     }
> {noformat}
> So I understand that the way things are right now is that Hive considers all files within a directory to be part of a table. There is an open patch HIVE-951 which would allow a quick workaround for this problem.
> Does it make sense to add some hooks so that CombineHiveRecordReader or its parents are more aware of what files should be considered instead of blindly trying to read everything?
> Any suggestions for a quick workaround to make it skip .index files?
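
For illustration only, the "skip .index files" idea raised in the issue can be sketched as a simple filename check. This is a minimal sketch in plain Java, not code from Hive or hadoop-lzo: the class LzoDataFileFilter and its methods are hypothetical names, and it assumes data files always end in .lzo while index files end in .lzo.index, as in the directory listing above. In a Hadoop job the same predicate could in principle be wrapped in an org.apache.hadoop.fs.PathFilter; whether the Hive version in this report would honour such a filter is not established here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Hive or hadoop-lzo.
final class LzoDataFileFilter {
    // Assumption: LZO data files end in ".lzo"; index files end in ".lzo.index".
    static final String DATA_SUFFIX = ".lzo";

    // Returns true only for paths ending in ".lzo".
    // "x.lzo.index" ends in ".index", so it is rejected.
    static boolean isDataFile(String path) {
        return path.endsWith(DATA_SUFFIX);
    }

    // Keeps only the data files from a directory listing, preserving order.
    static List<String> dataFilesOnly(List<String> paths) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (isDataFile(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "/tables/ourdata_2011-08-19.lzo",
            "/tables/ourdata_2011-08-19.lzo.index",
            "/tables/ourdata_2011-08-18.lzo",
            "/tables/ourdata_2011-08-18.lzo.index");
        // Only the two .lzo data files are printed.
        System.out.println(dataFilesOnly(listing));
    }
}
```

Filtering before record readers are created avoids ever asking the codec factory for a codec for an index file, which is the lookup that returns null in the snippet quoted above.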

