hadoop-hive-dev mailing list archives

From "Bennie Schut (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1138) Hive using lzo compression returns unexpected results.
Date Mon, 08 Feb 2010 21:13:28 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831120#action_12831120 ]

Bennie Schut commented on HIVE-1138:
------------------------------------

In Hadoop 0.20.1, LZO has been removed. I installed the "hadoop-gpl-compression-read-only" code from googlecode.com,
and it seems to work correctly on Hadoop.

In the reduce step I see entries like this in the logs:
{noformat} 
2010-02-08 22:06:36,554 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2010-02-08 22:06:36,555 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library
2010-02-08 22:06:36,556 INFO org.apache.hadoop.hive.ql.io.CodecPool: Got brand-new compressor
2010-02-08 22:06:36,558 INFO org.apache.hadoop.hive.ql.io.CodecPool: Got brand-new compressor
{noformat} 

2) I'll add some data and example code tomorrow morning.

Thanks for looking at this.

> Hive using lzo compression returns unexpected results.
> ------------------------------------------------------
>
>                 Key: HIVE-1138
>                 URL: https://issues.apache.org/jira/browse/HIVE-1138
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>         Environment: hadoop 0.20.1, hive trunk 2010-02-03
>            Reporter: Bennie Schut
>            Priority: Blocker
>
> I have a tab-separated file that I loaded with "load data inpath". Then I run:
> SET hive.exec.compress.output=true;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> select distinct login_cldr_id as cldr_id from chatsessions_load;
> Ended Job = job_201001151039_1641
> OK
> NULL
> NULL
> NULL
> Time taken: 49.06 seconds
> However, if I run it without the SET commands, I get this:
> Ended Job = job_201001151039_1642
> OK
> 2283
> Time taken: 45.308 seconds
> Which is the correct result.
> When I do an "insert overwrite" into an RCFile table, the data is actually compressed correctly.
> When I disable compression and query this new table, the result is correct.
> When I enable compression, it's wrong again.
> I see no errors in the logs.
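
For anyone trying to reproduce this, the comparison described above can be sketched as a single session script. This is only a sketch built from the settings and query quoted in the description. It assumes the chatsessions_load table is already loaded and that the client accepts "--" comment lines; nothing here has been run against the affected build.

```sql
-- Run 1: baseline with output compression explicitly disabled.
-- Per the description, this is expected to return the correct value.
SET hive.exec.compress.output=false;
select distinct login_cldr_id as cldr_id from chatsessions_load;

-- Run 2: the same query with the LZO settings from the description.
-- Per the description, this is the run that returned NULL rows.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
select distinct login_cldr_id as cldr_id from chatsessions_load;
```

SET values persist for the rest of the session, so the baseline run goes first. Applying the three settings one at a time in fresh sessions might help narrow down which one triggers the wrong result.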

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

