hive-dev mailing list archives

From "Bennie Schut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3308) Mixing avro and snappy gives null values
Date Mon, 25 Mar 2013 09:07:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612502#comment-13612502 ]

Bennie Schut commented on HIVE-3308:
------------------------------------

I would really appreciate it if someone could commit this patch. It includes tests that
demonstrate the issue and show correct results once the patch is applied, and it makes the
Avro serde more consistent with the other serdes. Essentially anyone who combines compression
with Avro will hit this bug, as we see in HIVE-4195.
                
> Mixing avro and snappy gives null values
> ----------------------------------------
>
>                 Key: HIVE-3308
>                 URL: https://issues.apache.org/jira/browse/HIVE-3308
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Bennie Schut
>            Assignee: Bennie Schut
>         Attachments: HIVE-3308.patch1.txt, HIVE-3308.patch2.txt
>
>
> By default Hive uses LazySimpleSerDe for output.
> When I enable compression and run "select count(*) from avrotable", the output is a file
> with the .avro extension. It then displays null values, because the file is in reality not
> an Avro file but a compressed file created by LazySimpleSerDe, so it should be a .snappy
> file.
> This causes any job (except "select * from avrotable", which is not truly a job) to show
> null values.
> If you use any serde other than Avro, you can temporarily fix this with
> "set hive.output.file.extension=.snappy" and it will work correctly again, but this won't
> work with Avro, since it overwrites hive.output.file.extension during initialization.
> When you dump the query result into a table with "create table bla as", you can rename
> the .avro file to .snappy and "select from bla" will also magically work again.
> Input and output serdes don't always match, so when I use Avro as an input format it
> should not set hive.output.file.extension.
> Once it's set, all subsequent queries will use it and fail, making the connection useless
> for reuse.
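
The rename workaround described above suggests that the reader picks its decompression codec
from the file extension. Here is a minimal Python sketch of that idea. It is a toy model under
stated assumptions, not Hive code: zlib stands in for Snappy (which is not in the Python
standard library), and read_rows, CODECS_BY_EXTENSION and the 000000_0 file names are
hypothetical.

```python
import zlib

# Assumption: zlib plays the role of Snappy here, purely for illustration.
CODECS_BY_EXTENSION = {".snappy": zlib.decompress}


def read_rows(filename: str, payload: bytes) -> list:
    """Decode a delimited-text result file, choosing the codec by file extension."""
    for ext, decompress in CODECS_BY_EXTENSION.items():
        if filename.endswith(ext):
            payload = decompress(payload)
            break
    text = payload.decode("utf-8", errors="replace")
    rows = []
    for line in text.splitlines():
        value = line.strip()
        # Lenient reader model: anything that is not a clean integer becomes None.
        is_int = value.isascii() and value.isdigit()
        rows.append(int(value) if is_int else None)
    return rows


# A compressed text result, as a text serde with compression enabled would write it.
result = zlib.compress(b"42\n")

# Same bytes, two different names: only the extension changes.
mislabelled = read_rows("000000_0.avro", result)    # no codec matches ".avro"
renamed = read_rows("000000_0.snappy", result)      # codec matches, value recovered
```

In this model the bytes on disk never change. Only the name does, which mirrors why renaming
the .avro file to .snappy makes the data readable again.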

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
