hive-dev mailing list archives

From "Mithun Radhakrishnan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
Date Tue, 09 Dec 2014 01:41:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238822#comment-14238822 ]

Mithun Radhakrishnan commented on HIVE-4788:
--------------------------------------------

@[~Navis]: Could you please clarify why this solves the problem? Wouldn't this have an effect
on data that's compressed using, say, GZip?

> RCFile and bzip2 compression not working
> ----------------------------------------
>
>                 Key: HIVE-4788
>                 URL: https://issues.apache.org/jira/browse/HIVE-4788
>             Project: Hive
>          Issue Type: Bug
>          Components: Compression
>    Affects Versions: 0.10.0
>         Environment: CDH4.2
>            Reporter: Johndee Burks
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt
>
>
> The issue is that Bzip2-compressed RCFile data fails with an error when queried, even with the simplest query, "select *". The issue is easily reproducible using the following.

> Create a table and load the sample data below. 
> DDL: create table source_txt (a string, b string) row format delimited fields terminated by ',';
> Sample data: 
> apple,sauce 
> Test: 
> Do the following and you should receive the error listed below for the rcfile table with bz2 compression.
> create table rc_nobz2 (a string, b string) stored as rcfile; 
> insert into table rc_nobz2 select * from source_txt; 
> SET io.seqfile.compression.type=BLOCK; 
> SET hive.exec.compress.output=true; 
> SET mapred.compress.map.output=true; 
> SET mapred.output.compress=true; 
> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; 
> create table rc_bz2 (a string, b string) stored as rcfile; 
> insert into table rc_bz2 select * from source_txt; 
> hive> select * from rc_bz2; 
> Failed with exception java.io.IOException:java.io.IOException: Stream is not BZip2 formatted: expected 'h' as first byte but got '￿'
> hive> select * from rc_nobz2; 
> apple	sauce
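
For readers unfamiliar with the "expected 'h' as first byte" message: a bzip2 stream begins with the bytes 'B', 'Z', 'h', while a gzip stream begins with 0x1f 0x8b, so the error presumably means the reader did not find a bzip2 header at the position where it started decoding. The following is a minimal Python sketch of that kind of header check, not Hive's or Hadoop's actual code; the function name sniff_codec and the sample row are made up for illustration.

```python
import bz2
import gzip

BZIP2_MAGIC = b"BZh"      # bzip2 streams start with 'B', 'Z', 'h'
GZIP_MAGIC = b"\x1f\x8b"  # gzip members start with 0x1f 0x8b


def sniff_codec(data: bytes) -> str:
    """Guess the compression format from the leading magic bytes.

    Hypothetical helper for illustration only; it does not decode anything.
    """
    if data.startswith(BZIP2_MAGIC):
        return "bzip2"
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"


# Compress the sample row from the report with both codecs.
row = b"apple,sauce\n"
bz2_bytes = bz2.compress(row)
gzip_bytes = gzip.compress(row)

print(sniff_codec(bz2_bytes))   # bzip2
print(sniff_codec(gzip_bytes))  # gzip
print(sniff_codec(b""))         # unknown (no header at all)
```

A decoder handed an empty buffer, or bytes that start somewhere other than the beginning of the compressed stream, fails this kind of check regardless of which codec wrote the data. That is the concern behind asking whether the fix is specific to bzip2 or also changes behavior for GZip.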



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
