hadoop-pig-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
Date Wed, 16 Sep 2009 13:13:57 GMT

     [ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated PIG-752:
---------------------------

    Attachment: Pig_752.Patch

Attached the patch, which delegates IOStream creation to IOStreamFactory.

The bug arises because Pig uses FileLocalizer to handle IO in local mode but uses
PigSlice and PigRecordWriter to handle IO in mapreduce mode, so care is needed to make
the two paths behave consistently.
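
As a rough illustration of the idea only (this is not the code in Pig_752.Patch, which is Java and is not reproduced here), the Python sketch below routes every open through a single function that picks the stream type from the file extension. The names open_stream and _OPENERS are hypothetical and exist only for this example.

```python
import bz2
import gzip

# Hypothetical mapping from file extension to a stream opener.
# Pig's real IOStreamFactory lives in the attached Java patch; this
# table only illustrates "one place decides how a file is opened".
_OPENERS = {
    ".bz2": bz2.open,
    ".gz": gzip.open,
}


def open_stream(path, mode="rb"):
    """Open path, choosing a (de)compressing stream from its extension."""
    for ext, opener in _OPENERS.items():
        if path.endswith(ext):
            return opener(path, mode)
    # No known compression extension: fall back to a plain file.
    return open(path, mode)
```

If both execution paths obtained their streams from one such function, a file named events.test.bz2 would be written and read as bzip2 regardless of which mode is running.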





> local mode doesn't read bzip2 and gzip compressed data files
> ------------------------------------------------------------
>
>                 Key: PIG-752
>                 URL: https://issues.apache.org/jira/browse/PIG-752
>             Project: Pig
>          Issue Type: Bug
>            Reporter: David Ciemiewicz
>            Assignee: Jeff Zhang
>         Attachments: Pig_752.Patch
>
>
> Problem 1)  use of .bz2 file extension does not store results bzip2 compressed in Local mode (-exectype local)
> If I use the .bz2 filename extension in a STORE statement on HDFS, the results are stored with bzip2 compression.
> If I use the .bz2 filename extension in a STORE statement on local file system, the results are NOT stored with bzip2 compression.
> compact.bz2.pig:
> {code}
> A = load 'events.test' using PigStorage();
> store A into 'events.test.bz2' using PigStorage();
> C = load 'events.test.bz2' using PigStorage();
> C = limit C 10;
> dump C;
> {code}
> {code}
> -bash-3.00$ pig -exectype local compact.bz2.pig
> -bash-3.00$ file events.test
> events.test: ASCII English text, with very long lines
> -bash-3.00$ file events.test.bz2
> events.test.bz2: ASCII English text, with very long lines
> -bash-3.00$ cat events.test | bzip2 > events.test.bz2
> -bash-3.00$ file events.test.bz2
> events.test.bz2: bzip2 compressed data, block size = 900k
> {code}
> The output format in local mode is definitely not bzip2, but it should be.
> Problem 2) pig in local mode does not decompress bzip2 compressed files, but should, to be consistent with HDFS
> read.bz2.pig:
> {code}
> A = load 'events.test.bz2' using PigStorage();
> A = limit A 10;
> dump A;
> {code}
> The output should be human readable but is instead garbage, indicating no decompression took place during the load:
> {code}
> -bash-3.00$ pig -exectype local read.bz2.pig
> USING: /grid/0/gs/pig/current
> 2009-04-03 18:26:30,455 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-04-03 18:26:30,456 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (BZh91AY&SYoz?u????@{????????????????????x_?d?|u????-??mK???;??????????????4?C??)
> ((R? 6?*m?&???g, ?6?Zj?????k,???0?????QT?d???hY?#m????J?>????????[j???z?m?t?u?K)??K5+??)?m?E7j?X?8????????a??
> ??U?p@@????MT?$?B?P??N??=???(????z<}GK?E{@????c$\??I????]?G:?J)
> a(R?,?U?V??????@?I@??J??!D?)???A?PP?IY??m?
> (m????P(i?4,#F[?I)@????>??@??|7^?}U??w????wg,?u?$?T???????((Q!D?=`*?}h????????P??_|??=?(??2???m=?????xG?(?rC?B?(33??:4?N???????t????|??T?*??k??????NT?x???=?fyv?w>f??????4z???4t?)
> (?oou?t???Kwl?????3?n????CM?WS?;l???P?s?x
> a???e????)B??9?                          ?44
> ((??@4?)
> (f????)
> (?@+?d?0@>?U)
> (Q?SR)
> -bash-3.00$ 
> {code}
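
The first tuple in the dump above begins with BZh91AY&SY, which matches the start of a bzip2 stream: the signature "BZh", a block-size digit from 1 to 9, and then the first block header. That is consistent with the raw compressed bytes being handed to PigStorage without decompression. A minimal Python sketch of the same check that `file` performs in Problem 1, assuming a hypothetical helper name looks_like_bzip2:

```python
import bz2


def looks_like_bzip2(data: bytes) -> bool:
    """Return True if data starts with a bzip2 stream header ("BZh" + 1-9)."""
    return (
        len(data) >= 4
        and data[:3] == b"BZh"
        and data[3:4] in b"123456789"
    )


# Bytes produced by a real bzip2 compressor carry the header...
compressed = bz2.compress(b"some\ttab\tseparated\trecord\n")
# ...while plain text, like the events.test.bz2 written in local mode, does not.
plain = b"some\ttab\tseparated\trecord\n"
```

Applying looks_like_bzip2 to the first bytes of a stored file gives a quick way to confirm whether a STORE into a .bz2 path actually compressed its output.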

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

