hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
Date Fri, 22 Jan 2010 18:09:21 GMT

    [ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803795#action_12803795 ]

David Ciemiewicz commented on PIG-752:


What do you mean when you say "local mode has been removed"?

Does this mean that the option "-exectype local" has been removed?
Or does this mean that the local mode execution code has been replaced, or will be replaced,
by an M/R execution engine that operates on the user's local computer without the need for an
HDFS grid?

If the former (no local execution), this is nuts.
If the latter (M/R execution for local execution), and this will supply the means of
reading and writing bzip2-compressed files, then this isn't a "WON'T FIX"; it is "FIXED" by the
change in execution engine.

So which is it?

> local mode doesn't read bzip2 and gzip compressed data files
> ------------------------------------------------------------
>                 Key: PIG-752
>                 URL: https://issues.apache.org/jira/browse/PIG-752
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: David Ciemiewicz
>            Assignee: Jeff Zhang
>         Attachments: Pig_752.Patch
> Problem 1) use of .bz2 file extension does not store results bzip2 compressed in Local mode (-exectype local)
> If I use the .bz2 filename extension in a STORE statement on HDFS, the results are stored with bzip2 compression.
> If I use the .bz2 filename extension in a STORE statement on local file system, the results are NOT stored with bzip2 compression.
> compact.bz2.pig:
> {code}
> A = load 'events.test' using PigStorage();
> store A into 'events.test.bz2' using PigStorage();
> C = load 'events.test.bz2' using PigStorage();
> C = limit C 10;
> dump C;
> {code}
> {code}
> -bash-3.00$ pig -exectype local compact.bz2.pig
> -bash-3.00$ file events.test
> events.test: ASCII English text, with very long lines
> -bash-3.00$ file events.test.bz2
> events.test.bz2: ASCII English text, with very long lines
> -bash-3.00$ cat events.test | bzip2 > events.test.bz2
> -bash-3.00$ file events.test.bz2
> events.test.bz2: bzip2 compressed data, block size = 900k
> {code}
> The output format in local mode is definitely not bzip2, but it should be.
> Problem 2) pig in local mode does not decompress bzip2 compressed files, but should, to be consistent with HDFS
> read.bz2.pig:
> {code}
> A = load 'events.test.bz2' using PigStorage();
> A = limit A 10;
> dump A;
> {code}
> The output should be human readable but is instead garbage, indicating no decompression took place during the load:
> {code}
> -bash-3.00$ pig -exectype local read.bz2.pig
> USING: /grid/0/gs/pig/current
> 2009-04-03 18:26:30,455 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-04-03 18:26:30,456 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (BZh91AY&SYoz?u????@{????????????????????x_?d?|u????-??mK???;??????????????4?C??)
> ((R? 6?*m?&???g, ?6?Zj?????k,???0?????QT?d???hY?#m????J?>????????[j???z?m?t?u?K)??K5+??)?m?E7j?X?8????????a??
> ??U?p@@????MT?$?B?P??N??=???(????z<}GK?E{@????c$\??I????]?G:?J)
> a(R?,?U?V??????@?I@??J??!D?)???A?PP?IY??m?
> (m????P(i?4,#F[?I)@????>??@??|7^?}U??w????wg,?u?$?T???????((Q!D?=`*?}h????????P??_|??=?(??2???m=?????xG?(?rC?B?(33??:4?N???????t????|??T?*??k??????NT?x???=?fyv?w>f??????4z???4t?)
> (?oou?t???Kwl?????3?n????CM?WS?;l???P?s?x
> a???e????)B??9?                          ?44
> ((??@4?)
> (f????)
> (?@+?d?0@>?U)
> (Q?SR)
> -bash-3.00$ 
> {code}
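
The dumped output above begins with "BZh9", which is the bzip2 stream header ("BZh" followed by a block-size digit, matching the "block size = 900k" that `file` reports). That suggests the loader handed the raw compressed bytes to PigStorage. As a rough illustration only, and not part of Pig or of the attached patch, the following Python sketch checks for that header and writes a decompressed copy that a local-mode LOAD could read. The helper names `is_bzip2` and `decompress_to_plain_text` are hypothetical.

```python
import bz2
import shutil

BZIP2_MAGIC = b"BZh"  # every bzip2 stream starts with these three bytes


def is_bzip2(path):
    """Return True if the file starts with a bzip2 stream header."""
    with open(path, "rb") as f:
        header = f.read(4)
    # The header is "BZh" followed by a block-size digit from '1' to '9'.
    return (
        len(header) == 4
        and header[:3] == BZIP2_MAGIC
        and header[3:4] in b"123456789"
    )


def decompress_to_plain_text(src, dst):
    """Write an uncompressed copy of the bzip2 file src to dst (hypothetical helper)."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Under these assumptions, a script run with -exectype local could load the decompressed copy instead of 'events.test.bz2'. This is a workaround sketch, not the behavior the issue asks for, which is transparent compression handling consistent with HDFS.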

