Message-ID: <962765043.1253106957504.JavaMail.jira@brutus>
Date: Wed, 16 Sep 2009 06:15:57 -0700 (PDT)
From: "Jeff Zhang (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
In-Reply-To: <1781877190.1238783472873.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756021#action_12756021 ]

Jeff Zhang commented on PIG-752:
--------------------------------

BTW, does anybody know what LocalDataStorage is used for? It seems Pig uses HDataStorage even when I run it in local mode, so why do we need LocalDataStorage?

> local mode doesn't read bzip2 and gzip compressed data files
> ------------------------------------------------------------
>
>                 Key: PIG-752
>                 URL: https://issues.apache.org/jira/browse/PIG-752
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: David Ciemiewicz
>            Assignee: Jeff Zhang
>             Fix For: 0.4.0
>
>         Attachments: Pig_752.Patch
>
>
> Problem 1) use of .bz2 file extension does not store results bzip2 compressed in Local mode (-exectype local)
> If I use the .bz2 filename extension in a STORE statement on HDFS, the results are stored with bzip2 compression.
> If I use the .bz2 filename extension in a STORE statement on local file system, the results are NOT stored with bzip2 compression.
> compact.bz2.pig:
> {code}
> A = load 'events.test' using PigStorage();
> store A into 'events.test.bz2' using PigStorage();
> C = load 'events.test.bz2' using PigStorage();
> C = limit C 10;
> dump C;
> {code}
> {code}
> -bash-3.00$ pig -exectype local compact.bz2.pig
> -bash-3.00$ file events.test
> events.test: ASCII English text, with very long lines
> -bash-3.00$ file events.test.bz2
> events.test.bz2: ASCII English text, with very long lines
> -bash-3.00$ cat events.test | bzip2 > events.test.bz2
> -bash-3.00$ file events.test.bz2
> events.test.bz2: bzip2 compressed data, block size = 900k
> {code}
> The output format in local mode is definitely not bzip2, but it should be.
>
> Problem 2) pig in local mode does not decompress bzip2 compressed files, but should, to be consistent with HDFS
> read.bz2.pig:
> {code}
> A = load 'events.test.bz2' using PigStorage();
> A = limit A 10;
> dump A;
> {code}
> The output should be human readable but is instead garbage, indicating no decompression took place during the load:
> {code}
> -bash-3.00$ pig -exectype local read.bz2.pig
> USING: /grid/0/gs/pig/current
> 2009-04-03 18:26:30,455 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-04-03 18:26:30,456 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (BZh91AY&SYoz?u????@{????????????????????x_?d?|u????-??mK???;??????????????4?C??)
> ((R? 6?*m?&???g, ?6?Zj?????k,???0?????QT?d???hY?#m????J?>????????[j???z?m?t?u?K)??K5+??)?m?E7j?X?8????????a??
> ??U?p@@????MT?$?B?P??N??=???(????z<}GK?E{@????c$\??I????]?G:?J)
> a(R?,?U?V??????@?I@??J??!D?)???A?PP?IY??m?
> (m????P(i?4,#F[?I)@????>??@??|7^?}U??w????wg,?u?$?T???????((Q!D?=`*?}h????????P??_|??=?(??2???m=?????xG?(?rC?B?(33??:4?N???????t????|??T?*??k??????NT?x???=?fyv?w>f??????4z???4t?)
> (?oou?t???Kwl?????3?n????CM?WS?;l???P?s?x
> a???e????)B??9? ?44
> ((??@4?)
> (f????)
> (?@+?d?0@>?U)
> (Q?SR)
> -bash-3.00$
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
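
For anyone trying to reproduce the two problems outside Pig, here is a minimal sketch in Python of the behavior the report asks for: pick the codec from the filename extension on both store and load, then confirm the result by looking at the file's leading magic bytes (roughly what the `file` command does in the session above). This is not Pig's code and says nothing about how LocalDataStorage or HDataStorage work internally; the names `open_by_extension`, `store`, `load`, and `detect_compression` are hypothetical and exist only for this illustration.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical extension-to-opener table for this illustration only;
# it is not taken from Pig.
OPENERS = {".bz2": bz2.open, ".gz": gzip.open}

BZIP2_MAGIC = b"BZh"      # leading bytes of a bzip2 stream
GZIP_MAGIC = b"\x1f\x8b"  # leading bytes of a gzip stream


def open_by_extension(path, mode):
    """Open path in binary mode, choosing a codec from the file extension."""
    _, ext = os.path.splitext(path)
    opener = OPENERS.get(ext, open)  # fall back to a plain file
    return opener(path, mode)


def store(path, lines):
    """Write one record per line, compressed if the extension asks for it."""
    with open_by_extension(path, "wb") as f:
        for line in lines:
            f.write(line.encode("utf-8") + b"\n")


def load(path, limit=None):
    """Read records back, decompressing if the extension asks for it."""
    with open_by_extension(path, "rb") as f:
        rows = [raw.decode("utf-8").rstrip("\n") for raw in f]
    return rows if limit is None else rows[:limit]


def detect_compression(path):
    """Return 'bzip2', 'gzip', or None based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    return None


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "events.test.bz2")
    store(target, ["a\t1", "b\t2", "c\t3"])
    print(detect_compression(target))  # what Problem 1 expects to be bzip2
    print(load(target, limit=2))       # what Problem 2 expects to be readable
```

Running the same check against the file produced by `pig -exectype local` should show whether the STORE step compressed anything, independent of the LOAD path.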