From: Robert Rapplean <robert.rapplean@trueffect.com>
To: user@hadoop.apache.org
Subject: RE: Viewing snappy compressed files
Date: Wed, 22 May 2013 15:32:03 +0000

Thanks! This shortcuts my current process considerably and should take the pressure off for the short term. I’d still like to be able to analyze the data in a Python script without having to make a local copy, but that can wait.

Best,

Robert Rapplean
Senior Software Engineer
303-872-2256 direct | 303.438.9597 main | www.trueffect.com

From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: Tuesday, May 21, 2013 11:56 AM
To: user@hadoop.apache.org
Subject: Re: Viewing snappy compressed files

 

+1 Thanks Rahul-da

 

Or you can use:

hdfs dfs -text /path/to/dir/on/hdfs/part-r-00000.snappy | less
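For the follow-up goal of analyzing the data in a Python script without making a local copy, one option is to run this same command as a subprocess and read its standard output as a pipe. This is a minimal sketch, not tested against a live cluster: it assumes the hdfs client is on PATH and that the decompressed records are UTF-8 text lines. The helper name iter_hdfs_text_lines and its cmd parameter are hypothetical.

```python
import subprocess
from typing import Iterator, Sequence


def iter_hdfs_text_lines(
    path: str,
    cmd: Sequence[str] = ("hdfs", "dfs", "-text"),
) -> Iterator[str]:
    """Yield the lines printed by `<cmd> <path>`, streamed through a pipe.

    Hypothetical helper: by default it runs `hdfs dfs -text <path>` and reads
    stdout line by line, so nothing is written to local disk.
    """
    argv = [*cmd, path]
    # encoding= puts the pipe in text mode; UTF-8 records are an assumption.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, encoding="utf-8")
    finished = False
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
        finished = True
    finally:
        proc.stdout.close()
        if not finished:
            proc.kill()  # caller stopped early: don't leave the client running
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, argv)
```

Usage would be along the lines of: for line in iter_hdfs_text_lines("/path/to/dir/on/hdfs/part-r-00000.snappy"): ... The cmd parameter only exists so the helper can be pointed at ("hadoop", "fs", "-text") instead, or at a stand-in command for testing.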

 

 

From: Rahul Bhattacharjee <rahul.rec.dgp@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Tuesday, May 21, 2013 9:52 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Viewing snappy compressed files

 

I haven't tried this with Snappy, but you can try using hadoop fs -text <path>

 

On Tue, May 21, 2013 at 8:28 PM, Robert Rapplean <robert.rapplean@trueffect.com> wrote:

Hey, there. My Google skills have failed me, and I hope someone here can point me in the right direction.

 

We’re storing data on our Hadoop cluster in Snappy-compressed format. When we pull a raw file down and try to read it, however, the Snappy libraries don’t know how to read the files. They tell me that the stream is missing the Snappy identifier. I tried inserting 0xff 0x06 0x00 0x00 0x73 0x4e 0x61 0x50 0x70 0x59 into the beginning of the file, but that didn’t do it.
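The ten bytes tried here are, byte for byte, the stream identifier chunk of the Snappy framing format: chunk type 0xff, a 3-byte little-endian length of 6, then the ASCII text sNaPpY. That identifier only helps decoders that expect a framed stream. As far as I know (worth checking against the Hadoop source for your version), files written through Hadoop's SnappyCodec use Hadoop's own block layout instead: a 4-byte big-endian uncompressed length, then one or more 4-byte big-endian compressed lengths, each followed by a raw Snappy block. Prefixing the identifier therefore cannot turn such a file into a valid framed stream. A small check that tells the two layouts apart by their first bytes might look like this; sniff_snappy_layout is a hypothetical name, and the Hadoop branch is only a plausibility test, not proof.

```python
import struct

# Stream identifier chunk of the Snappy framing format:
# type 0xff, 3-byte little-endian length (6), then b"sNaPpY".
FRAMED_STREAM_ID = bytes([0xFF, 0x06, 0x00, 0x00, 0x73, 0x4E, 0x61, 0x50, 0x70, 0x59])


def sniff_snappy_layout(head: bytes) -> str:
    """Guess which Snappy container a file uses from its first bytes.

    Returns "framed", "hadoop-block" or "unknown". The Hadoop test is a
    heuristic (assumed layout): two leading big-endian lengths, both non-zero,
    with the first compressed chunk no larger than Snappy's worst-case output
    for the declared uncompressed block size.
    """
    if head.startswith(FRAMED_STREAM_ID):
        return "framed"
    if len(head) >= 8:
        raw_len, chunk_len = struct.unpack(">II", head[:8])
        # Snappy's worst-case compressed size for n input bytes is 32 + n + n // 6.
        if raw_len > 0 and 0 < chunk_len <= 32 + raw_len + raw_len // 6:
            return "hadoop-block"
    return "unknown"
```

Calling it on the first few dozen bytes of a pulled-down part file should indicate which kind of decoder to reach for.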

Can someone point me to resources for figuring out how to uncompress these files without going through Hadoop?
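To decompress such files outside Hadoop, one approach is to walk Hadoop's block layout directly and hand each chunk to a raw (unframed) Snappy decoder. This sketch assumes the file was written by Hadoop's SnappyCodec (for example a part-r-00000.snappy text output), which to my understanding stores a sequence of blocks, each a 4-byte big-endian uncompressed length followed by one or more chunks made of a 4-byte big-endian compressed length plus raw Snappy bytes. It does not cover SequenceFiles, which have their own container. The chunk decoder is passed in as a parameter; with the third-party python-snappy package installed, snappy.decompress should fit, but treat that as an assumption to verify. iter_hadoop_snappy_blocks is a hypothetical name.

```python
import struct
from typing import BinaryIO, Callable, Iterator


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, failing loudly on a truncated file."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"truncated stream: wanted {n} bytes, got {len(data)}")
    return data


def iter_hadoop_snappy_blocks(
    stream: BinaryIO,
    decompress_chunk: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Yield the uncompressed contents of each block in a Hadoop Snappy stream.

    Assumed layout per block: >I uncompressed length, then repeated
    (>I compressed length, raw Snappy bytes) until that length is reached.
    decompress_chunk must decode one raw, unframed Snappy block
    (e.g. python-snappy's snappy.decompress, assumed installed).
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file between blocks
        if len(header) != 4:
            raise EOFError("truncated block header")
        (raw_len,) = struct.unpack(">I", header)
        parts = []
        produced = 0
        while produced < raw_len:
            (chunk_len,) = struct.unpack(">I", _read_exact(stream, 4))
            chunk = decompress_chunk(_read_exact(stream, chunk_len))
            parts.append(chunk)
            produced += len(chunk)
        if produced != raw_len:
            raise ValueError(f"block declared {raw_len} bytes, decoded {produced}")
        yield b"".join(parts)
```

The function only calls read() on the stream, so it should work on a local file opened with open(path, "rb") and, plausibly, on the binary stdout pipe of an hdfs dfs -cat subprocess, which would also avoid a local copy.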

 
