From: David Ginzburg <davidg@inner-active.com>
To: user@avro.apache.org
Subject: RE: Generating snappy compressed avro files as hadoop map reduce input files
Date: Sun, 13 Oct 2013 19:23:31 +0000

Thanks,

I am not generating the Avro files with Hadoop MR, but with a different process.
I plan to just store the files on S3 for potential archive processing with EMR.
Can I use AvroSequenceFile from a non-M/R process to generate sequence files that have my Avro records as the values and null keys?

________________________________
From: graham sanderson <graham@vast.com>
Sent: Sunday, October 13, 2013 9:16 PM
To: user@avro.apache.org
Subject: Re: Generating snappy compressed avro files as hadoop map reduce input files

If you're using Hadoop, why not use AvroSequenceFileOutputFormat? It works fine with snappy (block-level compression may be best, depending on your data).

On Oct 13, 2013, at 10:58 AM, David Ginzburg <davidg@inner-active.com> wrote:

As mentioned in http://stackoverflow.com/a/15821136, Hadoop's snappy codec just doesn't work with externally generated files.

Can files generated by DataFileWriter serve as input files for a MapReduce job, especially EMR jobs?

________________________________
From: Bertrand Dechoux <dechouxb@gmail.com>
Sent: Sunday, October 13, 2013 6:36 PM
To: user@avro.apache.org
Subject: Re: Generating snappy compressed avro files as hadoop map reduce input files

I am not sure I understand the relation between your problem and the way temporary data is stored after the map phase.

However, I guess you are looking for DataFileWriter and its setCodec function.
http://avro.apache.org/docs/current/api/java/org/apache/avro/file/DataFileWriter.html#setCodec%28org.apache.avro.file.CodecFactory%29

Regards

Bertrand

PS: A snappy-compressed Avro file is not a standard file that has been compressed afterwards, but a specific file format containing compressed blocks. This principle is similar to SequenceFile's. Maybe that's what you mean by a different snappy codec?

On Sun, Oct 13, 2013 at 5:16 PM, David Ginzburg <davidg@inner-active.com> wrote:

Hi,

I am writing an application that produces Avro record files, to be stored on AWS S3 as possible input to EMR.
I would like to compress them with the snappy codec before storing them on S3.
It is my understanding that Hadoop currently uses a different snappy codec, mostly used as the intermediate map output format.
My question is: how can I generate snappy-compressed Avro files within my application logic (not MR)?
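
Editorial illustration (not part of the original thread): Bertrand's PS describes a container file whose header stays readable while each block of records is compressed on its own. The sketch below shows that idea using only the JDK. It is not Avro's real file format: the class name, magic bytes, all-zero sync marker, and block layout are hypothetical, and Deflater stands in for snappy because the JDK ships no snappy implementation. In Avro's Java API the codec is chosen through DataFileWriter.setCodec(CodecFactory), the method linked in the thread, for example with CodecFactory.snappyCodec().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class BlockCompressedSketch {
    // Hypothetical header and sync marker; Avro's real values differ.
    static final byte[] MAGIC = {'B', 'L', 'K', 1};
    static final byte[] SYNC = new byte[16];

    /** Writes an uncompressed header, then independently compressed blocks. */
    static byte[] write(List<String> records, int recordsPerBlock) {
        if (recordsPerBlock <= 0) {
            throw new IllegalArgumentException("recordsPerBlock must be positive");
        }
        try {
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(file);
            out.write(MAGIC);
            for (int start = 0; start < records.size(); start += recordsPerBlock) {
                int end = Math.min(start + recordsPerBlock, records.size());
                // Serialize one block of records, then compress only that block.
                ByteArrayOutputStream raw = new ByteArrayOutputStream();
                DataOutputStream rawOut = new DataOutputStream(raw);
                for (String record : records.subList(start, end)) {
                    rawOut.writeUTF(record);
                }
                rawOut.flush();
                byte[] compressed = deflate(raw.toByteArray());
                out.writeInt(end - start);        // record count in this block
                out.writeInt(compressed.length);  // compressed size in bytes
                out.write(compressed);
                out.write(SYNC);                  // block boundary marker
            }
            out.flush();
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the container back, decompressing one block at a time. */
    static List<String> read(byte[] file) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            byte[] magic = new byte[MAGIC.length];
            in.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IOException("not a sketch container file");
            }
            List<String> records = new ArrayList<>();
            while (in.available() > 0) {
                int count = in.readInt();
                byte[] compressed = new byte[in.readInt()];
                in.readFully(compressed);
                byte[] sync = new byte[SYNC.length];
                in.readFully(sync);
                if (!Arrays.equals(sync, SYNC)) {
                    throw new IOException("missing sync marker");
                }
                DataInputStream blockIn = new DataInputStream(
                        new InflaterInputStream(new ByteArrayInputStream(compressed)));
                for (int i = 0; i < count; i++) {
                    records.add(blockIn.readUTF());
                }
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream z = new DeflaterOutputStream(buf)) {
            z.write(raw);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3");
        byte[] file = write(records, 2);
        System.out.println(read(file).equals(records));
    }
}
```

Because every block is compressed separately and followed by a marker, a reader can start at a block boundary without decompressing the whole file. That is also why running a general-purpose snappy tool over a finished file gives a different result from selecting a codec when the container is written.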