incubator-crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Writing Avro data to files
Date Thu, 13 Dec 2012 19:12:31 GMT
Ah-- that is interesting, and almost certainly the reason why we're writing
JSON instead of binary Avro.
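
For a rough sense of the size difference being discussed, here is a minimal sketch in plain Java (no Avro or Crunch dependency). It hand-encodes one record the way Avro's binary encoding lays out a string and a long (zig-zag varint for the long, length-prefixed UTF-8 for the string) and compares that with a JSON rendering of the same values. The record fields `name` and `count`, the class `AvroSizeSketch`, and the hand-built JSON string are all assumptions made for illustration; this is not the code path Crunch or Avro actually runs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AvroSizeSketch {
    // Avro binary writes int/long values as zig-zag encoded varints.
    public static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);       // zig-zag: small magnitudes become small codes
        while ((z & ~0x7FL) != 0) {          // more than 7 bits still to emit
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro binary writes a string as a long byte count followed by UTF-8 bytes.
    public static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical record {name: string, count: long}: a record body is just
    // its fields' encodings concatenated in schema order, with no field names.
    public static byte[] encodeBinary(String name, long count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, count);
        return out.toByteArray();
    }

    // Hand-built JSON text for the same record (assumes name needs no escaping).
    public static byte[] encodeJson(String name, long count) {
        String json = "{\"name\":\"" + name + "\",\"count\":" + count + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] binary = encodeBinary("crunch", 1000000L);
        byte[] json = encodeJson("crunch", 1000000L);
        System.out.println("binary bytes: " + binary.length);
        System.out.println("json bytes:   " + json.length);
    }
}
```

The binary form carries no field names or punctuation, which is where most of the saving comes from; a reader also skips the text parsing step that JSON requires.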


On Thu, Dec 13, 2012 at 11:08 AM, Jonathan Natkins <natty@cloudera.com> wrote:

> It's 2.0.0 and 1.7.0. I've actually only been running MemPipelines thus
> far, to make sure that I've built the job correctly, so it's possible that
> that's the issue.
>
>
> On Thu, Dec 13, 2012 at 10:56 AM, Josh Wills <jwills@cloudera.com> wrote:
>
>> That surprises me-- Crunch has its own AvroOutputFormat in order to use
>> the mapreduce.* APIs, but they delegate much of the work to things like
>> DatumWriters/encoders/etc. from Avro's core libraries.
>>
>> Could I get some detail on hadoop/avro version? Is it just 1.0.x and Avro
>> 1.7.0?
>>
>> J
>>
>>
>> On Thu, Dec 13, 2012 at 10:35 AM, Jonathan Natkins <natty@cloudera.com> wrote:
>>
>>> Out of curiosity, is there a way to write output from a Crunch pipeline
>>> into an Avro-format file? It seems that if you do the
>>> collection.write(To.avroFile(path)), you end up just writing JSON. It can
>>> certainly be read into an Avro object, but it seems like it would be more
>>> efficient to write binary data to the file, so no parsing has to happen.
>>>
>>> Have I missed an API, or is this a missing feature?
>>>
>>> Thanks,
>>> Natty
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
