incubator-crunch-user mailing list archives

From Jonathan Natkins <na...@cloudera.com>
Subject Re: Writing Avro data to files
Date Thu, 13 Dec 2012 19:15:15 GMT
Gotcha. Alright, I'll try a true MR pipeline and see if that improves the
situation. Thanks!


On Thu, Dec 13, 2012 at 11:12 AM, Josh Wills <jwills@cloudera.com> wrote:

> Ah-- that is interesting, and almost certainly the reason why we're
> writing JSON instead of binary Avro.
>
>
> On Thu, Dec 13, 2012 at 11:08 AM, Jonathan Natkins <natty@cloudera.com> wrote:
>
>> It's Hadoop 2.0.0 and Avro 1.7.0. I've actually only been running
>> MemPipelines so far, to make sure that I've built the job correctly, so it's
>> possible that that's the issue.
>>
>>
>> On Thu, Dec 13, 2012 at 10:56 AM, Josh Wills <jwills@cloudera.com> wrote:
>>
>>> That surprises me-- Crunch has its own AvroOutputFormat in order to use
>>> the mapreduce.* APIs, but they delegate much of the work to things like
>>> DatumWriters/encoders/etc. from Avro's core libraries.
>>>
>>> Could I get some detail on the Hadoop/Avro versions? Is it just Hadoop
>>> 1.0.x and Avro 1.7.0?
>>>
>>> J
>>>
>>>
>>> On Thu, Dec 13, 2012 at 10:35 AM, Jonathan Natkins <natty@cloudera.com> wrote:
>>>
>>>> Out of curiosity, is there a way to write output from a Crunch pipeline
>>>> into an Avro-format file? It seems that if you call
>>>> collection.write(To.avroFile(path)), you end up just writing JSON. It can
>>>> certainly be read into an Avro object, but it seems like it would be more
>>>> efficient to write binary data to the file, so no parsing has to happen.
>>>>
>>>> Have I missed an API, or is this a missing feature?
>>>>
>>>> Thanks,
>>>> Natty
>>>>
>>>
>>>
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>>
>>>
>>
>
>
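A note on why the encoder matters: Josh's explanation is that Crunch's AvroOutputFormat hands the byte-level work to Avro's DatumWriters and encoders, so whichever encoder ends up in the write path decides whether the file holds compact binary or JSON text. The sketch below is a hypothetical, hand-rolled illustration, not Crunch's or Avro's code (the class AvroEncodingSketch and its methods are made up for this example). It assumes a record shaped like {"name": string, "count": long} and follows the Avro specification's binary rules for those two types (longs as zig-zag varints, strings as a length prefix followed by UTF-8 bytes), next to a naive JSON rendering of the same datum. It covers a single datum only; a real Avro data file also carries a header with the schema, plus sync markers between blocks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: NOT Avro's or Crunch's API.
// Encodes one datum of an assumed record {"name": string, "count": long}.
public class AvroEncodingSketch {

    // Avro binary long: zig-zag encode, then emit base-128 varint bytes
    // (7 bits per byte, low-order group first, high bit set = more bytes follow).
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: 0->0, -1->1, 1->2, -2->3, ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro binary string: byte length encoded as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Binary datum: fields in schema order, with no field names or delimiters.
    static byte[] encodeBinary(String name, long count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, count);
        return out.toByteArray();
    }

    // JSON datum: self-describing text that a reader has to parse.
    // Naive on purpose: does not escape quotes or backslashes in name.
    static byte[] encodeJson(String name, long count) {
        String json = "{\"name\":\"" + name + "\",\"count\":" + count + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bin = encodeBinary("natty", 150L);
        byte[] json = encodeJson("natty", 150L);
        System.out.println("binary bytes: " + bin.length);
        System.out.println("json bytes:   " + json.length);
    }
}
```

Run as-is, it prints 8 binary bytes against 28 JSON bytes for ("natty", 150). The binary form drops field names and delimiters entirely, which is why a reader needs the schema to decode it, and also why no text parsing is needed on the way back in, the efficiency gain Natty was after.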
