crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Reading Avro to GenericRecord
Date Tue, 28 Jan 2014 01:04:36 GMT
Patch is here: https://issues.apache.org/jira/browse/CRUNCH-333


On Mon, Jan 27, 2014 at 10:08 AM, Josh Wills <josh.wills@gmail.com> wrote:

> Of course. I wrote up a little patch that adds a method to From.java to
> open the Avro file and pull out the schema and return a Source of
> GenericData.Record, but I had to roll to some meetings before I got a
> chance to test it. I'll post something later this evening ET.
> On Jan 27, 2014 11:56 AM, "Magnus Runesson" <magru@linuxalert.org> wrote:
>
>> Thanks for the quick answer.
>>
>> It is totally OK and reasonable to take one file in a directory and
>> assume all the others have the same schema.
>>
>>
>> On 2014-01-27 18:27, Josh Wills wrote:
>>
>> No, I haven't written a way to do that yet, and I feel bad about it-- a
>> Clouderan asked me for just such a feature a couple of weeks ago and it
>> slipped my mind. I don't think it's hard to do, just a little tedious and
>> will require refreshing my memory of the Avro APIs. There's also the
>> potential issue that multiple Avro files in the same input directory can
>> have different schemas, so the one we would end up reading might be
>> somewhat arbitrary (e.g., based on the timestamp of the files in the
>> directory, or some such thing)-- is that ok?
>>
>>
>> On Mon, Jan 27, 2014 at 9:12 AM, Magnus Runesson <magru@linuxalert.org> wrote:
>>
>>> Can I, in (s)crunch, read an Avro file into a GenericRecord without
>>> providing the schema? I want crunch to get the schema from the Avro file
>>> it reads. How do I do it?
>>>
>>> /Magnus
>>>
>>
>>
>>
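The idea discussed above (open one Avro file and pull the schema out of it) can be sketched without any Crunch or Avro dependency. This is not the CRUNCH-333 patch. It is only a minimal illustration that assumes the standard Avro object container layout: the magic bytes "Obj" plus 0x01, then a metadata map whose "avro.schema" entry holds the writer schema as JSON. The class name AvroHeaderSchema and all of its method names are hypothetical; sampleHeader exists only so the sketch can run without a real file. In real code, Avro's own DataFileReader already exposes the schema through getSchema().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: reads the writer schema JSON from an Avro container header.
final class AvroHeaderSchema {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns the JSON stored under "avro.schema" in the file header. */
    static String readSchemaJson(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] magic = new byte[4];
        data.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not an Avro object container file");
        }
        String schema = null;
        long count;
        // The metadata map is a series of blocks, terminated by a block count of 0.
        while ((count = readLong(data)) != 0) {
            if (count < 0) {
                count = -count;   // a negative count is followed by the block size in bytes
                readLong(data);   // the size is not needed here, so skip it
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(data), StandardCharsets.UTF_8);
                byte[] value = readBytes(data);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        if (schema == null) {
            throw new IOException("Header has no avro.schema entry");
        }
        return schema;
    }

    /** Decodes an Avro long: base-128 varint, zigzag-encoded. */
    static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            if (shift > 63) throw new IOException("Varint too long");
            b = in.readUnsignedByte();
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Reads a length-prefixed byte sequence (used for both strings and bytes). */
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("Bad length: " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    /** Builds an in-memory header with a single avro.schema entry, for the demo only. */
    static byte[] sampleHeader(String schemaJson) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC);
        writeLong(out, 1);                                   // one metadata entry
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);                                   // end of the metadata map
        out.write(new byte[16]);                             // placeholder sync marker
        return out.toByteArray();
    }

    static void writeLong(ByteArrayOutputStream out, long value) {
        long zz = (value << 1) ^ (value >> 63);              // zigzag encode
        while ((zz & ~0x7FL) != 0) {
            out.write((int) ((zz & 0x7F) | 0x80));
            zz >>>= 7;
        }
        out.write((int) zz);
    }

    static void writeBytes(ByteArrayOutputStream out, byte[] bytes) {
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader("{\"type\":\"string\"}");
        System.out.println(readSchemaJson(new ByteArrayInputStream(header)));
    }
}
```

To apply this to an input directory as discussed above, one would pick a single file, read its schema this way, and assume the remaining files were written with the same schema.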


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
