crunch-user mailing list archives

From Josh Wills <>
Subject Re: Reading Avro to GenericRecord
Date Mon, 27 Jan 2014 18:08:19 GMT
Of course. I wrote up a little patch that adds a method to
open the Avro file and pull out the schema and return a Source of
GenericData.Record, but I had to roll to some meetings before I got a
chance to test it. I'll post something later this evening ET.
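
For anyone wondering where that schema comes from: an Avro container file stores its writer schema as JSON in the file header, under the metadata key avro.schema. Below is a rough, JDK-only sketch of pulling it out by hand. It is not the patch mentioned above, and the class and method names (AvroHeaderSchema, readSchemaJson, sampleHeader) are made up for illustration. In real code Avro's own DataFileReader already exposes the schema via getSchema(), so this is only meant to show what "open the file and pull out the schema" amounts to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Crunch or Avro.
public class AvroHeaderSchema {
    // Container files start with the bytes 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Decodes one Avro long: variable-length, zig-zag encoded.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    // Reads a length-prefixed byte string (Avro "bytes" and "string" share this layout).
    static byte[] readBytes(InputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length: " + len);
        byte[] buf = new byte[(int) len];
        new DataInputStream(in).readFully(buf);
        return buf;
    }

    // Walks the header metadata map and returns the avro.schema value, or null if absent.
    static String readSchemaJson(InputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        new DataInputStream(in).readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");
        String schema = null;
        // The map is a series of blocks; a block count of zero ends it.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {      // negative count: a byte size follows, which we skip
                count = -count;
                readLong(in);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) schema = new String(value, StandardCharsets.UTF_8);
            }
        }
        return schema;
    }

    // Convenience wrapper for in-memory bytes.
    static String schemaJsonOf(byte[] fileBytes) {
        try {
            return readSchemaJson(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Zig-zag varint encoder, used only to fabricate a header for the demo.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long raw = (value << 1) ^ (value >> 63);
        while ((raw & ~0x7FL) != 0) {
            out.write((int) ((raw & 0x7F) | 0x80));
            raw >>>= 7;
        }
        out.write((int) raw);
    }

    static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Builds a minimal header (magic, one metadata entry, zeroed sync marker) for testing.
    static byte[] sampleHeader(String schemaJson) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 1);                      // one metadata entry in this block
        writeBytes(out, "avro.schema".getBytes(StandardCharsets.UTF_8));
        writeBytes(out, schemaJson.getBytes(StandardCharsets.UTF_8));
        writeLong(out, 0);                      // end of metadata map
        out.write(new byte[16], 0, 16);         // 16-byte sync marker
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"record\",\"name\":\"Event\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
        System.out.println(schemaJsonOf(sampleHeader(schema)));
    }
}
```

With the schema JSON in hand, it could be parsed into an Avro Schema and passed to the existing schema-taking read path. As noted downthread, when a directory holds several files this only tells you the schema of whichever file you happened to open.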
On Jan 27, 2014 11:56 AM, "Magnus Runesson" <> wrote:

> Thanks for the quick answer.
> It is totally OK and reasonable to take one file in a directory and assume
> all the others have the same schema.
> On 2014-01-27 18:27, Josh Wills wrote:
> No, I haven't written a way to do that yet, and I feel bad about it-- a
> Clouderan asked me for just such a feature a couple of weeks ago and it
> slipped my mind. I don't think it's hard to do, just a little tedious and
> will require refreshing my memory of the Avro APIs. There's also the
> potential issue that multiple Avro files in the same input directory can
> have different schemas, so the one we would end up reading might be
> somewhat arbitrary (e.g., based on the timestamp of the files in the
> directory, or some such thing)-- is that ok?
> On Mon, Jan 27, 2014 at 9:12 AM, Magnus Runesson <> wrote:
>> Can I in (s)crunch read an Avro file to GenericRecord without providing the
>> schema? I want crunch to get the schema from the Avro file it reads. How do
>> I do it?
>> /Magnus
