crunch-user mailing list archives

From David Ortiz <dpo5...@gmail.com>
Subject Re: Materializing map of avros
Date Mon, 09 May 2016 16:26:26 GMT
Thanks.  That works.  I also found a workaround by serializing all the Avro
records into JSON in the map function that reads the data in, then
deserializing back into Avro in my processing function down the line.
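
A minimal sketch of that JSON round trip using Avro's JsonEncoder/JsonDecoder
is below; the generated record class MyRecord is a hypothetical stand-in for
whatever specific record the pipeline actually uses:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.Encoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.specific.SpecificDatumReader;
    import org.apache.avro.specific.SpecificDatumWriter;

    public class AvroJsonRoundTrip {

      // Serialize a generated Avro record to a JSON string in the upstream map function.
      public static String toJson(MyRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SpecificDatumWriter<MyRecord> writer = new SpecificDatumWriter<>(MyRecord.class);
        Encoder encoder = EncoderFactory.get().jsonEncoder(MyRecord.getClassSchema(), out);
        writer.write(record, encoder);
        encoder.flush();
        return out.toString("UTF-8");
      }

      // Rebuild the record from the JSON string in the downstream processing function.
      public static MyRecord fromJson(String json) throws IOException {
        SpecificDatumReader<MyRecord> reader = new SpecificDatumReader<>(MyRecord.class);
        Decoder decoder = DecoderFactory.get().jsonDecoder(MyRecord.getClassSchema(), json);
        return reader.read(null, decoder);
      }
    }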

Does ReadableData have issues running on a SparkPipeline?  Just curious
since it takes the org.apache.hadoop.mapreduce.TaskInputOutputContext in
its read method.
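
For anyone landing on this thread later, here is a rough sketch of the
ReadableData pattern Josh describes below. The DoFn name LookupDoFn, the
String-keyed table, and the Avro class MyRecord are all hypothetical, and the
exact overloads may vary by Crunch version:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.crunch.CrunchRuntimeException;
    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.Pair;
    import org.apache.crunch.ReadableData;

    // Rebuilds the lookup map from the ReadableData when the task starts up.
    public class LookupDoFn extends DoFn<String, String> {

      private final ReadableData<Pair<String, MyRecord>> lookupData;
      private transient Map<String, MyRecord> lookup;

      public LookupDoFn(ReadableData<Pair<String, MyRecord>> lookupData) {
        this.lookupData = lookupData;  // ReadableData is serializable, so it ships with the DoFn
      }

      @Override
      public void initialize() {
        lookup = new HashMap<>();
        try {
          // read() takes the task's TaskInputOutputContext, which DoFn exposes via getContext()
          for (Pair<String, MyRecord> pair : lookupData.read(getContext())) {
            lookup.put(pair.first(), pair.second());
          }
        } catch (IOException e) {
          throw new CrunchRuntimeException(e);
        }
      }

      @Override
      public void process(String input, Emitter<String> emitter) {
        MyRecord match = lookup.get(input);
        if (match != null) {
          emitter.emit(match.toString());
        }
      }
    }

The ReadableData itself would come from something like
lookupTable.asReadable(false), and passing
ParallelDoOptions.builder().sourceTargets(rd.getSourceTargets()).build() to the
parallelDo call lets the planner know the DoFn depends on that data.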

On Fri, May 6, 2016 at 4:56 PM Josh Wills <josh.wills@gmail.com> wrote:

> Try using the ReadableData version of the PTable - it's an object that is
> serializable, and you can read the data from it into whatever you want in
> the initialize method of the DoFn you pass it to.
>
> On Fri, May 6, 2016 at 1:03 PM David Ortiz <dpo5003@gmail.com> wrote:
>
>> Hello,
>>
>>       In an attempt to make my code a little easier to follow, I am
>> materializing a PTable to a map and then passing it into another
>> DoFn.  Unfortunately, since the value is an Avro record, I am getting a
>> NotSerializableException out of the code when I try to use it.
>>
>>      I attempted to get around this by converting the record into a
>> ByteBuffer with the Avro utils, but lo and behold that's also not
>> Serializable.  Since I do not see a convenient way to wrap a byte array
>> with Crunch, has anyone had any luck with any other approaches to getting a
>> Crunch-compatible serialized Avro object?
>>
>> Thanks,
>>      David Ortiz
>>
>
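
For context, here is roughly what the failing approach from the quoted message
looks like; lookupTable, keys, JoinAgainstMapFn, and MyRecord are all
hypothetical names, and the assumption is that the Avro version in use
generates records that do not implement java.io.Serializable (which is what
the NotSerializableException indicates):

    // Materialize the PTable to an in-memory map and hand it to a DoFn via its constructor.
    Map<String, MyRecord> lookup = lookupTable.materializeToMap();

    // The DoFn is shipped with Java serialization, so everything it holds must be
    // java.io.Serializable; the Avro record values are not, hence the exception.
    PCollection<String> joined = keys.parallelDo(new JoinAgainstMapFn(lookup), Avros.strings());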
