crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Materializing map of avros
Date Mon, 09 May 2016 16:31:09 GMT
I think it works in SparkPipeline. I have hacks in place to fake a
TaskInputOutputContext inside of Spark when one is needed, but it's
possible we need to implement more of its methods to get it to work
with all of the ReadableData impls.

On Mon, May 9, 2016 at 9:26 AM, David Ortiz <dpo5003@gmail.com> wrote:

> Thanks.  That works.  I also found a workaround by serializing all the
> avro records into JSON in the map function that reads the data in, then
> deserializing back into avro in my processing function down the line.
>
> Does ReadableData have issues running on a SparkPipeline?  Just curious
> since it takes the org.apache.hadoop.mapreduce.TaskInputOutputContext in
> its read method.
>
> On Fri, May 6, 2016 at 4:56 PM Josh Wills <josh.wills@gmail.com> wrote:
>
>> Try using the ReadableData version of the PTable. It's a serializable
>> object, and you can read the data from it into whatever you want in the
>> initialize method of the DoFn you pass it to.
>>
>> On Fri, May 6, 2016 at 1:03 PM David Ortiz <dpo5003@gmail.com> wrote:
>>
>>> Hello,
>>>
>>>       To make my code a little bit easier to follow, I am attempting to
>>> materialize a PTable to a map and then pass it into another DoFn.
>>> Unfortunately, since the value is an Avro record, I am getting a
>>> NotSerializableException out of the code when I try to use it.
>>>
>>>      I attempted to get around this by converting the record into a
>>> ByteBuffer with the Avro utils, but lo and behold that's also not
>>> Serializable.  Since I do not see a convenient way to wrap a byte array
>>> with Crunch, has anyone had any luck with any other approaches to getting a
>>> Crunch-compatible serialized Avro object?
>>>
>>> Thanks,
>>>      David Ortiz
>>>
>>
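
A note on the ByteBuffer question in the original message: java.nio.ByteBuffer does not implement java.io.Serializable, but a plain byte[] does. One possible way around that, sketched below, is a small holder class that copies the buffer's bytes into an array. This is only a rough sketch using the Java standard library. SerializableBytesDemo, SerializableBytes, javaSerialize and javaDeserialize are hypothetical names, not part of Crunch or Avro, and the bytes here are stand-ins for an encoded Avro record rather than real Avro output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SerializableBytesDemo {

    /** Hypothetical holder: keeps a copy of a ByteBuffer's readable bytes in a byte[]. */
    public static final class SerializableBytes implements Serializable {
        private static final long serialVersionUID = 1L;
        private final byte[] bytes;

        public SerializableBytes(ByteBuffer buffer) {
            // duplicate() so the caller's position and limit are left untouched
            ByteBuffer copy = buffer.duplicate();
            this.bytes = new byte[copy.remaining()];
            copy.get(this.bytes);
        }

        /** Returns a fresh ByteBuffer over a copy of the stored bytes. */
        public ByteBuffer toByteBuffer() {
            return ByteBuffer.wrap(bytes.clone());
        }

        /** Returns a defensive copy of the stored bytes. */
        public byte[] toArray() {
            return bytes.clone();
        }
    }

    /** Writes any Serializable object graph to a byte array with Java serialization. */
    public static byte[] javaSerialize(Object value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        }
        return out.toByteArray();
    }

    /** Reads back an object written by javaSerialize. */
    public static Object javaDeserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        // Stand-in for a materialized map whose values are encoded records.
        Map<String, SerializableBytes> lookup = new HashMap<>();
        lookup.put("k1", new SerializableBytes(ByteBuffer.wrap(new byte[] {1, 2, 3})));

        // Round trip through Java serialization.
        byte[] wire = javaSerialize(lookup);
        Map<String, SerializableBytes> back =
                (Map<String, SerializableBytes>) javaDeserialize(wire);

        System.out.println(Arrays.toString(back.get("k1").toArray())); // prints [1, 2, 3]
    }
}
```

This copies every value into memory on the client, so it only suits maps small enough to hold there. The ReadableData approach suggested earlier in the thread avoids that step.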
