crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CRUNCH-293) Injection of reader into AvroRecordReader
Date Wed, 20 Nov 2013 18:32:35 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825634#comment-13825634
] 

Ryan Blue edited comment on CRUNCH-293 at 11/20/13 6:31 PM:
------------------------------------------------------------

Micah, Josh just pointed me at this issue, which is well-timed. I just ran into the same problem
recently and implemented a solution I was going to submit a patch for. My problem was that
I wanted to override the avro generic classes rather than the specific. My implementation
is very similar to yours, only I changed the reflect factory to a ReaderWriterFactory and
added an AvroMode enum to handle each case (REFLECT, SPECIFIC, GENERIC). The enum approach
brings all of the code together like your AvroDataFactory, but keeps the handling for each
mode separate. Each mode can be individually overridden:
{code:java}
  AvroMode.SPECIFIC.override(specificFactory);
{code}

It also cleans up some of the places that hard-code specific or reflect readers to use the
right one based on the AvroType:
{code:java}
  AvroMode.fromType(atype).configure(bundle)
{code}

This is what makes it possible for me to override the generics correctly. A lot of places
simply used Reflect because it is the most general... but that causes problems if you need
to change specific (e.g., use a different ClassLoader) or generic.

Let me know what you think of the [branch|https://github.com/rdblue/crunch/commit/110614da91dc2b7609520883bb3cdfd40ea68700]
and whether it might work for your needs. Please ignore all of the unnecessary import changes,
I still need to clean it up.


was (Author: rdblue):
Micah, Josh just pointed me at this issue, which is well-timed. I just ran into the same problem
recently and implemented a solution I was going to submit a patch for. My problem was that
I wanted to override the avro generic classes rather than the specific. My implementation
is very similar to yours, only I changed the reflect factory to a ReaderWriterFactory and
added an AvroMode enum to handle each case (REFLECT, SPECIFIC, GENERIC). The enum approach
brings all of the code together like your AvroDataFactory, but keeps the handling for each
mode separate. Each mode can be individually overridden:
{code:java}
  AvroMode.SPECIFIC.override(specificFactory);
{code}

It also cleans up some of the places that hard-code specific or reflect readers to use the
right one based on the AvroType:
{code:java}
  AvroMode.forType(atype).configure(bundle)
{code}

This is what makes it possible for me to override the generics correctly. A lot of places
simply used Reflect because it is the most general... but that causes problems if you need
to change specific (e.g., use a different ClassLoader) or generic.

Let me know what you think of the [branch|https://github.com/rdblue/crunch/commit/110614da91dc2b7609520883bb3cdfd40ea68700]
and whether it might work for your needs. Please ignore all of the unnecessary import changes,
I still need to clean it up.

> Injection of reader into AvroRecordReader
> -----------------------------------------
>
>                 Key: CRUNCH-293
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-293
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-293.patch, CRUNCH-293_v2.patch
>
>
> With CRUNCH-243, I wanted to support injecting custom readers to handle the cases like
passivity between Avro Schema.  The changes made however were not complete as we also need
to be able to inject a reader into the AvroRecordReader which constructs its own SpecificDatumReader.
> We could create a SpecificDataFactory which emulates the ReflectDataFactory.  Or simplify
to a single DataFactory which will create either Reflect/Specific/Generic.  Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message