apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Weise <...@apache.org>
Subject Re: Adding Avro Module to encapsulate AvroFileInput Operator and AvroToPojo Operator
Date Sat, 26 Aug 2017 05:20:14 GMT
There is no right subject line for this thread because multiple issues were
conflated into a single PR. Furthermore, this discussion should take place
before the work is done, not afterwards. The reviewer of the PR should know
the contribution process and guide the contributor to ensure no effort is
wasted.

The existing Avro operators have basic shortcomings, such as type
support/conversion. I don't know in which applications you use it, perhaps
the code was written under assumptions that happen to work for a specific
scenario, but not in general. It is also OK to have such limitation for
code that lives in contrib, but it needs to be worked on before promoting
it.

The work it takes to make it ready for promotion is more than a "quick
fix". It is totally possible if the contributor is willing to take it up,
but best approached by assessing the scope beforehand and dividing the work
in smaller units that can also be properly reviewed. I could imagine the
following for a useful Avro module:

   - Review operators and make sure they really work in a generic way
   - Add a page of documentation
   - Add example for the new file reader that shows the things that users
   are likely to need: parallel partitioning, how to configure schema

I don't think that additional clutter should be created in the codebase for
"backward compatibility". The operators in contrib are not production ready
and we should not create extra baggage that is only adding to the
confusion. The library as a whole needs the opposite in order to improve:
removal of useless code.

Thomas



On Fri, Aug 25, 2017 at 2:21 PM, Pramod Immaneni <pramod@datatorrent.com>
wrote:

> Why not create a new maven project for avro, isn't the idea to separate out
> operators by functionality into their own maven modules on an ongoing basis
> instead of the uber contrib. Do you have specific concerns about the
> viability of the avro operator, if so could you enumerate them. We have
> used it in applications in our company for reading avro files. I agree we
> should probably have a different subject line for this discussion to
> indicate the maven change as well. Could the thread originator do the
> needful?
>
> I don't think we should restrict it, if someone is providing backward
> compatibility for the current operators till the other discussion about 4.x
> is resolved. We try to maintain compatibility for other @Evolving operators
> such as AbstractFileInput. We can take a different position on @Evolving
> operators on 4.x line.
>
> Thanks
>
> On Thu, Aug 24, 2017 at 12:06 PM, Thomas Weise <thw@apache.org> wrote:
>
> > I see no reason to create a separate Maven project along with this
> change.
> > The new classes can be added to the contrib module (under
> org.apache.apex),
> > as @Evolving.
> >
> > The Maven project should be discussed separately and I would look for an
> > explanation why it makes sense as such, overall improvements to the
> > functionality as well as documentation and tests before considering such
> > refactor.
> >
> > IMO we should also not construct proxy classes and other dependencies
> that
> > pretend backward compatibility when in fact they don't and also don't
> need
> > to, since the code in contrib is evolving.
> >
> > Thanks,
> > Thomas
> >
> >
> > On Mon, Aug 21, 2017 at 11:18 AM, Saumya Mohan <saumya@datatorrent.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > *JIRA*: https://issues.apache.org/jira/browse/APEXMALHAR-2034
> > > *Pull Request*: https://github.com/apache/apex-malhar/pull/666
> > >
> > > *Functional Change*
> > > As per the above JIRA, we have created a new Avro Module to address the
> > > following issue
> > >
> > > *Issue:*
> > > Avro objects are not serialized by Kryo causing the Avro GenericRecord
> to
> > > not be available to downstream operators if users don't explicitly mark
> > the
> > > stream locality at container_local or thread_local.
> > >
> > > *Solution:*
> > > We have created a Module on top of AvroFileInputOperator and AvroToPojo
> > > operators such that downstream operators will access POJO instead of
> Avro
> > > GenericRecord. It, therefore, removes the exposure of GenericRecord to
> > > downstream operators and instead exposes the created POJO to downstream
> > > operators.
> > >
> > > In this Module, the stream between the two encapsulated operators
> > > (AvroFileInputOperator and AvroToPojo) is set to CONTAINER_LOCAL.
> > >
> > >
> > > *Avro package move out of contrib*
> > >
> > > Along with creating new avro module, existing avro files are moved from
> > > contrib module to a new 'avro' package under malhar repo.
> > >
> > >
> > >    - This move is in accordance to JIRA
> > >    https://issues.apache.org/jira/browse/APEXMALHAR-1843
> > >    - To ensure backward compatibility, old operator files are marked
> > >    deprecated and made to extend from new operator files.
> > >    - Git history of all the moved files is maintained
> > >
> > > Unit/application-level tests done are elaborated in the JIRA.
> > >
> > > Please let me know if any concerns around this dev work.
> > >
> > >
> > > Best Regards,
> > >
> > > Saumya Mohan
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message