incubator-general mailing list archives

From "John D. Ament" <johndam...@apache.org>
Subject Re: [DISCUSS] Daffodil Incubation Proposal
Date Thu, 10 Aug 2017 02:21:22 GMT
Sorry, only responded to one part :/

You can start the vote as well.  Feel free to follow the format used at
https://lists.apache.org/thread.html/2da6f1920aa7d9f0ee9edbd2a4e6a8e0e5db9aac40e503fd87a4cdb0@%3Cgeneral.incubator.apache.org%3E

If you have any questions, respond here or privately.

John

On Wed, Aug 9, 2017 at 10:19 PM John D. Ament <johndament@apache.org> wrote:

> Steve,
>
> You could list either of us.
>
> John
>
>
> On Wed, Aug 9, 2017 at 11:55 AM Steve Lawrence <
> stephen.d.lawrence@gmail.com> wrote:
>
>> Sounds good to me. Can I start a vote, or is that something a
>> champion/mentor would normally start? The project also does not have a
>> champion. Is that necessary, and would either of you be interested in
>> being the champion?
>>
>> Thanks,
>> - Steve
>>
>> On 08/08/2017 10:59 PM, Dave Fisher wrote:
>> > Hi -
>> >
>> > I agree. I'm willing to proceed with John and me as Mentors.
>> >
>> > Regards,
>> > Dave
>> >
>> >> On Aug 8, 2017, at 7:10 PM, John D. Ament <johndament@apache.org> wrote:
>> >>
>> >> Steve,
>> >>
>> >> At this point, I'd recommend we wrap the discussion and call for a
>> >> vote. While ideally we want 3 mentors, we can get started with 2 and
>> >> see how things progress.
>> >>
>> >> John
>> >>
>> >>> On Wed, Aug 2, 2017 at 3:55 PM Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>> Thanks John!
>> >>>
>> >>> On 08/02/2017 03:23 PM, John D. Ament wrote:
>> >>>> You can also count me in as a mentor.
>> >>>>
>> >>>> John
>> >>>>
>> >>>> On Wed, Aug 2, 2017 at 3:14 PM Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>
>> >>>>> Understood. Thanks for the interest!
>> >>>>>
>> >>>>> - Steve
>> >>>>>
>> >>>>> On 08/02/2017 02:57 PM, Dave Fisher wrote:
>> >>>>>> Hi Steve,
>> >>>>>>
>> >>>>>> It was not so much the lack of committers as it was the current lack
>> >>>>>> of diversity. That is not a blocker for entry to Incubation.
>> >>>>>>
>> >>>>>> I am willing to be one of the Mentors. Once there are at least two
>> >>>>>> more, we can push forward.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Dave
>> >>>>>>
>> >>>>>>> On Aug 1, 2017, at 5:09 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Discussions have died down, and I think the consensus from the
>> >>>>>>> responses is that the issues are 1) the lack of committers and 2) the
>> >>>>>>> lack of a champion and mentors. We hope to address #1 and grow the
>> >>>>>>> community as part of incubation. Is anyone interested in being a
>> >>>>>>> champion or mentor and helping us with #2?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> - Steve
>> >>>>>>>
>> >>>>>>> On 07/26/2017 04:06 PM, Chris Mattmann wrote:
>> >>>>>>>> This sounds like a very interesting project.
>> >>>>>>>>
>> >>>>>>>> I don’t have the time to mentor at the moment, but I will keep a
>> >>>>>>>> close eye on it.
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Chris Mattmann
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On 7/25/17, 11:53 AM, "McHenry, Kenton Guadron" <mchenry@illinois.edu> wrote:
>> >>>>>>>>
>> >>>>>>>>    Hi Dave,
>> >>>>>>>>
>> >>>>>>>>    The developers that were at NCSA have moved on to other
>> >>>>>>>>    organizations. While we still leverage Daffodil and are very much
>> >>>>>>>>    interested in seeing it move forward, development is currently
>> >>>>>>>>    done by the Tresys team. Agreed on the synergy with Tika.
>> >>>>>>>>
>> >>>>>>>>    Kenton McHenry, Ph.D.
>> >>>>>>>>    Principal Research Scientist, Adjunct Assistant Professor of Computer Science
>> >>>>>>>>    Deputy Director of the Scientific Software & Applications Division
>> >>>>>>>>    National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
>> >>>>>>>>
>> >>>>>>>>    On Jul 24, 2017, at 1:55 PM, Dave Fisher <dave2wave@comcast.net> wrote:
>> >>>>>>>>
>> >>>>>>>>    Hi Kenton,
>> >>>>>>>>
>> >>>>>>>>    Is there any reason that you and others from the NCSA are not
>> >>>>>>>>    Initial Committers? That would make this proposal stronger.
>> >>>>>>>>
>> >>>>>>>>    Regarding Apache Tika: it relies on other projects, including
>> >>>>>>>>    Apache POI and Apache PDFBox. They are pragmatic about what is
>> >>>>>>>>    used. If Daffodil works to expand, then I think there would be good
>> >>>>>>>>    synergy between the projects. I know as a POI PMC member that the
>> >>>>>>>>    POI community has significantly benefited from the Tika community,
>> >>>>>>>>    some of whom are from MITRE.
>> >>>>>>>>
>> >>>>>>>>    To date, Tika has not emphasized structured data, although it does
>> >>>>>>>>    extract content from Excel and OpenOffice.
>> >>>>>>>>
>> >>>>>>>>    I am intrigued.
>> >>>>>>>>
>> >>>>>>>>    Regards,
>> >>>>>>>>    Dave
>> >>>>>>>>
>> >>>>>>>>    On Jul 24, 2017, at 10:55 AM, McHenry, Kenton Guadron <mchenry@illinois.edu> wrote:
>> >>>>>>>>
>> >>>>>>>>    Yes, DFDL and its open source implementation Daffodil are more
>> >>>>>>>>    about file formats and getting access to the entirety of a file's
>> >>>>>>>>    contents in a consistent way through machine-readable
>> >>>>>>>>    specifications. The work has implications in the area of digital
>> >>>>>>>>    preservation, allowing one to preserve these machine-readable
>> >>>>>>>>    specifications rather than all the tools needed to open/save a
>> >>>>>>>>    file in order to work with it. Imagine someone developing graphics
>> >>>>>>>>    software to work with 3D models and not having to worry about the
>> >>>>>>>>    hundreds of formats out there for 3D meshes (whether there are
>> >>>>>>>>    tools for opening the files, whether they can get access to those
>> >>>>>>>>    tools, whether the spec is available, how complex that spec is to
>> >>>>>>>>    implement, etc.), and simply building their code around the
>> >>>>>>>>    contents (e.g. vertices, faces, etc.). One could come up with
>> >>>>>>>>    similar scenarios for other data types (documents, images, videos,
>> >>>>>>>>    audio, depth data, numeric data). Ideally, tools built to support
>> >>>>>>>>    DFDL could someday support any format of a given type without the
>> >>>>>>>>    developer having to worry about the details of how that data is
>> >>>>>>>>    represented within a file.
>> >>>>>>>>
>> >>>>>>>>    Kenton McHenry, Ph.D.
>> >>>>>>>>
>> >>>>>>>>    On Jul 24, 2017, at 10:30 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>    I'll preface this by saying that I don't have a ton of experience
>> >>>>>>>>    with Apache Tika. But based on my understanding, Tika and Daffodil
>> >>>>>>>>    do have somewhat similar goals, though they reach them in different
>> >>>>>>>>    ways. For example, Tika requires that one write /code/ to perform
>> >>>>>>>>    data extraction, usually relying on existing Java libraries to
>> >>>>>>>>    extract the desired metadata. The downside to this is that code can
>> >>>>>>>>    be buggy, and libraries might not even exist for formats of
>> >>>>>>>>    interest (especially common with legacy and military data).
>> >>>>>>>>
>> >>>>>>>>    Daffodil, on the other hand, does not require one to write any
>> >>>>>>>>    code. Instead, one writes a DFDL schema (similar to XML Schema,
>> >>>>>>>>    with DFDL annotations) that fully describes the data, which
>> >>>>>>>>    Daffodil then uses to convert the data to XML/JSON for extraction.
>> >>>>>>>>    So adding support for a new format means writing a new schema
>> >>>>>>>>    rather than new code. And less code generally means fewer bugs.
>> >>>>>>>>    Also, for secure systems that require certification, it is
>> >>>>>>>>    generally easier to certify a schema than code.
>> >>>>>>>>
>> >>>>>>>>    We certainly don't believe that Daffodil could replace Tika, but it
>> >>>>>>>>    does have the potential to add new functionality to Tika for
>> >>>>>>>>    formats that do not have existing libraries. One of our goals is to
>> >>>>>>>>    look into integrating Daffodil support into tools like Tika. We'd
>> >>>>>>>>    love to hear from Tika devs if this is something they'd be
>> >>>>>>>>    interested in.
>> >>>>>>>>
>> >>>>>>>>    I'll also add that whereas Tika tends to focus primarily on
>> >>>>>>>>    metadata, DFDL schemas usually describe an entire file format down
>> >>>>>>>>    to the byte, so one can extract more than just metadata, including
>> >>>>>>>>    text and binary data. As a further difference, Daffodil supports
>> >>>>>>>>    serializing data (called unparsing) from the XML/JSON
>> >>>>>>>>    representation, allowing one to transform or filter data as well.
>> >>>>>>>>    We don't believe this feature is all that applicable to Tika, but
>> >>>>>>>>    it may be useful to other technologies, such as data filtering or
>> >>>>>>>>    data fuzzing.
>> >>>>>>>>
>> >>>>>>>>    - Steve
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>    On 07/24/2017 10:59 AM, Mike Drob wrote:
>> >>>>>>>>    What is the relationship between Daffodil and something like
>> >>>>>>>>    Apache Tika's extraction engine?
>> >>>>>>>>
>> >>>>>>>>    On Mon, Jul 24, 2017 at 9:53 AM, Steve Lawrence <stephen.d.lawrence@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>    Dear Apache Incubator Community,
>> >>>>>>>>
>> >>>>>>>>    We would like to start a discussion around a proposal to bring
>> >>>>>>>>    Daffodil into the Apache Incubator. Daffodil is an implementation
>> >>>>>>>>    of the DFDL specification used to convert between fixed format data
>> >>>>>>>>    and XML/JSON.
>> >>>>>>>>
>> >>>>>>>>    The draft proposal can be found in the wiki at the following URL:
>> >>>>>>>>
>> >>>>>>>>    https://wiki.apache.org/incubator/DaffodilProposal
>> >>>>>>>>
>> >>>>>>>>    We do not yet have a champion or mentors, but it was recommended
>> >>>>>>>>    that we create a proposal and send it to this list to potentially
>> >>>>>>>>    find those who might be interested. The text for the draft proposal
>> >>>>>>>>    is found below. We look forward to your input.
>> >>>>>>>>
>> >>>>>>>>    Thanks,
>> >>>>>>>>    -Steve
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>    = Daffodil Proposal =
>> >>>>>>>>
>> >>>>>>>>    == Abstract ==
>> >>>>>>>>
>> >>>>>>>>    Daffodil is an implementation of the Data Format Description
>> >>>>>>>>    Language (DFDL) used to convert between fixed format data and
>> >>>>>>>>    XML/JSON.
>> >>>>>>>>
>> >>>>>>>>    == Proposal ==
>> >>>>>>>>
>> >>>>>>>>    The Data Format Description Language (DFDL) is a specification,
>> >>>>>>>>    developed by the Open Grid Forum, capable of describing many data
>> >>>>>>>>    formats, including both textual and binary, scientific and
>> >>>>>>>>    numeric, legacy and modern, commercial record-oriented, and many
>> >>>>>>>>    industry and military standards. It defines a language that is a
>> >>>>>>>>    subset of W3C XML Schema to describe the logical format of the
>> >>>>>>>>    data, and annotations within the schema to describe the physical
>> >>>>>>>>    representation.
>> >>>>>>>>
>> >>>>>>>>    Daffodil is an open source implementation of the DFDL
>> >>>>>>>>    specification that uses these DFDL schemas to parse fixed format
>> >>>>>>>>    data into an infoset, which is most commonly represented as either
>> >>>>>>>>    XML or JSON. This allows the use of well-established XML or JSON
>> >>>>>>>>    technologies and libraries to consume, inspect, and manipulate
>> >>>>>>>>    fixed format data in existing solutions. Daffodil is also capable
>> >>>>>>>>    of the reverse: serializing, or "unparsing", an XML or JSON infoset
>> >>>>>>>>    back to the original data format.
>> >>>>>>>>
>> >>>>>>>>    == Background ==
>> >>>>>>>>
>> >>>>>>>>    Many different software solutions need to consume and manage data,
>> >>>>>>>>    including data-directed routing, databases, data analysis, data
>> >>>>>>>>    cleansing, data visualization, and more. A key aspect of such
>> >>>>>>>>    solutions is the need to transform the data into an easily
>> >>>>>>>>    consumable format. Usually, this means that for each unique data
>> >>>>>>>>    format, one develops a tool that can read and extract the necessary
>> >>>>>>>>    information, often leading to ad-hoc and data-format-specific
>> >>>>>>>>    description systems. Such systems are often proprietary, poorly
>> >>>>>>>>    tested, and incompatible, leading to vendor lock-in, flawed
>> >>>>>>>>    software, and increased training costs. DFDL is a new standard,
>> >>>>>>>>    with version 1.0 completed in October of 2016, that solves these
>> >>>>>>>>    problems by defining an open standard to describe many different
>> >>>>>>>>    data formats and how to parse and unparse between the data and
>> >>>>>>>>    XML/JSON.
>> >>>>>>>>
>> >>>>>>>>    Two closed source implementations of DFDL currently exist. The
>> >>>>>>>>    first was created by IBM and is now part of their IBM® Integration
>> >>>>>>>>    Bus product. The second, called DFDL4S or "DFDL for Space", was
>> >>>>>>>>    created by the European Space Agency and is targeted at the
>> >>>>>>>>    challenges of their satellite data processing.
>> >>>>>>>>
>> >>>>>>>>    Around 2005, Pacific Northwest National Lab created Defuddle,
>> >>>>>>>>    built as an open source implementation and proof of concept of the
>> >>>>>>>>    draft DFDL specification and as a test bed to feed new concepts
>> >>>>>>>>    into specification development. Primary development of Defuddle
>> >>>>>>>>    was eventually taken over by the National Center for
>> >>>>>>>>    Supercomputing Applications (NCSA). However, due to the evolution
>> >>>>>>>>    of the DFDL specification and to architectural and performance
>> >>>>>>>>    issues with Defuddle, NCSA restarted the project around 2009 under
>> >>>>>>>>    the new name Daffodil, with the goal of implementing the complete
>> >>>>>>>>    DFDL specification. Daffodil development continued at NCSA until
>> >>>>>>>>    around 2012, at which point development slowed due to budget
>> >>>>>>>>    limitations. Shortly thereafter, primary development was picked up
>> >>>>>>>>    by Tresys Technology, where it continues today, with contributions
>> >>>>>>>>    from other entities such as the Naval Research Lab, the Air Force
>> >>>>>>>>    Research Lab, MITRE, and Booz Allen Hamilton. In February of 2015,
>> >>>>>>>>    Daffodil version 1.0.0 was released, including support for the
>> >>>>>>>>    DFDL features needed to parse many common file formats. Daffodil
>> >>>>>>>>    version 2.0.0 is expected to be released in August of 2017 and will
>> >>>>>>>>    include unparse support with full feature parity with parsing.
>> >>>>>>>>
>> >>>>>>>>    Entities including IBM, MITRE, NATO NCI Agency, Northrop Grumman,
>> >>>>>>>>    Quark Security, Raytheon, and Tresys Technology have developed DFDL
>> >>>>>>>>    schemas for many data formats from varying technology domains,
>> >>>>>>>>    including PNG, GIF, BMP, PCAP, HL7, EDIFACT, NACHA, vCard,
>> >>>>>>>>    iCalendar, and MIL-STD-2045, many of which are publicly available
>> >>>>>>>>    on the DFDL Schemas GitHub. There are also a number of
>> >>>>>>>>    military-application data formats whose specifications are not
>> >>>>>>>>    public, which have historically been very difficult and expensive
>> >>>>>>>>    to process, and for which DFDL schemas have been created or are
>> >>>>>>>>    actively in development; these include MIL-STD-6040/USMTF ATO,
>> >>>>>>>>    MIL-STD-6017/VMF, and MIL-STD-6016/NATO STANAG 5516 (aka "Link16").
>> >>>>>>>>
>> >>>>>>>>    == Rationale ==
>> >>>>>>>>
>> >>>>>>>>    Numerous software solutions exist that consume, inspect, analyze,
>> >>>>>>>>    and transform data, many of which can be found in the Apache
>> >>>>>>>>    Software Foundation (ASF). In order for tools like these to consume
>> >>>>>>>>    new types of data, custom extensions are usually required, often
>> >>>>>>>>    with high development and testing costs. Daffodil fills a clear gap
>> >>>>>>>>    in many of these solutions, providing a simple and low-cost way to
>> >>>>>>>>    transform data to XML or JSON, which many of these tools already
>> >>>>>>>>    support natively. With the upcoming 2.0.0 release, the Daffodil
>> >>>>>>>>    project will have achieved a level of functionality in both parse
>> >>>>>>>>    and unparse that, when integrated into existing solutions, could
>> >>>>>>>>    provide a new method to quickly enable support for new data
>> >>>>>>>>    formats.
>> >>>>>>>>
>> >>>>>>>>    == Initial Goals ==
>> >>>>>>>>
>> >>>>>>>>    * Relicense the existing code from the University of Illinois/NCSA
>> >>>>>>>>    Open Source License to the Apache License version 2.0, working with
>> >>>>>>>>    Apache Legal to ensure correctness, and with Daffodil contributors
>> >>>>>>>>    to get their permission.
>> >>>>>>>>    * Move the existing codebase, documentation, bugs, and mailing
>> >>>>>>>>    lists to the Apache hosted infrastructure.
>> >>>>>>>>    * Establish a formal release process and schedule, allowing for
>> >>>>>>>>    dependable release cycles in a manner consistent with the Apache
>> >>>>>>>>    development process.
>> >>>>>>>>    * Build relationships with ASF projects to add Daffodil support
>> >>>>>>>>    where appropriate.
>> >>>>>>>>    * Grow the community to establish a diversity of background and
>> >>>>>>>>    expertise.
>> >>>>>>>>
>> >>>>>>>>    == Current Status ==
>> >>>>>>>>
>> >>>>>>>>    === Meritocracy ===
>> >>>>>>>>
>> >>>>>>>>    All initial committers are familiar with the principles of
>> >>>>>>>>    meritocracy. The Daffodil project has followed the model of
>> >>>>>>>>    meritocracy in the past, granting commit access to multiple outside
>> >>>>>>>>    entities based on the quality of their contributions. In order to
>> >>>>>>>>    grow the Daffodil user base and development community, we are
>> >>>>>>>>    dedicated to continuing to operate Daffodil as a meritocracy.
>> >>>>>>>>
>> >>>>>>>>    A key ingredient in a meritocracy of developers is open group code
>> >>>>>>>>    review. The Daffodil project has operated in this mode throughout
>> >>>>>>>>    its existence, and this provides a forum to improve the code,
>> >>>>>>>>    verify code quality, and educate new developers on the code base.
>> >>>>>>>>
>> >>>>>>>>    === Community ===
>> >>>>>>>>
>> >>>>>>>>    Daffodil has a small community of users and developers. Although
>> >>>>>>>>    primary Daffodil development is done by Tresys Technology, a
>> >>>>>>>>    handful of contributions have come from other entities, including
>> >>>>>>>>    the Naval Research Lab, the Air Force Research Lab, MITRE, and Booz
>> >>>>>>>>    Allen Hamilton. In addition to developers, multiple users of
>> >>>>>>>>    Daffodil have created DFDL schemas, including entities such as
>> >>>>>>>>    MITRE, IBM, Raytheon, Quark Security, and Tresys Technology. The
>> >>>>>>>>    DFDL Schemas GitHub community has been created as a place for DFDL
>> >>>>>>>>    schemas to be published. The Daffodil project also makes use of
>> >>>>>>>>    mailing lists, HipChat, and Confluence Questions to build a
>> >>>>>>>>    community of users and a system for support.
>> >>>>>>>>
>> >>>>>>>>    === Core Developers ===
>> >>>>>>>>
>> >>>>>>>>    The core developers of Daffodil are employed by Tresys Technology.
>> >>>>>>>>    We will work to grow the community among a more diverse set of
>> >>>>>>>>    developers and industries.
>> >>>>>>>>
>> >>>>>>>>    === Alignment ===
>> >>>>>>>>
>> >>>>>>>>    Daffodil was created as an open source project with a philosophy
>> >>>>>>>>    consistent with The Apache Way. A strong belief in meritocracy,
>> >>>>>>>>    community involvement in decisions, openness, and ensuring a high
>> >>>>>>>>    level of quality in code, documentation, and testing are some of
>> >>>>>>>>    our shared core beliefs.
>> >>>>>>>>
>> >>>>>>>>    Further, as mentioned in the Rationale section, Daffodil fills a
>> >>>>>>>>    gap that exists in many ASF projects, including NiFi, Spark, Storm,
>> >>>>>>>>    Hadoop, Tika, and others. In order for tools like these to consume
>> >>>>>>>>    new types of data, custom extensions are usually required. Rather
>> >>>>>>>>    than requiring such extensions, Daffodil provides an easy and
>> >>>>>>>>    standards-compliant way to transform data to XML or JSON, which
>> >>>>>>>>    many of these tools already support natively.
>> >>>>>>>>
>> >>>>>>>>    == Known Risks ==
>> >>>>>>>>
>> >>>>>>>>    === Orphaned Products ===
>> >>>>>>>>
>> >>>>>>>>    The current core developers are the leading contributors in the
>> >>>>>>>>    space of DFDL and wish to see it flourish. Though there is some
>> >>>>>>>>    risk in that the initial committers all come from the same
>> >>>>>>>>    company, a goal of entering incubation is to grow the development
>> >>>>>>>>    community to minimize the risk of reliance on a single company.
>> >>>>>>>>
>> >>>>>>>>    === Inexperience with Open Source ===
>> >>>>>>>>
>> >>>>>>>>    The Daffodil project began as an open source project and has
>> >>>>>>>>    continued that model throughout development. This includes public
>> >>>>>>>>    bug tracking, git revision control, automated builds and tests,
>> >>>>>>>>    and a public wiki for documentation.
>> >>>>>>>>
>> >>>>>>>>    Additionally, the current core developers and initial committers
>> >>>>>>>>    all work for a company that relies on, believes in, promotes, and
>> >>>>>>>>    has led or contributed to many open source software projects,
>> >>>>>>>>    including SELinux Userspace, OpenSCAP, CLIP, refpolicy, setools,
>> >>>>>>>>    RPM, and others. As such, there is low risk related to inexperience
>> >>>>>>>>    with open source software and processes.
>> >>>>>>>>
>> >>>>>>>>    === Homogeneous Developers ===
>> >>>>>>>>
>> >>>>>>>>    The proposed initial committers come from a single entity, though
>> >>>>>>>>    we are committed to growing the Daffodil development community to
>> >>>>>>>>    include a broad group of additional committers from a wide array of
>> >>>>>>>>    industries.
>> >>>>>>>>
>> >>>>>>>>    === Reliance on Salaried Developers ===
>> >>>>>>>>
>> >>>>>>>>    The proposed initial committers are paid by their employer to
>> >>>>>>>>    contribute to the Daffodil project. We expect that Daffodil
>> >>>>>>>>    development will continue with salaried developers, and we are
>> >>>>>>>>    committed to growing the community to include non-salaried
>> >>>>>>>>    developers as well.
>> >>>>>>>>
>> >>>>>>>>    === Relationship with other Apache Projects ===
>> >>>>>>>>
>> >>>>>>>>    As mentioned in the Alignment section, Daffodil fills a clear gap
>> >>>>>>>>    in numerous other ASF projects that consume and manage large
>> >>>>>>>>    amounts of data.
>> >>>>>>>>
>> >>>>>>>>    As a specific example, Daffodil developers have created a Daffodil
>> >>>>>>>>    Apache NiFi Processor, currently in use in data transfer
>> >>>>>>>>    solutions, which allows one to ingest non-native data into an
>> >>>>>>>>    Apache NiFi pipeline as XML or JSON. This processor was well
>> >>>>>>>>    received by the Apache NiFi developers, with positive comments
>> >>>>>>>>    about the concise API and how it could handle non-native data.
>> >>>>>>>>    Daffodil developers have also successfully prototyped integration
>> >>>>>>>>    with Apache Spark. We believe Daffodil could provide a strong
>> >>>>>>>>    benefit to many other ASF projects that handle fixed format data.
>> >>>>>>>>    We anticipate working closely with such ASF projects to include
>> >>>>>>>>    Daffodil where applicable, increasing their ability to support new
>> >>>>>>>>    data formats with minimal effort.
>> >>>>>>>>
>> >>>>>>>>    Daffodil also depends on existing ASF projects, including Apache
>> >>>>>>>>    Commons and Apache Xerces.
>> >>>>>>>>
>> >>>>>>>>    === An Excessive Fascination with the Apache Brand ===
>> >>>>>>>>
>> >>>>>>>>    Although the Apache brand may certainly help to attract more
>> >>>>>>>>    contributors, publicity is not the reason for this proposal. We
>> >>>>>>>>    believe Daffodil could provide a great benefit to the ASF and the
>> >>>>>>>>    numerous data-focused projects that comprise it, as described in
>> >>>>>>>>    the Rationale and Alignment sections. We hope to build a strong and
>> >>>>>>>>    vibrant community centered on The Apache Way and not dependent on a
>> >>>>>>>>    single company.
>> >>>>>>>>
>> >>>>>>>>    === Documentation ===
>> >>>>>>>>
>> >>>>>>>>    Daffodil documentation can be found at:
>> >>>>>>>>
>> >>>>>>>>    * https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>> >>>>>>>>
>> >>>>>>>>    Information about DFDL can be found at:
>> >>>>>>>>
>> >>>>>>>>    * https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>> >>>>>>>>    * https://www.ibm.com/support/knowledgecenter/en/SSMKHH_9.0.0/com.ibm.etools.mft.doc/df20060_.htm
>> >>>>>>>>
>> >>>>>>>>    Public examples of DFDL Schemas can be found at:
>> >>>>>>>>
>> >>>>>>>>    * https://github.com/DFDLSchemas
>> >>>>>>>>
>> >>>>>>>>    == Initial Source ==
>> >>>>>>>>
>> >>>>>>>>    The Daffodil git repo goes back to mid-2011, with approximately 20
>> >>>>>>>>    different contributors and feedback from many users and
>> >>>>>>>>    developers. The core codebase is written in Scala and includes both
>> >>>>>>>>    a Scala and a Java API, along with Javadocs and Scaladocs for API
>> >>>>>>>>    usage. The initial code will come from the git repository currently
>> >>>>>>>>    hosted by NCSA at the University of Illinois:
>> >>>>>>>>
>> >>>>>>>>    https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil/
>> >>>>>>>>
>> >>>>>>>>    == Source and Intellectual Property Submission ==
>> >>>>>>>>
>> >>>>>>>>    The complete Daffodil code is licensed under the University of
>> >>>>>>>>    Illinois/NCSA Open Source License. Much of the current codebase has
>> >>>>>>>>    been developed by Tresys Technology, which is open to relicensing
>> >>>>>>>>    the code under the Apache License version 2.0 and donating the
>> >>>>>>>>    source to the ASF. Contacts at NCSA are also open to relicensing
>> >>>>>>>>    their contributions under Apache v2. We plan to contact the other
>> >>>>>>>>    contributors and ask for permission to relicense and donate their
>> >>>>>>>>    contributed code. Code from contributors who decline, or whom we
>> >>>>>>>>    cannot contact, will be removed or replaced. We will work closely
>> >>>>>>>>    with Apache Legal to ensure all issues related to relicensing are
>> >>>>>>>>    resolved acceptably.
>> >>>>>>>>
>> >>>>>>>>    == External Dependencies ==
>> >>>>>>>>
>> >>>>>>>>    We believe all current dependencies are compatible with the ASF
>> >>>>>>>>    guidelines. Our dependency licenses come from the following
>> >>>>>>>>    license styles: Apache v2, BSD, MIT, and ICU. The list of current
>> >>>>>>>>    Daffodil dependencies and their licenses is documented here:
>> >>>>>>>>
>> >>>>>>>>    https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Dependencies+and+Licenses
>> >>>>>>>>
>> >>>>>>>>    == Cryptography ==
>> >>>>>>>>
>> >>>>>>>>    None
>> >>>>>>>>
>> >>>>>>>>    == Required Resources ==
>> >>>>>>>>
>> >>>>>>>>    === Mailing Lists ===
>> >>>>>>>>
>> >>>>>>>>    * commits@daffodil.incubator.apache.org
>> >>>>>>>>    * dev@daffodil.incubator.apache.org
>> >>>>>>>>    * private@daffodil.incubator.apache.org
>> >>>>>>>>    * user@daffodil.incubator.apache.org
>> >>>>>>>>
>> >>>>>>>>    === Source Control ===
>> >>>>>>>>
>> >>>>>>>>    git://git.apache.org/incubator-daffodil.git
>> >>>>>>>>
>> >>>>>>>>    === Issue Tracking ===
>> >>>>>>>>
>> >>>>>>>>    JIRA Daffodil (DFDL)
>> >>>>>>>>
>> >>>>>>>>    === Initial Committers ===
>> >>>>>>>>
>> >>>>>>>>    * Beth Finnegan <efinnegan at tresys dot com>
>> >>>>>>>>    * Dave Thompson <dthompson at tresys dot com>
>> >>>>>>>>    * Josh Adams <jadams at tresys dot com>
>> >>>>>>>>    * Mike Beckerle <mbeckerle at tresys dot com>
>> >>>>>>>>    * Steve Lawrence <slawrence at tresys dot com>
>> >>>>>>>>    * Taylor Wise <twise at tresys dot com>
>> >>>>>>>>
>> >>>>>>>>    === Affiliations ===
>> >>>>>>>>
>> >>>>>>>>    * Beth Finnegan (Tresys Technology)
>> >>>>>>>>    * Dave Thompson (Tresys Technology)
>> >>>>>>>>    * Josh Adams (Tresys Technology)
>> >>>>>>>>    * Mike Beckerle (Tresys Technology)
>> >>>>>>>>    * Steve Lawrence (Tresys Technology)
>> >>>>>>>>    * Taylor Wise (Tresys Technology)
>> >>>>>>>>
>> >>>>>>>>    == Sponsors ==
>> >>>>>>>>
>> >>>>>>>>    === Champion ===
>> >>>>>>>>
>> >>>>>>>>    * TBD
>> >>>>>>>>
>> >>>>>>>>    === Nominated Mentors ===
>> >>>>>>>>
>> >>>>>>>>    * TBD
>> >>>>>>>>
>> >>>>>>>>    === Sponsoring Entity ===
>> >>>>>>>>
>> >>>>>>>>    We request the Apache Incubator to sponsor this project.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>    ---------------------------------------------------------------------
>> >>>>>>>>    To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >>>>>>>>    For additional commands, e-mail: general-help@incubator.apache.org
