drill-issues mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4455) Depend on Apache Arrow for Vector and Memory
Date Tue, 22 Nov 2016 17:05:59 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687300#comment-15687300 ]

Jacques Nadeau commented on DRILL-4455:

As someone who worked a lot on the memory layer and the accounting code, I'm not sure how one
would split the vectors from the memory management without introducing a level of indirection
that would hurt performance. The problem is the need to transfer the accounting for the data
held in memory buffers while maintaining a single canonical memory representation and enforcing
limits. For reference, please review the information at [1] to understand how the pieces work
together.
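To make the coupling concrete, here is a minimal Java sketch, assuming a simplified model in which each buffer records which allocator its bytes are charged to. The classes `ToyAllocator` and `ToyBuffer` and the method `transferTo` are hypothetical illustrations, not the actual Arrow or Drill API (see [1] for the real design). What it shows: handing a buffer from one allocator to another moves only the accounting, keeps one canonical copy of the memory, and still enforces the receiving allocator's limit. Because the buffer has to know its owning allocator for this to work, the buffer and allocator code are hard to separate without adding a layer of indirection.

```java
/**
 * Hypothetical sketch (not Arrow's or Drill's real API): a toy allocator/buffer pair
 * in which a buffer's accounting can move between allocators without copying its memory.
 */
public class AccountingTransferSketch {

    /** A toy allocator that only tracks how many bytes are charged against it. */
    static final class ToyAllocator {
        final String name;
        final long limit;
        long allocated;

        ToyAllocator(String name, long limit) {
            this.name = name;
            this.limit = limit;
        }

        /** Charges bytes to this allocator, refusing if its limit would be exceeded. */
        void reserve(long bytes) {
            if (allocated + bytes > limit) {
                throw new IllegalStateException(
                        name + " would exceed its limit of " + limit + " bytes");
            }
            allocated += bytes;
        }

        /** Returns bytes previously charged to this allocator. */
        void release(long bytes) {
            allocated -= bytes;
        }

        /** Allocates a buffer and charges its size to this allocator. */
        ToyBuffer buffer(int size) {
            reserve(size);
            return new ToyBuffer(new byte[size], this);
        }
    }

    /** A buffer with one canonical backing array; only the owning allocator changes. */
    static final class ToyBuffer {
        final byte[] memory;
        ToyAllocator owner;

        ToyBuffer(byte[] memory, ToyAllocator owner) {
            this.memory = memory;
            this.owner = owner;
        }

        /**
         * Moves the accounting for this buffer to target without copying the bytes.
         * The target is charged first, so a limit violation leaves the buffer unchanged.
         */
        void transferTo(ToyAllocator target) {
            target.reserve(memory.length);
            owner.release(memory.length);
            owner = target;
        }
    }

    public static void main(String[] args) {
        ToyAllocator upstream = new ToyAllocator("upstream", 1024);
        ToyAllocator downstream = new ToyAllocator("downstream", 512);

        ToyBuffer buf = upstream.buffer(256);
        byte[] before = buf.memory;
        buf.transferTo(downstream);
        System.out.println(upstream.allocated + " " + downstream.allocated); // 0 256
        System.out.println(before == buf.memory); // true: same memory, no copy

        ToyBuffer big = upstream.buffer(600);
        try {
            big.transferTo(downstream); // 256 + 600 > 512, so the transfer is refused
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(big.owner.name); // upstream
    }
}
```

Running `main` prints the accounting after a successful transfer, confirms that the backing array is the same object, and then shows a second transfer being refused because it would push `downstream` past its 512-byte limit, leaving that buffer charged to `upstream`.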

I see two challenges at this point.

- This was originally proposed in November 2015. Note the slides attached in [2], specifically
the last one, where all three approaches had the vectors and memory management moving into the
project together (due to the nature of the coupling). Hearing no disagreement, going through
the massive amount of work this patch took to build, and then hitting a -1 six months later
takes a lot of wind out of one's sails.

- The larger problem is that I'm not sure who will have the interest to attempt this patch
again. We're now ~6 months later, with two trees that have moved in their own directions, so
a rebase is probably very difficult (or impossible). My sense is that Arrow will continue to
create value and that at some point the Drill community will reach consensus that this work is
worth doing. In the meantime, I'm not sure anyone's heart is in it.

So while it may ultimately make sense to come up with a better approach to modularity in the
Arrow library around the first point, I'd like to see some demand for that from community
members who want to use Arrow (possibly in the form of proposed patches or approaches).

PS: An interesting question would be how much development has happened in the "disputed module"
in Drill since this patch (or since my major reworking of it ~12 months ago).

[1] https://github.com/apache/arrow/tree/master/java/memory/src/main/java/org/apache/arrow/memory
[2] http://markmail.org/thread/74ns3peuwbaolcod

> Depend on Apache Arrow for Vector and Memory
> --------------------------------------------
>                 Key: DRILL-4455
>                 URL: https://issues.apache.org/jira/browse/DRILL-4455
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>             Fix For: 2.0.0
> The code for value vectors and memory has been split and contributed to the Apache Arrow
> repository. In order to help this project advance, Drill should depend on the Arrow project
> instead of internal value vector code.
> This change will require recompiling any external code, such as UDFs and StoragePlugins.
> The changes will mainly just involve renaming the classes to the org.apache.arrow namespace.

This message was sent by Atlassian JIRA