arrow-dev mailing list archives

From "Donald E. Foss" <donald.f...@gmail.com>
Subject Re: Weld
Date Sun, 20 Nov 2016 16:31:36 GMT
Thanks Julian. Sounds worth a listen. 

Donald E. Foss (mobile-US ET)

> On Nov 19, 2016, at 1:48 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
> Matei Zaharia just spoke at the AMPlab seminar [1], and showed a couple of
> slides about Weld. In the video of the day [2], his talk starts at 4:05:00,
> and he starts talking about Weld at 4:28:30.
> 
> The essence is an intermediate language for row-level expressions, with the
> ability to do limited iteration, with the goal of making it easier to pass
> data between UDFs written in different languages. Sounds familiar? I would
> presume that an implementation of the language would be strongly tied to a
> memory format. Or maybe it allows multiple possible implementations, one of
> which would be Arrow in Java.
> 
> The slide listed Pandas as one of the supported front ends, so I wondered if
> Wes knew something about the project.
> 
> I have been thinking of doing something similar in the Calcite / Drill /
> Arrow world. In Calcite we have RexNodes as an expression language, and we
> have a Java code generator that can target data represented as Java arrays,
> and another variant that can target data represented as Java structs. Drill
> of course has a code generator that can target data in Arrow. I have been
> thinking for a while of abstracting the code generators so that the person
> implementing, say, the Filter+Project for “select x + y … where x > 5”
> doesn’t have to get their hands dirty with code generation. There are a lot
> of optimizations to be done, e.g. remembering that you’ve already made sure
> that x is not null.
> 
> Julian
> 
> [1] https://amplab.cs.berkeley.edu/endofproject/
> 
> [2] https://youtu.be/KAacs9jYPHU
> 
> 
> 
>> On Nov 19, 2016, at 4:31 AM, Donald Foss <donald.foss@gmail.com> wrote:
>> 
>> Did you find that at https://cs.stanford.edu/~matei/? That’s the only thing
>> I can find via Google about it. Do you have more detail or a link to the
>> paper itself? I get the feeling that it is not yet fully complete, despite
>> the 21 November camera-ready deadline for CIDR 2017.
>> 
>> For those who aren’t familiar with CIDR, it is a conference that occurs
>> every other year. This year’s agenda/program may be found at
>> http://cidrdb.org/cidr2017/program.html. CIDR here is not Classless
>> Inter-Domain Routing (the first thing I thought of, from network subnet
>> masks) but the Conference on Innovative Data Systems Research, which focuses
>> primarily on systems. I hate to admit this, but I’m unfamiliar with the
>> conference; that appears to be because I’ve been out of academia for far too
>> long. It seems to feature quite a few interesting papers. Judging just by
>> title, a poor yet humorous criterion indeed, I like:
>> - “Dependency-Driven Analytics: A Compass for Uncharted Data Oceans”
>>   (Donald - Why just data lakes when you can have data oceans?)
>> - “My Weak Consistency is Strong” (Donald - Great title, reminds me of
>>   Star Wars and the “Force”)
>> - “SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale
>>   Machine Learning” (Donald - Another brilliant backronym.)
>> 
>> The Weld paper is the last paper to be presented on 10 January 2017 between
>> 2:30 and 4:05 (UTC-8).
>> 
>> On a side note, looking down that page a little, I love the title of the
>> last paper in 2016, Yggdrasil: An Optimized System for Training Deep
>> Decision Trees at Scale
>> <https://cs.stanford.edu/~matei/papers/2016/nips_yggdrasil.pdf>. When I see
>> Yggdrasil, the first thing I think of is a really big tree and Norse
>> mythology. It’s a great name. I’m going to read some of his other papers
>> this weekend.
>> 
>> Donald Foss
>> donald.foss@gmail.com
>> ------ __o
>> ----_`\<,_
>> ---(_)/ (_)
>> 
>> The information in this email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this e-mail
>> by anyone else is unauthorized.
>> 
>>> On Nov 18, 2016, at 4:42 PM, Julian Hyde <jhyde@apache.org> wrote:
>>> 
>>> Anyone know anything about Matei Zaharia’s Weld project?
>>> 
>>>    • S. Palkar, J. Thomas, A. Shanbhag, H. Pirk, M. Schwarzkopf,
>>>      S. Amarasinghe and M. Zaharia. Weld: A Common Runtime for High
>>>      Performance Data Analytics, to appear at CIDR 2017.
>>> 
>>> It seems to have similar goals to Arrow.
>>> 
>>> Julian
>>> 
>> 
> 
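
As a rough illustration of the abstraction Julian describes in the quoted message above (one expression tree, several data layouts, and a generator that remembers which null checks it has already emitted), here is a minimal sketch. Everything in it is an assumption made for illustration: Col, Lit, BinOp, emit, LAYOUTS and compile_filter_project are hypothetical names, not Calcite, Drill, Arrow or Weld APIs. Python is used only for brevity, and SQL-style null handling is assumed (a null x fails the filter, a null y makes x + y null).

```python
from dataclasses import dataclass


# Hypothetical mini expression tree, loosely in the spirit of an
# expression language such as Calcite's RexNodes (not its real API).
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str        # "+" or ">"
    left: object
    right: object


def columns(expr):
    """Return the set of column names referenced by expr."""
    if isinstance(expr, Col):
        return {expr.name}
    if isinstance(expr, BinOp):
        return columns(expr.left) | columns(expr.right)
    return set()


def emit(expr, access):
    """Render expr as Python source; access maps a column name to source."""
    if isinstance(expr, Col):
        return access(expr.name)
    if isinstance(expr, Lit):
        return repr(expr.value)
    return f"({emit(expr.left, access)} {expr.op} {emit(expr.right, access)})"


# Two layout back ends; only the column accessor and length differ.
LAYOUTS = {
    "columnar": (lambda c: f"data[{c!r}][i]", "len(next(iter(data.values())))"),
    "rows": (lambda c: f"data[i][{c!r}]", "len(data)"),
}


def compile_filter_project(project, predicate, layout):
    """Generate and compile a Filter+Project loop for the given layout."""
    access, length = LAYOUTS[layout]
    checked = sorted(columns(predicate))
    # Columns already null-checked for the filter need no second check.
    remaining = sorted(columns(project) - set(checked))
    lines = ["def run(data):", "    out = []", f"    for i in range({length}):"]
    for c in checked:
        lines.append(f"        if {access(c)} is None: continue")
    lines.append(f"        if not {emit(predicate, access)}: continue")
    if remaining:
        null_test = " or ".join(f"{access(c)} is None" for c in remaining)
        lines.append(f"        if {null_test}: out.append(None); continue")
    lines.append(f"        out.append({emit(project, access)})")
    lines.append("    return out")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # compile the generated source
    return namespace["run"], source


# select x + y ... where x > 5, written once, compiled for two layouts.
project = BinOp("+", Col("x"), Col("y"))
predicate = BinOp(">", Col("x"), Lit(5))
run_cols, src_cols = compile_filter_project(project, predicate, "columnar")
run_rows, src_rows = compile_filter_project(project, predicate, "rows")

cols = {"x": [3, 7, None, 9], "y": [1, 2, 3, None]}
rows = [{"x": x, "y": y} for x, y in zip(cols["x"], cols["y"])]
```

The person writing the query only builds the expression tree; the layout-specific accessor is the one piece that changes between back ends. Printing src_cols shows that the generated loop tests x for null once, before the filter, and tests only y before the projection.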
