drill-user mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Drill with Spark
Date Sat, 17 May 2014 03:46:42 GMT
Ted covered the most important points. I just want to add a few more.

While the code for Drill so far is written in pure Java, there is no
specific requirement that all of Drill run in Java. Part of the motivation
for the in-memory representation of records that we chose, making it
columnar and storing it in Java native ByteBuffers, was to enable
integration with native code compiled from C/C++ to run some of our
operators. ByteBuffers are part of the official Java API, but their use is
not recommended. They allow memory operations that you do not find in
typical Java data types and structures, but require you to manage your own
memory.
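
To make the ByteBuffer point concrete, here is a minimal sketch of one
fixed-width column stored in a direct ByteBuffer. The class name
IntColumnSketch and its methods are made up for illustration; this is not
Drill's actual vector code, just the general idea of doing your own offset
arithmetic on memory that lives outside the Java heap.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not a class from the Drill codebase.
final class IntColumnSketch {
    private static final int WIDTH = 4; // bytes per int value
    private final ByteBuffer buf;

    IntColumnSketch(int capacity) {
        // allocateDirect reserves zero-filled memory outside the Java heap.
        // Native byte order lets C code read the values without swapping.
        this.buf = ByteBuffer.allocateDirect(capacity * WIDTH)
                             .order(ByteOrder.nativeOrder());
    }

    // Absolute put/get: we compute the byte offset of each row ourselves.
    void set(int row, int value) {
        buf.putInt(row * WIDTH, value);
    }

    int get(int row) {
        return buf.getInt(row * WIDTH);
    }

    boolean isDirect() {
        return buf.isDirect();
    }

    public static void main(String[] args) {
        IntColumnSketch column = new IntColumnSketch(8);
        column.set(3, 42);
        System.out.println(column.get(3)); // prints 42
    }
}
```

All values of the column sit next to each other in one block of memory,
which is what makes it practical to hand the whole column to native code.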

One important use case for us is the ability to pass them through the Java
Native Interface without having to make a copy. While it is still
inefficient to jump from Java to C for every record, we should be able to
define a clean interface that takes a batch of records (around 1000) in a
single jump to a C context. After the C code finishes processing them, a
single jump back into the Java context should complete just as quickly.
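
A rough sketch of what such a batch interface could look like from the
Java side follows. Everything here is an assumption for illustration: the
class BatchCallSketch, the native method sumBatchNative, and the choice of
a sum as the "operator" are all hypothetical, and no native library is
provided, so the native method is only declared and never called. A pure
Java stand-in does the same work so the example runs as-is.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not an interface from the Drill codebase.
final class BatchCallSketch {
    static final int BATCH_SIZE = 1000; // "around 1000" records per jump
    static final int WIDTH = 4;         // bytes per int value

    // Hypothetical native entry point: one JNI crossing per batch, not per
    // record. A C implementation could call the JNI function
    // GetDirectBufferAddress on `column` to get a pointer to the same
    // memory without copying. Declared only; never invoked in this sketch.
    static native long sumBatchNative(ByteBuffer column, int rowCount);

    // Pure Java stand-in for the native operator, so the sketch is runnable.
    static long sumBatchJava(ByteBuffer column, int rowCount) {
        long total = 0;
        for (int row = 0; row < rowCount; row++) {
            total += column.getInt(row * WIDTH);
        }
        return total;
    }

    // Builds one batch whose values are 0, 1, ..., BATCH_SIZE - 1.
    static ByteBuffer newBatch() {
        ByteBuffer column = ByteBuffer.allocateDirect(BATCH_SIZE * WIDTH)
                                      .order(ByteOrder.nativeOrder());
        for (int row = 0; row < BATCH_SIZE; row++) {
            column.putInt(row * WIDTH, row);
        }
        return column;
    }

    public static void main(String[] args) {
        ByteBuffer column = newBatch();
        // A single call handles the whole batch.
        System.out.println(sumBatchJava(column, BATCH_SIZE)); // prints 499500
    }
}
```

The point of the shape is that the cost of crossing the Java/C boundary is
paid once per batch and spread over all of its records.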

With this consideration, any language you could pass data to from C would
be compatible. While we likely will not support a wide array of plugin
languages soon, it should be possible for people to plug in a variety of
existing codebases for adding data processing functionalities to Drill.

-Jason Altekruse

On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Drill is a very different tool from Spark or even from Spark SQL (aka
> Shark).
> There is some overlap, but there are important differences.  For instance,
> - Drill supports weakly typed SQL.
> - Drill has a very clever way to pass data from one processor to another.
>  This allows very efficient processing.
> - Drill generates code in response to the query and to observed data.  This is
> a big deal since it allows high speed with dynamic types.
> - Drill supports full ANSI SQL, not Hive QL.
> - Spark supports programming in Scala.
> - Spark ties distributed data objects to objects in a language like Java or
> Scala rather than using a columnar form.  This makes generic user written
> code easier, but is less efficient.
>
> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
> <nvn_ravi@hotmail.com> wrote:
> > Hi,
> >
> > I started exploring Drill, and it looks like a very interesting tool. Can
> > somebody explain how Drill compares with Apache Spark and Storm?
> > Do we still need Apache Spark along with Drill in the Big Data stack? Or
> > can Drill directly serve as a replacement for Spark?
> >
> > Thanks,
> > Ravi
> >
