drill-user mailing list archives

From Neeraja Rentachintala <nrentachint...@maprtech.com>
Subject Re: Drill with Spark
Date Sat, 17 May 2014 19:12:28 GMT
In addition to what others said, below are a few more points (answered in
an email thread some time back).


-----------
- Drill provides ANSI SQL. This means that BI/analytics and SQL tools can
work as-is with Drill using JDBC/ODBC. Druid provides REST APIs as the
query layer. I am not sure whether Druid has a SQL layer at all (I don't
see one in their docs).

- Query flexibility is high with Drill. For example, Druid supports groupBy
style queries but doesn't support JOINs. Drill supports all the key
analytic functionality, such as JOINs, aggregations, sorts, filters, and a
wide variety of functions to operate on data, which makes it suitable for a
broader set of use cases.

- Drill supports queries natively on Hadoop data formats (JSON, Parquet,
text, as well as all Hive file formats). You don't need to load or copy the
data into a specific format in order to run queries.

- Drill can run direct queries on self-describing data such as JSON,
Parquet, and HBase without defining schema overlays in Hive. You can take a
look at the "Apache Drill in 10 Minutes" doc below to get started with
some of these capabilities.
https://cwiki.apache.org/confluence/display/DRILL/Apache+Drill+in+10+Minutes
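
As a rough illustration of the "query the file where it sits" point above, here is a minimal Java sketch that builds a Drill SQL statement over a raw JSON file and, optionally, runs it over JDBC. It rests on several assumptions that are not from this thread: Drill running in embedded mode and reachable at `jdbc:drill:zk=local`, the Drill JDBC driver on the classpath, and the `dfs` storage plugin enabled. The file path `/tmp/events.json` and the field `user_id` are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class DrillJsonQuerySketch {
    // Assumption: embedded Drill; adjust the URL for a real cluster.
    static final String URL = "jdbc:drill:zk=local";

    // Builds a GROUP BY query that reads a JSON file in place.
    // No table definition or load step is involved; the path is the "table".
    static String countByField(String jsonPath, String field) {
        return "SELECT t.`" + field + "` AS k, COUNT(*) AS n "
             + "FROM dfs.`" + jsonPath + "` t "
             + "GROUP BY t.`" + field + "` ORDER BY n DESC";
    }

    // Runs the query and prints each group; requires a reachable Drill instance.
    static void run(String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString("k") + "\t" + rs.getLong("n"));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical path and field name, for illustration only.
        String sql = countByField("/tmp/events.json", "user_id");
        System.out.println(sql);
        // Only contact Drill when explicitly asked to.
        if (args.length > 0 && args[0].equals("--run")) {
            run(sql);
        }
    }
}
```

Without `--run` the program only prints the generated SQL, so it can be tried without a Drill installation.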



On Sat, May 17, 2014 at 7:43 AM, Timothy Chen <tnachen@gmail.com> wrote:

> Druid, just like Redshift, requires an extra ETL step to import the data
> before you can query it, which reduces the freshness of your queryable data.
>
> Obviously there are pros and cons to each decision, but Drill also tries to
> do as much optimization as possible with the metadata available, and down
> the road it will be able to gain enough stats after a scan, or perhaps even
> from an extra compute-stats step like what Impala does.
>
> Tim
>
> Sent from my iPhone
>
> > On May 17, 2014, at 12:27 AM, Amit Matety <matety@yahoo.com> wrote:
> >
> > With regard to comparison: how does it compare to Druid, which is also
> > an in-memory warehouse? Does Drill support joins to in-memory dimension
> > tables, unlike Druid? Does it have any limitation on the number of
> > records it can fetch, etc.?
> >
> > Regards,
> > Amit
> >
> >> On May 16, 2014, at 8:46 PM, Jason Altekruse <altekrusejason@gmail.com>
> wrote:
> >>
> >> Ted covered the most important points. I just want to add a few
> >> clarifications.
> >>
> >> While the code for Drill so far is written in pure Java, there is no
> >> specific requirement that all of Drill run in Java. Part of the
> >> motivation for the in-memory representation of records that we chose,
> >> making it columnar and storing it in Java's native ByteBuffers, was to
> >> enable integration with native code compiled from C/C++ to run some of
> >> our operators. ByteBuffers are part of the official Java API, but their
> >> use is not recommended. They allow memory operations that you do not
> >> find in typical Java data types and structures, but require you to
> >> manage your own memory.
> >>
> >> One important use case for us is the ability to pass them through the
> >> Java Native Interface without having to do a copy. While it is still
> >> inefficient to jump from Java to C for every record, we should be able
> >> to define a clean interface that hands a batch of records (around 1000)
> >> to a C context in a single jump. After the C code finishes processing
> >> them, a single jump back into the Java context should complete just as
> >> quickly as the jump in the other direction.
> >>
> >> With this consideration, any language you could pass data to from C
> >> would be compatible. While we likely will not support a wide array of
> >> plugin languages soon, it should be possible for people to plug in a
> >> variety of existing codebases to add data processing functionality to
> >> Drill.
> >>
> >> -Jason Altekruse
> >>
> >>
> >>> On Fri, May 16, 2014 at 8:11 PM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
> >>>
> >>> Drill is a very different tool from Spark or even from Spark SQL (the
> >>> successor to Shark).
> >>>
> >>> There is some overlap, but there are important differences. For
> >>> instance,
> >>>
> >>> - Drill supports weakly typed SQL.
> >>>
> >>> - Drill has a very clever way to pass data from one processor to
> >>> another. This allows very efficient processing.
> >>>
> >>> - Drill generates code in response to the query and to the observed
> >>> data. This is a big deal since it allows high speed with dynamic types.
> >>>
> >>> - Drill supports full ANSI SQL, not Hive QL.
> >>>
> >>> - Spark supports programming in Scala.
> >>>
> >>> - Spark ties distributed data objects to objects in a language like
> >>> Java or Scala rather than using a columnar form. This makes generic
> >>> user-written code easier, but is less efficient.
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, May 15, 2014 at 9:41 AM, N.Venkata Naga Ravi
> >>> <nvn_ravi@hotmail.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I started exploring Drill, and it looks like a very interesting tool.
> >>>> Can somebody explain how Drill compares with Apache Spark and Storm?
> >>>> Do we still need Apache Spark along with Drill in the big data stack,
> >>>> or can Drill directly serve as a replacement for Spark?
> >>>>
> >>>> Thanks,
> >>>> Ravi
> >>>
>
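
To make Jason's point about columnar data in ByteBuffers a little more concrete, here is a minimal Java sketch of one fixed-width column stored in a direct (off-heap) buffer and processed a whole batch at a time. It is not Drill's actual vector code: the class and method names are hypothetical, the batch size of 1000 simply echoes the figure in his message, and the `sumColumn` method is a pure-Java stand-in for an operator that could, in principle, be implemented in C behind JNI and receive the same buffer without a copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ColumnBatchSketch {
    // Hypothetical batch size, echoing the "around 1000" records in the thread.
    static final int BATCH_SIZE = 1000;

    // Allocates off-heap memory for one column of 4-byte ints.
    // Native byte order keeps the layout directly usable from native code.
    static ByteBuffer newIntColumn(int rows) {
        return ByteBuffer.allocateDirect(rows * Integer.BYTES)
                         .order(ByteOrder.nativeOrder());
    }

    // Stand-in for a batch-at-a-time operator: one call handles every row,
    // instead of one call (or one JNI crossing) per record.
    static long sumColumn(ByteBuffer column, int rows) {
        long sum = 0;
        for (int i = 0; i < rows; i++) {
            sum += column.getInt(i * Integer.BYTES); // absolute read, no position change
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer column = newIntColumn(BATCH_SIZE);
        for (int i = 0; i < BATCH_SIZE; i++) {
            column.putInt(i * Integer.BYTES, i); // values 0..999, stored contiguously
        }
        System.out.println("direct buffer: " + column.isDirect());
        System.out.println("sum of batch:  " + sumColumn(column, BATCH_SIZE));
    }
}
```

The values of a column sit contiguously in memory, so a native routine handed this buffer would see a plain array of ints, and the cost of crossing the Java/C boundary is paid once per batch rather than once per record.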
