incubator-drill-user mailing list archives

From Tomer Shiran <tshi...@maprtech.com>
Subject Re: Getting plugged in... (Cassandra and Drill?)
Date Mon, 21 Jan 2013 06:09:13 GMT
With "very selective" I intended to refer to the columns, not the rows.
That is, if your query only cares about 3 columns out of 100, then a true
columnar layout works great.
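
As a rough illustration of that point, here is a toy model of the two
layouts. It is only a sketch, not Drill or Cassandra code: the table
size, the column names (c0..c99), and the helper names scan_rows and
scan_columns are all made up for the example, and "values read" is just
a stand-in for I/O cost.

```python
# Hypothetical table: 1,000 rows x 100 columns (all names are illustrative).
NUM_ROWS, NUM_COLS = 1000, 100
columns = [f"c{i}" for i in range(NUM_COLS)]

# Row layout: each record keeps all of its column values together.
row_store = [
    {c: r * NUM_COLS + i for i, c in enumerate(columns)}
    for r in range(NUM_ROWS)
]

# Columnar layout: each column's values are stored contiguously.
column_store = {c: [row[c] for row in row_store] for c in columns}


def scan_rows(wanted):
    """Project `wanted` columns from the row layout and count values read."""
    values_read = 0
    result = []
    for row in row_store:
        values_read += len(row)  # the whole record is read to get a few fields
        result.append(tuple(row[c] for c in wanted))
    return result, values_read


def scan_columns(wanted):
    """Project `wanted` columns from the columnar layout and count values read."""
    values_read = 0
    selected = []
    for c in wanted:
        values_read += len(column_store[c])  # only the requested columns are read
        selected.append(column_store[c])
    return list(zip(*selected)), values_read
```

In this model both scans return the same rows for a 3-column
projection, but the row layout reads every value in the table while the
columnar layout reads only the three requested columns.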


On Sun, Jan 20, 2013 at 10:07 PM, Tomer Shiran <tshiran@maprtech.com> wrote:

> Drill is being developed with the flexibility to support different data
> sources, so Cassandra support should not be a problem. Is that something
> you would be interested in building?
>
> The performance depends on the query. A query that involves a range scan
> would be very slow (assuming the default partitioner in Cassandra,
> RandomPartitioner), but point queries and queries that involve full table
> scans would provide reasonable performance. A full columnar layout would be
> faster for some queries (e.g., queries that are very selective).
>
> BTW, Drill will support nested data, so JSON is not an issue.
>
>
> On Sun, Jan 20, 2013 at 8:37 PM, Brian O'Neill <bone@alumni.brown.edu> wrote:
>
>> Last week, Brad Anderson came up and presented at the PhillyDB meetup.
>> http://www.slideshare.net/boorad/phillydb-talk-beyond-batch
>>
>> He gave us an overview of Drill, and I'm curious...
>>
>> Presently, we heavily use Storm + Cassandra.
>>
>> http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html
>>
>> We treat CRUD operations as events. Then within Storm we calculate
>> aggregate counts of entities flowing through the system by various
>> dimensions.   That works well, but we still need an ad hoc reporting
>> capability, and a way to report on data in the system that is not
>> active (historical).
>>
>> Would it be possible to use the Drill engine against a Cassandra backend?
>> If so, what does that mean?   (implementing some API?)
>>
>> I assume that performance would be terrible unless somehow the data is
>> stored using the columnar data format from the Dremel paper.  Is that
>> accurate?  Does anyone know if anyone has attempted a translation of
>> that format to Cassandra?
>>
>> Regardless, I'm very interested in getting involved and no stranger to
>> getting my hands dirty.
>> Let me know if you can provide any direction. (our entities are
>> currently stored in JSON in Cassandra)
>>
>> -brian
>>
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://brianoneill.blogspot.com/
>> twitter: @boneill42
>>
>
>
>
> --
> Tomer Shiran
> Director of Product Management | MapR Technologies | 650-804-8657
>



-- 
Tomer Shiran
Director of Product Management | MapR Technologies | 650-804-8657
