cassandra-user mailing list archives

From Gaspar Muñoz <gmu...@stratio.com>
Subject Re: map reduce for Cassandra
Date Tue, 22 Jul 2014 06:53:15 GMT
Check Stratio Deep <https://github.com/Stratio/stratio-deep>. This
integration between Spark and Cassandra is not based on Cassandra's
Hadoop interface.


2014-07-22 3:53 GMT+02:00 Marcelo Elias Del Valle <marcelo@s1mbi0se.com.br>:

> Hi,
>
>
>> But if you are only relying on memtables to sort writes, that seems like
>> a pretty heavyweight reason to use Cassandra?
>
>
> Actually, it's not a reason to use Cassandra. I already use Cassandra and
> I need to map reduce data from it. I am trying to see a reason to use the
> conventional M/R tools or to build a tool "specific" to Cassandra.
>
>> but Cassandra, as a datastore with immutable data files, is not typically
>> a good choice for short lived intermediate result sets...
>
>
> Indeed, but so far I am seeing it as the best option. If storing these
> intermediate files in HDFS is better, then I agree there is no reason to
> consider Cassandra for it.
>
>> are you planning to use DSE?
>
>
> Our company will probably hire DSE support when it reaches some size, but
> DSE as a product doesn't seem interesting to our case so far. The only tool
> that would help me at this moment would be Hive, but honestly I didn't like
> the way DSE supports Hive and I don't want to use a solution not available
> to DSC (see
> http://stackoverflow.com/questions/23959169/problems-using-hive-cassandra-community
> for details).
>
> []s
>
>
>
> 2014-07-21 22:09 GMT-03:00 Robert Coli <rcoli@eventbrite.com>:
>
> On Mon, Jul 21, 2014 at 5:45 PM, Marcelo Elias Del Valle <
>> marcelo@s1mbi0se.com.br> wrote:
>>
>>> Although several sstables (disk fragments) may have the same row key,
>>> inside a single sstable row keys and column keys are indexed, right?
>>> Otherwise, doing a GET in Cassandra would take some time.
>>> From the M/R perspective, I was referring to the memtable, as I am
>>> trying to compare the time to insert in Cassandra against the time of
>>> sorting in hadoop.
>>>
>>
>>  I was confused, because unless you are using new "in-memory"
>> columnfamilies, which I believe are only available in DSE, there is no way
>> to ensure that any given row stays in a memtable. Very rarely is there a
>> view of the function of a memtable that only cares about its properties and
>> not the closely related properties of SSTables. However, yours is one of
>> them; I see now why your question makes sense: you only care about the
>> memtable for how quickly it sorts.
>>
>> But if you are only relying on memtables to sort writes, that seems like
>> a pretty heavyweight reason to use Cassandra?
>>
>> I'm certainly not an expert in this area of Cassandra... but Cassandra,
>> as a datastore with immutable data files, is not typically a good choice
>> for short lived intermediate result sets... are you planning to use DSE?
>>
>> =Rob
>>
>>
>
