cassandra-user mailing list archives

From George Stergiou <gst...@gmail.com>
Subject Re: Cassandra, vnodes, and spark
Date Tue, 16 Sep 2014 12:32:50 GMT
I ran into this performance report:

https://github.com/datastax/spark-cassandra-connector/issues/200

Does the Spark connector, in its current state, issue one CQL query per vnode
or one task per vnode?

Regards.

On Tue, Sep 16, 2014 at 2:05 AM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> Look into the source code of the Spark connector. CassandraRDD tries to find
> all token ranges (even when using vnodes) for each node (endpoint) and
> creates RDD partitions to match this distribution of token ranges. Thus data
> locality is guaranteed.
>
> On Tue, Sep 16, 2014 at 4:39 AM, Eric Plowe <eric.plowe@gmail.com> wrote:
>
>> Interesting. The way I understand the Spark connector is that it's
>> basically a client executing a CQL query and filling a Spark RDD. Spark
>> will then handle the partitioning of data. Again, this is my understanding,
>> and it may be incorrect.
>>
>>
>> On Monday, September 15, 2014, Robert Coli <rcoli@eventbrite.com> wrote:
>>
>>> On Mon, Sep 15, 2014 at 4:57 PM, Eric Plowe <eric.plowe@gmail.com>
>>> wrote:
>>>
>>>> Based on this Stack Overflow question, vnodes affect the number of
>>>> mappers Hadoop needs to spawn, which in turn affects performance.
>>>>
>>>> With the Spark connector for Cassandra, would the same situation happen?
>>>> Would vnodes affect performance in a similar way to Hadoop?
>>>>
>>>
>>> I don't know what specifically Spark does here, but if it has the same
>>> locality expectations as Hadoop generally, my belief would be: "yes."
>>>
>>> =Rob
>>>
>>>
>
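
As a rough illustration of the grouping DuyHai describes above (token ranges
collected per endpoint, then turned into partitions that prefer that
endpoint), here is a minimal Python sketch. It is not the connector's actual
code: the function names, the `ranges_per_partition` parameter, and the sample
ring are all hypothetical, and it says nothing about how many CQL queries the
connector issues per partition.

```python
from collections import defaultdict


def group_ranges_by_endpoint(token_ranges):
    """Group (start_token, end_token, endpoint) tuples by endpoint.

    Hypothetical helper: the real connector reads ring metadata from the
    cluster; here the ring is just a list of tuples.
    """
    by_endpoint = defaultdict(list)
    for start, end, endpoint in token_ranges:
        by_endpoint[endpoint].append((start, end))
    return dict(by_endpoint)


def build_partitions(token_ranges, ranges_per_partition=2):
    """Pack each endpoint's token ranges into partitions.

    Each partition records a preferred location so a scheduler could place
    the task on the node that owns the data. `ranges_per_partition` is an
    assumed knob for this sketch, not a connector setting.
    """
    partitions = []
    grouped = group_ranges_by_endpoint(token_ranges)
    for endpoint in sorted(grouped):
        ranges = grouped[endpoint]
        for i in range(0, len(ranges), ranges_per_partition):
            partitions.append({
                "preferred_location": endpoint,
                "token_ranges": ranges[i:i + ranges_per_partition],
            })
    return partitions


# Hypothetical two-node ring with five vnode token ranges.
ring = [
    (0, 100, "10.0.0.1"),
    (100, 200, "10.0.0.2"),
    (200, 300, "10.0.0.1"),
    (300, 400, "10.0.0.2"),
    (400, 500, "10.0.0.1"),
]

partitions = build_partitions(ring)
for p in partitions:
    print(p["preferred_location"], p["token_ranges"])
```

With many vnodes per node, packing several token ranges into one partition
keeps the task count from growing with the vnode count, while each task still
reads only ranges owned by its preferred node.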
