cassandra-commits mailing list archives

From "Russell Alexander Spitzer (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-11542) Create a benchmark to compare HDFS and Cassandra bulk read times
Date Fri, 29 Apr 2016 20:42:13 GMT


Russell Alexander Spitzer commented on CASSANDRA-11542:

Hmm, I'm a little confused that case classes don't help but DataFrames do... The code you presented
looks good to me. There is the potential issue of blocking on result sets that take a long
time to complete while other result sets are already on the driver, but I'm not sure if this
is a big deal. Do you have any idea of the parallelization in these tests? How many partitions
are the different runs generating?
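
A minimal sketch of the blocking concern described above, in generic Python rather than the
code from the ticket. The names `fetch`, `consume_in_submission_order` and
`consume_in_completion_order` are hypothetical, and `time.sleep` stands in for a query whose
result set takes a while to arrive. Waiting on futures in submission order holds up results
that are already finished behind a slow one; `concurrent.futures.as_completed` hands them
over as they finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(delay, label):
    """Stand-in for an asynchronous query; sleeps to simulate latency."""
    time.sleep(delay)
    return label


def consume_in_submission_order(delays):
    """Wait on each future in the order it was submitted.

    A slow first future delays consumption of faster ones behind it.
    """
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(fetch, d, i) for i, d in enumerate(delays)]
        return [f.result() for f in futures]


def consume_in_completion_order(delays):
    """Consume each result as soon as its future finishes."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(fetch, d, i) for i, d in enumerate(delays)]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    delays = [0.4, 0.05, 0.2]  # the first "query" is the slowest
    print(consume_in_submission_order(delays))
    print(consume_in_completion_order(delays))
```

With these delays the first function returns the labels in submission order, while the
second would normally return the fastest query's label first. Whether the difference
matters in practice depends on how uneven the result-set latencies are, which is the open
question in the comment.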

> Create a benchmark to compare HDFS and Cassandra bulk read times
> ----------------------------------------------------------------
>                 Key: CASSANDRA-11542
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Testing
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>         Attachments:,
> I propose creating a benchmark for comparing Cassandra and HDFS bulk reading performance.
> Simple Spark queries will be performed on data stored in HDFS or Cassandra, and the entire
> duration will be measured. An example query would be the max or min of a column or a count(*).
> This benchmark should allow determining the impact of:
> * partition size
> * number of clustering columns
> * number of value columns (cells)

This message was sent by Atlassian JIRA
