spark-user mailing list archives

From Jasleen Kaur <jasleenkaur1...@gmail.com>
Subject Re: Spark Partition by Columns doesn't work properly
Date Thu, 09 Jun 2016 05:43:42 GMT
Try using the DataStax package. There was a great talk at Spark Summit
about it. It will take care of the boilerplate code so you can focus on
real business value.
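
On the partition count in the question below: `repartition(col(...))` hash-partitions rows on the given column(s), but the number of output partitions is `spark.sql.shuffle.partitions` (200 by default), not the number of distinct values. So 200 partitions is expected whether you repartition by one column or two. A minimal sketch, assuming the `daily` DataFrame from the quoted code and the Spark 1.6 DataFrame API:

```scala
// repartition(col("date")) hash-partitions into spark.sql.shuffle.partitions
// partitions (200 by default), regardless of how many distinct dates exist.
// To target ~90 partitions for 90 days of data, pass the partition count
// explicitly (the `daily` DataFrame is from the quoted code):
val byDate = daily.repartition(90, col("date"))
// All rows for a given date hash to the same partition. Note: with 90
// distinct dates and 90 partitions, some partitions may still end up with
// zero or multiple dates, since hashing is not a one-to-one mapping.
```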

On Wednesday, June 8, 2016, Chanh Le <giaosudau@gmail.com> wrote:

> Hi everyone,
> I tested partitioning a DataFrame by columns, but the result looks wrong
> to me.
> I am using Spark 1.6.1, loading data from Cassandra.
> If I repartition by two fields (date, network_id), I get 200 partitions;
> if I repartition by one field (date), I still get 200 partitions.
> But my data covers 90 days, so I would expect repartitioning by date to
> give 90 partitions.
>
> val daily = sql
>   .read
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
>   .load()
>   .repartition(col("date"))
>
>
>
> The result doesn’t change no matter which columns I pass to repartition.
>
> Does anyone have the same problem?
>
> Thanks in advance.
>
