spark-dev mailing list archives

From Yana Kadiyska <yana.kadiy...@gmail.com>
Subject Re: Apache gives exception when running groupby on df temp table
Date Fri, 17 Jul 2015 15:19:17 GMT
I think that might be a connector issue. You say you are using Spark 1.4;
are you also using the 1.4 version of the spark-cassandra-connector? They do
have some bugs around this, e.g.
https://datastax-oss.atlassian.net/browse/SPARKC-195. Also, I see that you
import org.apache.spark.sql.cassandra.CassandraSQLContext, and I've seen
some odd things using that class. Things work out a lot better for me if I
create a DataFrame like this:


val cassDF = sqlContext.read.format("org.apache.spark.sql.cassandra").options(Map(
"table" -> "some_table", "keyspace" -> "myks")).load

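For what it's worth, here is a rough sketch of how the grouping from the quoted message might look when built on a DataFrame loaded this way, without CassandraSQLContext. It is untested against your cluster and rests on assumptions: the keyspace and table names (sams, events) are taken from your query, the from/to arguments stand in for your _from/_to fields, the helper names cassandraOptions and countEvents are made up for illustration, the connection host is a placeholder, and I am assuming that client is exposed as a struct so that client.OSName resolves.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object EventCounts {
  // Hypothetical helper: data source options for one Cassandra table.
  def cassandraOptions(keyspace: String, table: String): Map[String, String] =
    Map("keyspace" -> keyspace, "table" -> table)

  // from/to are hypothetical timestamp strings standing in for _from/_to.
  def countEvents(sqlContext: SQLContext, from: String, to: String): DataFrame = {
    // Load the table through the connector's data source, not CassandraSQLContext.
    val events = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(cassandraOptions("sams", "events"))
      .load()

    events
      // Same time window as the WHERE clause in the original query.
      .filter(events("eventtime") > from && events("eventtime") < to)
      // Assumes client is a struct with an OSName field.
      .selectExpr("brandname", "appname", "packname", "eventname",
        "client.OSName as platform", "timezone")
      .groupBy("brandname", "appname", "packname", "eventname", "platform", "timezone")
      .count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("event-counts")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Example date strings only; substitute your own range.
    countEvents(sqlContext, "2015-07-01", "2015-07-17").show()
    sc.stop()
  }
}
```

The master URL is left out on purpose, on the assumption that you launch this with spark-submit. Doing the projection and grouping directly on the DataFrame also avoids the temp table and the second SQL query.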

On Fri, Jul 17, 2015 at 10:52 AM, nipun <ibnipun10@gmail.com> wrote:

> spark version 1.4
>
> import com.datastax.spark.connector._
> import  org.apache.spark._
> import org.apache.spark.sql.cassandra.CassandraSQLContext
> import org.apache.spark.SparkConf
> //import com.microsoft.sqlserver.jdbc.SQLServerDriver
> import java.sql.Connection
> import java.sql.DriverManager
> import java.io.IOException
> import org.apache.spark.sql.DataFrame
>
> def populateEvents() : Unit = {
>
>     var query = "SELECT brandname, appname, packname, eventname, client, timezone FROM sams.events WHERE eventtime > '" + _from + "' AND eventtime < '" + _to + "'"
>     // read data from cassandra table
>     val rdd = runCassandraQuery(query)
>
>     rdd.registerTempTable("newdf")
>
>     query = "Select brandname, appname, packname, eventname, client.OSName as platform, timezone from newdf"
>     val dfCol = runCassandraQuery(query)
>
>     val grprdd = dfCol.groupBy("brandname", "appname", "packname", "eventname", "platform", "timezone").count()
>
> Do let me know if you need any more information
