spark-user mailing list archives

From Luciano Resende <luckbr1...@gmail.com>
Subject Re: SparkR - calling as.vector() with rdd dataframe causes error
Date Fri, 18 Sep 2015 16:33:59 GMT
I can see the thread, with all the responses, at mail-archive:

https://www.mail-archive.com/user%40spark.apache.org/msg36882.html

On Fri, Sep 18, 2015 at 7:58 AM, Ellen Kraffmiller <
ellen.kraffmiller@gmail.com> wrote:

> Thanks for your response.  Is there a reason why this thread isn't
> appearing on the mailing list?  So far, I only see my post, with no
> answers, although I have received 2 answers via email.  It would be nice if
> other people could see these answers as well.
>
> On Thu, Sep 17, 2015 at 2:22 AM, Sun, Rui <rui.sun@intel.com> wrote:
>
>> The existing algorithms that operate on an R data.frame can't simply
>> operate on a SparkR DataFrame. They have to be re-implemented on top of
>> the SparkR DataFrame API.
>>
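>> As an illustration (an editorial sketch, not part of the original thread):
>> since SparkR's collect() converts a distributed DataFrame back into an
>> ordinary R data.frame, one workaround is to collect the data first and
>> then hand it to a purely local algorithm. Here galileo() stands in for
>> the user's own clustering function; the sketch assumes a SparkR 1.5-era
>> session with a sqlContext available:

```r
# Sketch: assumes sparkR.init()/sqlContext are set up as in SparkR 1.5,
# and that galileo() is the user's local clustering function (hypothetical).
df <- createDataFrame(sqlContext, faithful)  # distributed SparkR DataFrame

localDF <- collect(df)                       # collect() returns a local R data.frame
stopifnot(is.data.frame(localDF))

# A local algorithm can now coerce and operate on the data as usual:
result <- galileo(localDF, model='hclust', dist='euclidean', link='ward', K=5)
```

>> Note this pulls the entire dataset onto the driver, so it only makes
>> sense when the data fits in local memory.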
>> -----Original Message-----
>> From: ekraffmiller [mailto:ellen.kraffmiller@gmail.com]
>> Sent: Thursday, September 17, 2015 3:30 AM
>> To: user@spark.apache.org
>> Subject: SparkR - calling as.vector() with rdd dataframe causes error
>>
>> Hi,
>> I have a library of clustering algorithms that I'm trying to run in the
>> SparkR interactive shell. (I am working on a proof of concept for a
>> document classification tool.) Each algorithm takes a term-document matrix
>> in the form of a dataframe.  When I pass the method a local dataframe, the
>> clustering algorithm works correctly, but when I pass it a SparkR
>> DataFrame, it gives an error trying to coerce the data into a vector.
>> Here is the code that I'm calling within SparkR:
>>
>> # get matrix from a file
>> file <- "/Applications/spark-1.5.0-bin-hadoop2.6/examples/src/main/resources/matrix.csv"
>>
>> # read it into a variable
>> raw_data <- read.csv(file, sep=',', header=FALSE)
>>
>> # convert to a local dataframe
>> localDF <- data.frame(raw_data)
>>
>> # create the SparkR DataFrame
>> rdd <- createDataFrame(sqlContext, localDF)
>>
>> # call the algorithm with the localDF - this works
>> result <- galileo(localDF, model='hclust', dist='euclidean', link='ward', K=5)
>>
>> # call with the rdd - this produces an error
>> result <- galileo(rdd, model='hclust', dist='euclidean', link='ward', K=5)
>>
>> Error in as.vector(data) :
>>   no method for coercing this S4 class to a vector
>>
>>
>> I get the same error if I call as.vector(rdd) directly.
>>
>> Is there a reason why this works for localDF and not rdd?  Should I be
>> doing something else to coerce the object into a vector?
>>
>> Thanks,
>> Ellen
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-calling-as-vector-with-rdd-dataframe-causes-error-tp24717.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>


-- 
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
