spark-issues mailing list archives

From "Arun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-8629) R code in SparkR
Date Thu, 25 Jun 2015 13:15:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Arun updated SPARK-8629:
------------------------
    Description: 
Data set:  
  
DC_City	Dc_Code	ItemNo	Itemdescription	date	Month	Year	SalesQuantity

Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	9/16/2012	9-Sep	2012	 1 
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	12/21/2012	12-Dec	2012	 1 
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	1/12/2013	1-Jan	2013	 1 
Hyderabad	11	100005010	more. Value Chana Dal 1 Kg.	1/27/2013	1-Jan	2013	 3 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/1/2013	2-Feb	2013	 2 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/12/2013	2-Feb	2013	 3 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/13/2013	2-Feb	2013	 2 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/14/2013	2-Feb	2013	 1 
Hyderabad	11	100005011	more. Value Chana Dal 1 Kg.	2/15/2013	2-Feb	2013	 8 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/16/2013	2-Feb	2013	 18 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/17/2013	2-Feb	2013	 19 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/18/2013	2-Feb	2013	 18 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/19/2013	2-Feb	2013	 18 
Hyderabad	11	100005012	more. Value Chana Dal 1 Kg.	2/20/2013	2-Feb	2013	 16 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/21/2013	2-Feb	2013	 25 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/22/2013	2-Feb	2013	 19 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/23/2013	2-Feb	2013	 17 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/24/2013	2-Feb	2013	 39 
Hyderabad	11	100005013	more. Value Chana Dal 1 Kg.	2/25/2013	2-Feb	2013	 23 


Code I used in R:

 library(dplyr)   # filter() and select() below come from dplyr
 data <- read.csv("D:/R/Data_sale_quantity.csv", stringsAsFactors = FALSE)
 factors <- unique(data$ItemNo)
 df.allitems <- data.frame()
 for (i in 1:length(factors)) {
   data1 <- filter(data, ItemNo == factors[[i]])
   data2 <- select(data1, DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity)
   data2$date <- as.Date(data2$date, format = "%m/%d/%Y")  # 4-digit years, so %Y not %y
   data3 <- data2[order(data2$date), ]
   df.allitems <- rbind(data3, df.allitems)  # append by row bind
 }

 write.csv(df.allitems, "E:/all_items.csv")
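As an aside, the per-item loop only sorts rows within each ItemNo, which a single sort on both columns also achieves. A minimal sketch of that one-pass version (equivalent up to the order of the item groups, assuming dplyr is installed and the date strings parse with %m/%d/%Y):

```r
library(dplyr)

data$date <- as.Date(data$date, format = "%m/%d/%Y")
df.allitems <- data %>%
  select(DC_City, Itemdescription, ItemNo, date, Year, SalesQuantity) %>%
  arrange(ItemNo, date)   # sort by item, then by date within each item
write.csv(df.allitems, "E:/all_items.csv")
```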

------------------------------------------------------------------------------- 
  
I have done some SparkR code: 
  data1 <- read.csv("D:/Data_sale_quantity_mini.csv")  # read in plain R 
  df_1 <- createDataFrame(sqlContext, data1)           # convert the R data.frame to a Spark DataFrame 
  distinctDF <- distinct(df_1)                         # remove duplicate rows 
  
# for select I used: 
  df_2 <- select(distinctDF, "DC_City", "Itemdescription", "ItemNo", "date", "Year", "SalesQuantity") 
# select transformation 

I don't know how to: 
  1) create an empty SparkR DataFrame 
  2) use a for loop in SparkR 
  3) change the date format 
  4) find the length() (row count) of a Spark DataFrame 
  5) use rbind in SparkR 
  
Can you help me rewrite the above code in SparkR?
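A rough sketch of how those five points might map onto the SparkR 1.4 API (untested; it assumes a running sqlContext, and the cast() date parsing is an assumption — SQL's date cast expects yyyy-MM-dd strings, so m/d/yyyy strings are safer converted with as.Date() in plain R before createDataFrame):

```r
library(SparkR)

df <- createDataFrame(sqlContext, data1)

# 3) cast the string column to a date type (only if the strings are yyyy-MM-dd;
#    otherwise parse with as.Date() on the R side first)
df$date <- cast(df$date, "date")

# 4) length()/nrow() equivalent: count() returns the number of rows
n <- count(df)

# 5) rbind equivalent: unionAll() appends one DataFrame to another
#    combined <- unionAll(dfA, dfB)

# 1) + 2) + 5) an empty DataFrame plus an R-side for loop is usually
# unnecessary: sorting the whole DataFrame once replaces the per-ItemNo loop
sorted <- arrange(df, "ItemNo", "date")
```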




> R code in SparkR
> ----------------
>
>                 Key: SPARK-8629
>                 URL: https://issues.apache.org/jira/browse/SPARK-8629
>             Project: Spark
>          Issue Type: Question
>          Components: R
>            Reporter: Arun
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


