ignite-user mailing list archives

From Jennifer Coston <Jennifer.Cos...@raytheon.com>
Subject Ignite and Spark Streaming Integration Using Java
Date Wed, 16 Dec 2015 14:27:32 GMT

I am trying to understand the integration of Apache Spark Streaming and
Apache Ignite but I’m running into some errors. I’m not sure where I went
wrong, so I am going to walk through all of my steps. I have worked through
two word counting tutorials provided by Apache. In the first tutorial I
created a Java project that uses Ignite to count the words in a document on
a sliding window of 5 seconds and then query the cache to determine the
most common words. I believe it created a structure that looks like this:

In the second tutorial, I created a Java project that counts the words in a
document on a rolling window of 5 seconds and saves the output to a text
document. I believe this tutorial created a structure that looks like this:

Now comes the hard part, trying to integrate the two. Based on the
documentation online, I believe the resulting combination will create a
structure that looks like this:

I started by updating the POM file, adding the dependencies for both
Spark Streaming and Ignite. However, the project doesn't compile when
both sets of dependencies are present.

Here is the full file:
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi=                
    	<description>Count words using spark</description>                      
    	<!-- Apache Ignite dependencies -->                                       
                <!-- Spark Streaming dependencies -->                            
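The fragment above shows only pieces of the POM. As a sketch, the dependency section I added looks roughly like this (the artifact names are the standard Ignite and Spark coordinates, but the version numbers here are placeholders, not necessarily the exact ones in my project):

    <!-- Sketch only; version numbers are placeholders -->
    <dependencies>
        <!-- Apache Ignite dependencies -->
        <dependency>
            <groupId>org.apache.ignite</groupId>
            <artifactId>ignite-core</artifactId>
            <version>1.5.0-b1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.ignite</groupId>
            <artifactId>ignite-spark</artifactId>
            <version>1.5.0-b1</version>
        </dependency>
        <!-- Spark Streaming dependencies -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>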

Since I will most likely want to integrate Ignite into an existing Spark
Streaming application eventually, I decided to keep the spark-streaming
dependency and run some tests. My next step was to combine the two
projects into one and see if I could get my JUnit tests to run. They did,
so it was time to begin the actual integration.

Since I want to add Ignite to the Spark Streaming project, I believe I need
to pass the file to be stored in the cache to the Spark Streaming
application through the Spark worker. To do this, I added the
JavaIgniteContext to the same file where I create my JavaSparkContext. Here
is the file before the change:
    package wordCount_test;

    import static org.junit.Assert.*;
    import java.io.File;

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    import wordCount.SparkWordCount;

    public class TestSparkWordCount {

        JavaSparkContext jsc;
        File txtFile;

        @Before
        public void setUp() throws Exception {
            jsc = new JavaSparkContext("local[2]", "testSparkWordCount");
            txtFile = new File("AIW_WordCount");
        }

        @After
        public void tearDown() throws Exception {
            jsc.stop();
            jsc = null;
        }

        @Test
        public void testInit() {
            // body was cut off in the archive; a minimal sanity check
            assertNotNull(jsc);
        }

        @Test
        public void test() {
            SparkWordCount streamWords = new SparkWordCount();
            try {
                JavaRDD<String> textFile = jsc
                        .textFile(txtFile.getName()); // call was cut off; textFile() assumed
                JavaPairRDD<String, Integer> wordCounts = streamWords
                        .countWords(textFile); // call was cut off; method name is a guess
                assertNotNull(wordCounts);
            } catch (InterruptedException e) {
                fail(e.getMessage());
            }
        }
    }
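For concreteness, the change I am attempting amounts to standing up a JavaIgniteContext next to the existing JavaSparkContext. Here is a sketch of what I mean (the class name, cache types, and the Spring config path are placeholders of mine, not from a working project):

```java
package wordCount_test;

import org.apache.ignite.spark.JavaIgniteContext;
import org.apache.spark.api.java.JavaSparkContext;

public class TestSparkIgniteWordCount {

    // Placeholder path to a Spring XML file holding the IgniteConfiguration.
    static final String IGNITE_CONFIG = "config/example-ignite.xml";

    JavaSparkContext jsc;
    JavaIgniteContext<String, Integer> igniteContext;

    public void setUp() throws Exception {
        jsc = new JavaSparkContext("local[2]", "testSparkWordCount");
        // JavaIgniteContext wraps the Spark context and starts Ignite on the
        // workers; the second argument points at the Ignite Spring configuration.
        igniteContext = new JavaIgniteContext<>(jsc, IGNITE_CONFIG);
    }
}
```

This is the point where the build breaks, because JavaIgniteContext lives in the ignite-spark artifact.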

The problem appears when I try to add the JavaIgniteContext, whose
dependencies all come from the ignite-spark artifact. Since it appears
that I can't have both, and I know my existing application will need the
spark-streaming dependency, I'm not sure how to proceed. Do you have any
suggestions?