spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-17327) Throughput limitation in spark standalone of simple task without calculation.
Date Wed, 31 Aug 2016 09:13:22 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-17327.
-------------------------------
          Resolution: Invalid
       Fix Version/s:     (was: 1.6.2)
    Target Version/s:   (was: 1.6.2)

Questions go to user@spark.apache.org, but you'd need to narrow this down further than "why
does this take a certain amount of time?"

Read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark first.

> Throughput limitation in spark standalone of simple task without calculation.
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-17327
>                 URL: https://issues.apache.org/jira/browse/SPARK-17327
>             Project: Spark
>          Issue Type: Question
>          Components: Java API, Windows
>    Affects Versions: 1.6.2
>         Environment: windows server 2008 R2 standard
>            Reporter: xiefeng
>              Labels: performance
>
> I installed Spark standalone and run the cluster (one master and one worker) on a
> Windows Server 2008 machine with 16 cores and 24 GB of memory.
> I ran a simple test: create a string RDD and simply return it. I used JMeter to measure
> throughput, but the highest rate is around 35/sec. I think Spark is powerful at distributed
> calculation, so why is the throughput so limited in a test scenario that contains only
> simple task dispatch and no calculation?
> 1. In JMeter I tested with both 10 threads and 100 threads; the difference is small,
> around 2-3/sec.
> 2. I tested with and without caching the RDD; the difference is small, around 1-2/sec.
> 3. During the test, CPU and memory usage stay low.
> Below is my test code:
> @RestController
> public class SimpleTest {	
> 	@RequestMapping(value = "/SimpleTest", method = RequestMethod.GET)
> 	@ResponseBody
> 	public String testProcessTransaction() {
> 		return SparkShardTest.simpleRDDTest();
> 	}
> }
> final static Map<String, JavaRDD<String>> simpleRDDs = initSimpleRDDs();
> public static Map<String, JavaRDD<String>> initSimpleRDDs()
> 	{
> 		Map<String, JavaRDD<String>> result = new ConcurrentHashMap<String, JavaRDD<String>>();
> 		JavaRDD<String> rddData = JavaSC.parallelize(data);
> 		rddData.cache().count();    // caching improves throughput by only 1-2/sec
> 		result.put("MyRDD", rddData);
> 		return result;
> 	}
> 	
> 	public static String simpleRDDTest()
> 	{		
> 		JavaRDD<String> rddData = simpleRDDs.get("MyRDD");
> 		return rddData.first();
> 	}
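
One way to narrow the question down, as the resolution asks, is to time the call
outside the web layer and see how the rate changes with the number of caller threads.
Note that first() is an RDD action, so every request in the test above submits a
Spark job; the measured rate may therefore reflect per-job overhead rather than
computation, although the report alone does not confirm this. The following is a
minimal sketch, not the reporter's code: ThroughputProbe and callsPerSecond are
hypothetical names, and the sleeping task is a stand-in so that the example runs
without a Spark cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the original report.
class ThroughputProbe {

    /** Runs `task` `calls` times in total on `threads` threads; returns calls per second. */
    static double callsPerSecond(Supplier<String> task, int threads, int calls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> jobs = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                jobs.add(task::get);
            }
            long start = System.nanoTime();
            // invokeAll blocks until every submitted call has finished.
            for (Future<String> f : pool.invokeAll(jobs)) {
                f.get(); // rethrows any failure raised inside the task
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return calls / seconds;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task (assumption): sleeps 20 ms to imitate a fixed per-call cost.
        // To probe the real call, pass SparkShardTest::simpleRDDTest instead.
        Supplier<String> task = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "row";
        };
        for (int threads : new int[] {1, 10, 100}) {
            System.out.printf("%d threads: %.1f calls/sec%n",
                    threads, callsPerSecond(task, threads, 200));
        }
    }
}
```

If the rate for the real call stays flat as the thread count grows, as the JMeter
numbers above suggest, the limit is more likely in how the calls are served than in
the number of callers, which would be a narrower question to take to the user list.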



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

