spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Why is 1 executor overworked and other sit idle?
Date Tue, 22 Sep 2015 13:15:11 GMT
Have you tried using repartition to spread the load?
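
To illustrate the idea, here is a toy model in plain Java, not Spark's actual implementation. With a single partition, one task (and so one executor) handles every row. Repartitioning deals the rows out across several partitions so the work can be shared. The class name RepartitionSketch and the round-robin dealing are assumptions made for illustration. In the Spark Java API the corresponding call would be JavaRDD.repartition(int numPartitions), made before saveAsTextFile.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch: models partitions as plain lists.
class RepartitionSketch {

    // Deal every row of the input partitions round-robin into numPartitions buckets.
    static <T> List<List<T>> repartition(List<List<T>> partitions, int numPartitions) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> partition : partitions) {
            for (T row : partition) {
                out.get(next).add(row);
                next = (next + 1) % numPartitions;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One partition holding all rows: a single task would process everything.
        List<String> single = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            single.add("row" + i);
        }
        List<List<String>> before = new ArrayList<>();
        before.add(single);

        // After repartitioning into 3, each partition can go to a different executor.
        List<List<String>> after = repartition(before, 3);
        for (List<String> p : after) {
            System.out.println(p.size() + " rows");
        }
    }
}
```

The partition count to pass is a tuning choice. A small multiple of the total executor cores is a common starting point, but that is a rule of thumb rather than something measured on this job.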

Cheers

> On Sep 22, 2015, at 4:22 AM, Chirag Dewan <chirag.dewan@ericsson.com> wrote:
> 
> Hi,
>  
> I am using Spark to access around 300m rows in Cassandra.
>  
> My job is pretty simple: I am just mapping each row into CSV format and saving it as
> a text file.
>  
>  
> public String call(CassandraRow row) throws Exception {
>     StringBuilder sb = new StringBuilder();
>     sb.append(row.getString(10));
>     sb.append(",");
>     sb.append(row.getString(11));
>     sb.append(",");
>     sb.append(row.getString(8));
>     sb.append(",");
>     sb.append(row.getString(7));
>
>     return sb.toString();
> }
>  
> My map method looks like the above.
>  
> I have a 3-node cluster. I observe that the driver starts on Node A and executors are
> spawned on all 3 nodes, but the executor on Node B or C is doing all the tasks. It starts
> a saveAsTextFile job with 1 output partition, stores the RDDs in memory, and also commits
> the file to the local file system.
>  
> This executor is using a lot of system memory and CPU while others are sitting idle.
>  
> Am I doing something wrong? Is my RDD correctly partitioned?
>  
> Thanks in advance.
>  
>  
> Chirag
