spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Reading AVRO from S3 - No parallelism
Date Thu, 27 Oct 2016 20:54:28 GMT
How big are your Avro files?  We collapse many small files into a single
partition to eliminate scheduler overhead.  If you need explicit
parallelism, you can also repartition the DataFrame after reading it.
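
To make the two points above concrete, here is a rough sketch in Python.
The first function is a simplified model of how Spark 2.x file-based
sources appear to pack files into read partitions, using the documented
settings spark.sql.files.maxPartitionBytes (default 128 MB) and
spark.sql.files.openCostInBytes (default 4 MB). The packing logic is my
reading of that behaviour, not an exact copy of Spark's code, and the
function names, the S3 path, and the partition count are hypothetical.
The second function shows the explicit repartition, assuming an existing
SparkSession with the spark-avro package on the classpath.

```python
MB = 1024 * 1024


def estimate_read_partitions(file_sizes, default_parallelism,
                             max_partition_bytes=128 * MB,
                             open_cost_bytes=4 * MB):
    """Estimate the number of read partitions for a list of file sizes.

    Simplified model (an assumption, not Spark's exact implementation):
    every file is charged its length plus a fixed "open cost", and files
    are packed into a partition until the next one would overflow it.
    """
    total = sum(size + open_cost_bytes for size in file_sizes)
    bytes_per_core = total // default_parallelism
    max_split = min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))

    # Cut each file into chunks of at most max_split bytes.
    chunks = []
    for size in file_sizes:
        offset = 0
        while offset < size:
            chunks.append(min(max_split, size - offset))
            offset += max_split

    # Pack chunks, largest first, closing a partition when it would overflow.
    partitions = 0
    current_size = 0
    current_count = 0
    for chunk in sorted(chunks, reverse=True):
        if current_count > 0 and current_size + chunk > max_split:
            partitions += 1
            current_size = 0
            current_count = 0
        current_size += chunk + open_cost_bytes
        current_count += 1
    if current_count > 0:
        partitions += 1
    return partitions


def load_avro_repartitioned(spark, path, num_partitions):
    """Read Avro with spark-avro, then spread rows over num_partitions.

    `spark` is an existing SparkSession. The read stage itself still runs
    with the packed partitions; only later stages see num_partitions tasks.
    """
    df = spark.read.format("com.databricks.spark.avro").load(path)
    return df.repartition(num_partitions)


if __name__ == "__main__":
    # Hypothetical workload: 1000 files of 10 KB each on 16 cores.
    print(estimate_read_partitions([10 * 1024] * 1000, default_parallelism=16))
    # With a live session (hypothetical bucket and partition count):
    # df = load_avro_repartitioned(spark, "s3a://my-bucket/avro/", 64)
```

Under this model, 1000 files of 10 KB each on 16 cores end up in 32 read
partitions with the default settings, rather than 1000. If the read stage
itself needs more tasks, lowering the two settings above is the other
lever, since repartition only affects the stages after the read.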

On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prithish@gmail.com> wrote:

> I am trying to read a bunch of AVRO files from an S3 folder using Spark
> 2.0. No matter how many executors I use or what configuration changes I
> make, the cluster doesn't seem to use all the executors. I am using the
> com.databricks.spark.avro library from Databricks to read the AVRO.
>
> However, if I try the same on CSV files (same S3 folder, same
> configuration and cluster), it does use all executors.
>
> Is there something that I need to do to enable parallelism when using the
> Databricks AVRO library?
>
> Thanks for your help.
>
>
>
