spark-user mailing list archives

From Gokula Krishnan D <email2...@gmail.com>
Subject Re: [Spark-Core] sc.textFile() explicit minPartitions did not work
Date Tue, 25 Jul 2017 12:21:33 GMT
In addition to that, I tried to read the same file with 3000 partitions, but it ended up
using 3070 partitions and took more time than the previous run; please refer to the
attachment. A quick way to check the resulting partition count is sketched below.
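
For reference, a minimal sketch of checking the count Spark actually used (assuming sc is the spark-shell SparkContext and "<HDFS file path>" is just a stand-in for the real path):

val rdd3000 = sc.textFile("<HDFS file path>", 3000)
rdd3000.getNumPartitions  // reports the actual number of partitions created, 3070 in this case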

Thanks & Regards,
Gokula Krishnan (Gokul)

On Tue, Jul 25, 2017 at 8:15 AM, Gokula Krishnan D <email2dgk@gmail.com>
wrote:

> Hello All,
>
> I have an HDFS file with approximately *1.5 billion records*, stored as 500 part files
> (258.2 GB in total). When I executed the following, it used 2290 tasks, but shouldn't
> that be 500, matching the number of part files in the HDFS file?
>
> val inputFile = <HDFS File>
> val inputRdd = sc.textFile(inputFile)
> inputRdd.count()
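>
> (A side check, just to make the comparison explicit. This is an illustrative sketch only,
> and assumes inputFile points at the HDFS directory above:)
>
> import org.apache.hadoop.fs.{FileSystem, Path}
> val fs = FileSystem.get(sc.hadoopConfiguration)
> fs.listStatus(new Path(inputFile)).length   // entries in the directory, roughly the 500 part files
> inputRdd.getNumPartitions                   // Spark partitions (and hence tasks), 2290 here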
>
> I was hoping I could do the same with fewer partitions, so I tried the following:
>
> val inputFile = <HDFS File>
> val inputRddNew = sc.textFile(inputFile, 500)
> inputRddNew.count()
>
> But it still used 2290 tasks.
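>
> (As an aside, a rough, untested sketch in case the second argument is only treated as a
> lower bound; coalesce() is used here purely as an illustrative way to force the count down,
> with the same inputFile placeholder as above:)
>
> val inputRddNew = sc.textFile(inputFile, 500)   // 500 acts as a minimum, so Spark may create more partitions
> val inputRdd500 = inputRddNew.coalesce(500)     // merge the input splits down to at most 500 partitions, no shuffle
> inputRdd500.getNumPartitions                    // should now report 500
> inputRdd500.count()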
>
> As per the Scala doc, it is supposed to use the same number of partitions as the HDFS file, i.e. 500.
>
> It would be great if you could offer some insight on this.
>
> Thanks & Regards,
> Gokula Krishnan (Gokul)
>
