spark-issues mailing list archives

From "mohamed imran (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-22526) Spark hangs while reading binary files from S3
Date Fri, 17 Nov 2017 17:05:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257224#comment-16257224 ]

mohamed imran edited comment on SPARK-22526 at 11/17/17 5:04 PM:
-----------------------------------------------------------------

[~srowen] I am processing the files inside a foreach loop, like this.
Example code:
dataframe.collect.foreach { x =>
  // Path of the file to read, taken from the "filepath" column
  val filepath = x.getAs[String]("filepath")

  // Each file is a zip archive, e.g. test.zip
  val ziprdd = sc.binaryFiles(filepath)

  ziprdd.count
}

I don't process Avro files. I am processing binary files from S3, which are compressed plain CSV files.

After roughly the 100th read, or sometimes only after the 150th or later, Spark hangs while reading from S3.

I hope this information is sufficient to clarify the issue. Let me know if you need anything else.
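
For illustration only, here is a minimal sketch of how the contents of one such zip archive could be read while making sure the stream is closed afterwards. The object name ZipReading and the helper readZipEntries are hypothetical and not part of the reported job, and whether explicit closing has any effect on the hang is not established here. The helper uses only java.util.zip, so it runs without Spark.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.util.zip.ZipInputStream
import scala.collection.mutable

// Hypothetical helper, not taken from the reported job.
object ZipReading {
  // Reads every file entry of a zip archive into memory as UTF-8 text.
  // The stream is closed in all cases, including when reading fails.
  def readZipEntries(in: InputStream): Map[String, String] = {
    val zip = new ZipInputStream(in)
    try {
      val entries = mutable.LinkedHashMap.empty[String, String]
      var entry = zip.getNextEntry
      while (entry != null) {
        if (!entry.isDirectory) {
          val out = new ByteArrayOutputStream()
          val buffer = new Array[Byte](8192)
          var n = zip.read(buffer)
          while (n != -1) { // -1 marks the end of the current entry
            out.write(buffer, 0, n)
            n = zip.read(buffer)
          }
          entries(entry.getName) = out.toString("UTF-8")
        }
        entry = zip.getNextEntry
      }
      entries.toMap
    } finally {
      zip.close() // also closes the wrapped input stream
    }
  }
}
```

With Spark on the classpath, the helper could be applied to the RDD from the loop above, for example ziprdd.map { case (path, stream) => path -> ZipReading.readZipEntries(stream.open()) }, since each record of sc.binaryFiles is a (path, PortableDataStream) pair.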


was (Author: imranece59):
[~srowen] I am processing the files inside a foreach loop, like this.
Example code:
dataframe.collect.foreach { x =>
  // Path of the file to read, taken from the "filepath" column
  val filepath = x.getAs[String]("filepath")

  // Each file is a zip archive, e.g. test.zip
  val ziprdd = sc.binaryFiles(filepath)

  ziprdd.count
}

I don't process Avro files. I am processing binary files from S3, which are compressed plain CSV files.

After roughly the 100th read, or sometimes only after the 150th or later, Spark hangs while reading from S3.

I hope this information is sufficient to clarify the issue. Let me know if you need anything else.

> Spark hangs while reading binary files from S3
> ----------------------------------------------
>
>                 Key: SPARK-22526
>                 URL: https://issues.apache.org/jira/browse/SPARK-22526
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: mohamed imran
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi,
> I am using Spark 2.2.0 (a recent version) to read binary files from S3. I use sc.binaryFiles to read the files.
> It works fine for roughly the first 100 file reads, but after that it hangs for anywhere from 5 to 40 minutes, similar to the Avro file read issue (which was fixed in later releases).
> I tried setting fs.s3a.connection.maximum to higher values, but that didn't help.
> Finally, I tried enabling Spark speculation, which again didn't help much.
> One thing I observed is that the connection is not closed after each read of a binary file from S3.
> Example: sc.binaryFiles("s3a://test/test123.zip")
> Please look into this major issue!
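
As a rough sketch of the two settings mentioned in the description above, the properties could be collected as shown here. The object name S3aTuning and the value 200 are assumptions made for illustration; the report does not state which values were actually tried. Hadoop options such as fs.s3a.connection.maximum can be passed through Spark by prefixing them with spark.hadoop.

```scala
// Hypothetical helper; names and values are illustrative only.
object S3aTuning {
  // Builds the Spark properties for the S3A connection pool size
  // and for speculative execution.
  def settings(maxConnections: Int, speculation: Boolean): Map[String, String] =
    Map(
      "spark.hadoop.fs.s3a.connection.maximum" -> maxConnections.toString,
      "spark.speculation" -> speculation.toString
    )
}

// Usage, assuming Spark is on the classpath:
//   val conf = new org.apache.spark.SparkConf()
//     .setAll(S3aTuning.settings(maxConnections = 200, speculation = true))
```

According to the report, neither setting resolved the hang, so this only documents what was attempted.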



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

