spark-issues mailing list archives

From "Yuanbo Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25109) spark python should retry reading another datanode if the first one fails to connect
Date Tue, 14 Aug 2018 03:31:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Yuanbo Liu updated SPARK-25109:
-------------------------------
    Description: 
We use this code to read parquet files from HDFS:

spark.read.parquet('xxx')

and get an error like the one below:

!WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png!

 

What we can infer is that one of the replica blocks cannot be read for some reason, but Spark's
Python reader does not try another replica that can be read successfully. So the application
fails after throwing an exception.

When I use hadoop fs -text to read the file, I get the content correctly. It would be great
if PySpark could retry reading from another replica block instead of failing.
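Until the underlying reader retries other datanodes itself, a driver-side workaround could
look like the sketch below. This is only an assumption-laden illustration: `read_with_retry`
is a hypothetical helper, the retry counts are arbitrary, and re-running the whole read is
much coarser than retrying a single replica block inside the HDFS client.

```python
import time

def read_with_retry(read_fn, attempts=3, delay=1.0):
    """Call read_fn(); on failure, sleep and retry up to `attempts` times,
    re-raising the last error if every attempt fails."""
    last_err = None
    for i in range(attempts):
        try:
            return read_fn()
        except Exception as err:  # in practice, catch the specific Py4J/IO error
            last_err = err
            if i < attempts - 1:
                time.sleep(delay)
    raise last_err

# Hypothetical usage with the read from this report:
# df = read_with_retry(lambda: spark.read.parquet('xxx'))
```

A transient replica failure would then cost only a re-read instead of failing the job,
though a deterministic failure still surfaces after the final attempt.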

 

  was:
We use this code to read parquet files from HDFS:

spark.read.parquet('xxx')

and get an error like the one below:

!WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png!

 

What we can infer is that one of the replica blocks cannot be read for some reason, but Spark's
Python reader does not try another replica that can be read successfully. So the application
fails after throwing an exception.

When I use hadoop fs -text to read the file, I get the content correctly. It would be great
if PySpark could retry reading from another replica block instead of failing.

 


> spark python should retry reading another datanode if the first one fails to connect
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-25109
>                 URL: https://issues.apache.org/jira/browse/SPARK-25109
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Yuanbo Liu
>            Priority: Major
>         Attachments: WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png
>
>
> We use this code to read parquet files from HDFS:
> spark.read.parquet('xxx')
> and get an error like the one below:
> !WeChatWorkScreenshot_86b5cccc-1d19-430a-a138-335e4bd3211c.png!
>  
> What we can infer is that one of the replica blocks cannot be read for some reason, but
Spark's Python reader does not try another replica that can be read successfully. So the
application fails after throwing an exception.
> When I use hadoop fs -text to read the file, I get the content correctly. It would be
great if PySpark could retry reading from another replica block instead of failing.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

