spark-user mailing list archives

From Tassilo Klein <tjkl...@gmail.com>
Subject Re: SPARK_LOCAL_DIRS Issue
Date Wed, 11 Feb 2015 19:26:30 GMT
Thanks a lot. I will have a look at it.

On Wed, Feb 11, 2015 at 2:20 PM, Charles Feduke <charles.feduke@gmail.com>
wrote:

> Take a look at this:
>
> http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
>
> Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
> (linked from that article)
>
> to get a better idea of what your options are.
>
> If it's possible to avoid writing to [any] disk I'd recommend that route,
> since that's the performance advantage Spark has over vanilla Hadoop.
>
> On Wed Feb 11 2015 at 2:10:36 PM Tassilo Klein <tjklein@gmail.com> wrote:
>
>> Thanks for the info. The file system in use is a Lustre file system.
>>
>> Best,
>>  Tassilo
>>
>> On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke <
>> charles.feduke@gmail.com> wrote:
>>
>>> A central location, such as NFS?
>>>
>>> If they are temporary for the purpose of further job processing you'll
>>> want to keep them local to the node in the cluster, i.e., in /tmp. If they
>>> are centralized you won't be able to take advantage of data locality and
>>> the central file store will become a bottleneck for further processing.
>>>
>>> If /tmp isn't an option because you want to be able to monitor the file
>>> outputs as they occur you can also use HDFS (assuming your Spark nodes are
>>> also HDFS members they will benefit from data locality).
>>>
>>> It looks like the problem you are seeing is that a lock cannot be
>>> acquired on the output file in the central file system.
>>>
>>> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein <TJKlein@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Using Spark 1.2, I ran into issues setting SPARK_LOCAL_DIRS to a
>>>> path other than the local directory.
>>>>
>>>> On our cluster we have a folder for temporary files (in a central file
>>>> system), which is called /scratch.
>>>>
>>>> When setting SPARK_LOCAL_DIRS=/scratch/<node name>
>>>>
>>>> I get:
>>>>
>>>> An error occurred while calling
>>>> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>>>> in stage 0.0 (TID 3, XXXXXXX): java.io.IOException: Function not implemented
>>>>         at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
>>>>         at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91)
>>>>         at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022)
>>>>         at java.nio.channels.FileChannel.lock(FileChannel.java:1052)
>>>>         at org.apache.spark.util.Utils$.fetchFile(Utils.scala:379)
>>>>
>>>> Using SPARK_LOCAL_DIRS=/tmp, however, works perfectly. Any idea?
>>>>
>>>> Best,
>>>>  Tassilo
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-LOCAL-DIRS-Issue-tp21602.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>
>>>>
>>
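The stack trace in the thread shows Spark's Utils.fetchFile taking a java.nio FileChannel lock on a file under SPARK_LOCAL_DIRS, which fails with "Function not implemented" on the shared Lustre mount. A minimal sketch of the configuration that worked here, plus one possible alternative not confirmed in the thread (the mount point, filesystem name, and MGS address below are illustrative assumptions):

```shell
# conf/spark-env.sh on each worker node

# What worked in this thread: keep Spark's scratch/shuffle space on
# node-local disk, where POSIX file locking is available.
export SPARK_LOCAL_DIRS=/tmp

# Alternative sketch (assumption, not verified in the thread): keep scratch
# space on Lustre, but remount the Lustre client with the 'flock' option so
# that FileChannel.lock() is supported. Example mount command (hypothetical
# filesystem name and MGS address):
#   mount -t lustre -o flock mgs@tcp0:/fsname /scratch
# export SPARK_LOCAL_DIRS=/scratch/$(hostname)
```

Note that pointing SPARK_LOCAL_DIRS at a shared filesystem also reintroduces the bottleneck Charles warns about above, so node-local disk is the safer default even where locking works.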
