spark-dev mailing list archives

From Stephen Carman <scar...@coldlight.com>
Subject s3 vfs on Mesos Slaves
Date Tue, 12 May 2015 18:03:39 GMT
We have a small Mesos cluster, and its slaves need a VFS set up on them so that they
can pull down the data they need from S3 when Spark runs.

There doesn’t seem to be any obvious guidance online on how to do this, or on how to do it
easily. Does anyone have best practices or ideas for setting this up?

An example stack trace from a job run on the Mesos cluster is included below my signature.

Any idea how to get this going? For example, could Spark somehow be bootstrapped with the VFS at run time?

Thanks,
Steve


java.io.IOException: Unsupported scheme s3n for URI s3n://removed
        at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
        at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
        at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
        at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
        at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
15/05/12 13:57:51 ERROR Executor: Exception in task 0.1 in stage 0.0 (TID 1)
java.lang.RuntimeException: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
        at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:307)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unsupported scheme s3n for URI s3n://removed
        at com.coldlight.ccc.vfs.NeuronPath.toPath(NeuronPath.java:43)
        at com.coldlight.neuron.data.ClquetPartitionedData.makeInputStream(ClquetPartitionedData.java:465)
        at com.coldlight.neuron.data.ClquetPartitionedData.access$200(ClquetPartitionedData.java:42)
        at com.coldlight.neuron.data.ClquetPartitionedData$Iter.<init>(ClquetPartitionedData.java:330)
        at com.coldlight.neuron.data.ClquetPartitionedData.compute(ClquetPartitionedData.java:304)
        ... 8 more
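
One way to narrow this down is to check which URI schemes the executor JVM can actually resolve. The sketch below is only a diagnostic, and it rests on an assumption the trace does not confirm: that NeuronPath.toPath resolves schemes through the standard java.nio.file FileSystemProvider mechanism. The class name SchemeCheck and its methods are hypothetical. If the assumption holds, note that installedProviders() only discovers providers visible to the system class loader, so a provider jar that is not on the executor's system classpath would not show up here.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic helper; not part of Spark or the ColdLight code.
public class SchemeCheck {

    // Collect the URI scheme of every java.nio FileSystemProvider
    // that this JVM has discovered (via the system class loader).
    static Set<String> installedSchemes() {
        Set<String> schemes = new TreeSet<>();
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            schemes.add(provider.getScheme().toLowerCase(Locale.ROOT));
        }
        return schemes;
    }

    // True if a provider for the given scheme (e.g. "s3n") is installed.
    static boolean isInstalled(String scheme) {
        return installedSchemes().contains(scheme.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("Installed schemes: " + installedSchemes());
        System.out.println("s3n installed: " + isInstalled("s3n"));
    }
}
```

Running the same check on the driver and inside a task on a Mesos slave would show whether the two JVMs see different provider sets. If s3n is missing only on the slaves, one thing to try (again, an assumption about the cause) is putting the provider jar on the executor classpath with spark.executor.extraClassPath.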
