spark-user mailing list archives

From Benjamin Kim <>
Subject Re: new object store driver for Spark
Date Tue, 22 Mar 2016 15:01:55 GMT
Hi Gil,

Currently, our company uses S3 heavily for data storage. Can you further explain the benefits
of this connector in relation to S3 once the pending patch comes out? Also, I have heard of
Swift from others. Can you explain the pros and cons of Swift compared to HDFS? A brief
summary is fine, or just point me to material that will help me get a better


> On Mar 22, 2016, at 6:35 AM, Gil Vernik <> wrote:
> We recently released an object store connector for Spark.
> Currently this connector contains a driver for Swift-based object stores (like SoftLayer
> or any other Swift cluster), but it can easily support additional object stores.
> There is a pending patch to support the Amazon S3 object store.
> The major highlight is that this connector doesn't create any temporary files, and so
> it achieves very fast response times when Spark persists data in the object store.
> The new connector supports speculative mode and covers various failure scenarios (like
> two Spark tasks writing into the same object, partially corrupted data due to runtime
> exceptions in the Spark master, etc.). It also covers
> <> and other known issues.
> The detailed algorithm for fault tolerance will be released very soon. For now, those who
> are interested can view the implementation in the code itself.
> <> contains
> all the details on how to set up and use it with Spark.
> A series of tests showed that the new connector obtains a 70% improvement for write
> operations from Spark to Swift and about a 30% improvement for read operations from Swift
> into Spark (compared to the existing driver that Spark uses to integrate with objects
> stored in Swift).
> There is ongoing work to add more coverage and fix some known bugs / limitations.
> All the best
> Gil
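
To make the "no temporary files" point in the quoted message concrete, here is a rough sketch of why skipping the temporary-then-rename step can cut the number of requests sent to an object store. It is a toy in-memory model, not the connector's actual algorithm: `ToyObjectStore`, `write_via_temporary`, and `write_direct` are hypothetical names invented for this illustration. It assumes only the general fact that object stores have no atomic rename, so a rename is emulated as a copy followed by a delete.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store; counts requests."""

    def __init__(self):
        self.objects = {}
        self.requests = 0

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def copy(self, src, dst):
        self.requests += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests += 1
        del self.objects[key]


def write_via_temporary(store, parts):
    """Write each part under a temporary key, then 'rename' it into place.

    With no atomic rename available, the rename is a copy plus a delete.
    """
    for name, data in parts.items():
        tmp_key = f"out/_temporary/{name}"
        store.put(tmp_key, data)
        store.copy(tmp_key, f"out/{name}")
        store.delete(tmp_key)


def write_direct(store, parts):
    """Write each part straight to its final key, with no temporary object."""
    for name, data in parts.items():
        store.put(f"out/{name}", data)


# Four output parts, as four Spark tasks might produce (assumed for the example).
parts = {f"part-{i:05d}": b"x" * 10 for i in range(4)}

store_a = ToyObjectStore()
write_via_temporary(store_a, parts)

store_b = ToyObjectStore()
write_direct(store_b, parts)

print(store_a.requests, store_b.requests)  # 12 4
```

Both stores end up holding the same final objects, but the temporary-file path issues three requests per part instead of one. A direct-write design still has to handle the failure cases the message mentions (duplicate task attempts, partial writes) some other way, and this sketch does not attempt that.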
