spark-dev mailing list archives

From Haoyuan Li
Subject Re: ephemeral storage level in spark ?
Date Sat, 05 Apr 2014 23:48:52 GMT
Hi Mridul,

Do you mean the scenario in which different Spark applications need to read
the same raw data, which is stored on a remote cluster or set of machines,
and the goal is to load that remote raw data only once?


On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan wrote:

> Hi,
>   We have a requirement to use (potentially) ephemeral storage that is
> not within the VM but is strongly tied to a worker node. The source of
> truth for a block would still be within Spark, but to actually do
> computation we would need to copy the data to an external device, where
> it might remain for a while. Data locality therefore really helps: if
> the data is already present on the device, we can avoid a subsequent
> copy when computing on the same block again.
> I was wondering if the recently added storage level for Tachyon would
> help in this case (note: Tachyon itself won't help; just the storage
> level might).
> What sort of guarantees does it provide? How extensible is it? Or is
> it strongly tied to Tachyon, with only a generic name?
> Thanks,
> Mridul

Haoyuan Li
Algorithms, Machines, People Lab, EECS, UC Berkeley
