spark-dev mailing list archives

From Jun Feng Liu <liuj...@cn.ibm.com>
Subject Re: Tachyon in Spark
Date Tue, 16 Dec 2014 01:23:03 GMT

Thanks for the response. I got the point: it sounds like today's Spark
lineage does not push to Tachyon lineage. It would be good to see how it works.

Jun Feng Liu.



                                                                           
Haoyuan Li <haoyuan.li@gmail.com>
2014-12-13 00:17
To: Jun Feng Liu/China/IBM@IBMCN
cc: Reynold Xin <rxin@databricks.com>, Andrew Ash <andrew@andrewash.com>,
    "dev@spark.apache.org" <dev@spark.apache.org>
Subject: Re: Tachyon in Spark




Junfeng, by "off-heap solution", did you mean "rdd.persist(OFF_HEAP)"?
That feature is different from the lineage feature. You can use this
feature (rdd.persist(OFF_HEAP)) now with Tachyon, for any Spark version
later than 1.0.0, without a problem.
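
To illustrate the distinction, here is a minimal toy sketch in plain Python.
It is not Spark or Tachyon code: the `Dataset` class and its `map`,
`persist`, `lose_cache`, and `compute` methods are hypothetical names made up
for this example. Persisting only keeps a copy of the computed data, while
lineage is the separate record of how the data was derived, which is what
would allow a lost copy to be rebuilt.

```python
from typing import Callable, List, Optional


class Dataset:
    """Toy dataset that remembers how it was derived (its lineage)."""

    def __init__(
        self,
        source: Optional[List[int]] = None,
        parent: Optional["Dataset"] = None,
        transform: Optional[Callable[[int], int]] = None,
    ) -> None:
        self.source = source        # raw input, only set on a root dataset
        self.parent = parent        # lineage: the dataset this one derives from
        self.transform = transform  # lineage: the function applied to the parent
        self.cached: Optional[List[int]] = None  # stands in for a persisted copy
        self.computations = 0       # how many times the transform was re-run

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        # Record lineage only; nothing is computed yet.
        return Dataset(parent=self, transform=fn)

    def persist(self) -> "Dataset":
        # Keep a materialized copy (hypothetical stand-in for an off-heap store).
        self.cached = self.compute()
        return self

    def lose_cache(self) -> None:
        # Simulate losing the persisted copy, e.g. a storage node failure.
        self.cached = None

    def compute(self) -> List[int]:
        if self.cached is not None:
            return self.cached           # served from the persisted copy
        if self.parent is None:
            return list(self.source or [])
        self.computations += 1           # rebuilt by replaying the lineage
        return [self.transform(x) for x in self.parent.compute()]


if __name__ == "__main__":
    doubled = Dataset(source=[1, 2, 3]).map(lambda x: x * 2).persist()
    doubled.lose_cache()
    print(doubled.compute(), doubled.computations)
```

In this toy model, losing the persisted copy is harmless only because the
lineage (parent plus transform) is still held by whoever recomputes. A store
that has the copy but not the lineage cannot rebuild it on its own.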

Regarding Reynold's last email, those are good points. Tachyon has provided
this for a while. We are working on enhancing this feature and the
integration part with Spark.

Thanks,

Haoyuan

On Fri, Dec 12, 2014 at 5:06 AM, Jun Feng Liu <liujunf@cn.ibm.com> wrote:
>
> I think lineage is the key feature of Tachyon for reproducing an RDD when
> any error happens. Otherwise, there has to be some data replication among
> Tachyon nodes to ensure data redundancy for fault tolerance, and I think
> Tachyon is trying to avoid that path. Does it mean the off-heap solution
> is not ready yet if Tachyon lineage does not work right now?
>
> Best Regards
>
>
> *Jun Feng Liu*
> IBM China Systems & Technology Laboratory in Beijing
>
>   ------------------------------
> Phone: 86-10-82452683
> E-mail: liujunf@cn.ibm.com
>
> BLD 28,ZGC Software Park
> No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
> China
>
>
>
>
>
> Reynold Xin <rxin@databricks.com>
>
> 2014/12/12 10:22
> To: Andrew Ash <andrew@andrewash.com>
> cc: Jun Feng Liu/China/IBM@IBMCN, "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: Re: Tachyon in Spark
>
>
>
>
> Actually HY emailed me offline about this and this is supported in the
> latest version of Tachyon. It is a hard problem to push this into storage;
> need to think about how to handle isolation, resource allocation, etc.
>
> https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/master/Dependency.java
>
> On Thu, Dec 11, 2014 at 3:54 PM, Reynold Xin <rxin@databricks.com> wrote:
>
> > I don't think the lineage thing is even turned on in Tachyon - it was
> > mostly a research prototype, so I don't think it'd make sense for us to
> > use that.
> >
> >
> > On Thu, Dec 11, 2014 at 3:51 PM, Andrew Ash <andrew@andrewash.com> wrote:
> >
> >> I'm interested in understanding this as well.  One of the main ways
> >> Tachyon is supposed to realize performance gains without sacrificing
> >> durability is by storing the lineage of data rather than full copies of
> >> it (similar to Spark).  But if Spark isn't sending lineage information
> >> into Tachyon, then I'm not sure how this isn't a durability concern.
> >>
> >> On Wed, Dec 10, 2014 at 5:47 AM, Jun Feng Liu <liujunf@cn.ibm.com> wrote:
> >>
> >> > Does Spark today really leverage Tachyon lineage to process data? It
> >> > seems like the application should call the createDependency function
> >> > in TachyonFS to create a new lineage node. But I did not find any place
> >> > that calls it in the Spark code. Did I miss anything?
> >> >
> >> > Best Regards
> >> >
> >>
> >
> >
>
>

--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/
