Mailing-List: contact dev-help@spark.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
In-Reply-To: 
 <CAPh_B=ZwKDrhXzCJWPmDnLH6APb4ro+tWzAAG852XsUR0ycCXg@mail.gmail.com>
References: 
 <CABDss3j0Z3bXBRTh6knjkR=s=sqAYU3_5KA5xy=wtzYb5wmFRw@mail.gmail.com>
 <CAOYDGoA118JLfyKcfF6vOkGsm1hj3yqSGzYEpfCmNrGW6GRfDA@mail.gmail.com>
 <CAPh_B=YVnt4S4V8GCN7g0ObV7XDpXg_hN6yrfs5Ojkv8+p0obw@mail.gmail.com>
 <CAPh_B=ZwKDrhXzCJWPmDnLH6APb4ro+tWzAAG852XsUR0ycCXg@mail.gmail.com>
From: Sean Owen <sowen@cloudera.com>
Date: Tue, 21 Jul 2015 07:52:16 +0100
Message-ID: 
 <CAMAsSdLu+UcW8uJCvfXrFQO6azKnBAd6q+6Ga=J7n5wfs=HX5A@mail.gmail.com>
Subject: Re: Make off-heap store pluggable
To: Reynold Xin <rxin@databricks.com>
Cc: Prashant Sharma <scrapcodes@gmail.com>,
 Alexey Goncharuk <alexey.goncharuk@gmail.com>,
	"dev@spark.apache.org" <dev@spark.apache.org>
Content-Type: multipart/alternative; boundary=089e0141a0066b12a2051b5d196a

--089e0141a0066b12a2051b5d196a
Content-Type: text/plain; charset=UTF-8

(Related, not important comment: it would also be nice to separate out the
Tachyon dependency from core, as it's conceptually pluggable but is still
hard-coded into several places in the code, and a lot of the comments/docs
in the code.)

On Tue, Jul 21, 2015 at 5:40 AM, Reynold Xin <rxin@databricks.com> wrote:

> I sent it prematurely.
>
> They are already pluggable, or at least in the process of becoming more
> pluggable. In 1.4, instead of calling the external system's API directly,
> we added an API for that. There is a patch to add support for the HDFS
> in-memory cache.
>
> Somewhat orthogonal to this, longer term, I am not sure whether it makes
> sense to have the current off-heap API, because there is no namespacing and
> the benefit to end users is actually not very substantial (at least I can
> think of much simpler ways to achieve exactly the same gains), and yet it
> introduces quite a bit of complexity to the codebase.
>
>
>
>
> On Mon, Jul 20, 2015 at 9:34 PM, Reynold Xin <rxin@databricks.com> wrote:
>
>> They are already pluggable.
>>
>>
>> On Mon, Jul 20, 2015 at 9:32 PM, Prashant Sharma <scrapcodes@gmail.com>
>> wrote:
>>
>>> +1 Looks like a nice idea (I do not see any harm). Would you like to work
>>> on the patch to support it?
>>>
>>> Prashant Sharma
>>>
>>>
>>>
>>> On Tue, Jul 21, 2015 at 2:46 AM, Alexey Goncharuk <
>>> alexey.goncharuk@gmail.com> wrote:
>>>
>>>> Hello Spark community,
>>>>
>>>> I was looking through the code to better understand how an RDD is
>>>> persisted to the Tachyon off-heap filesystem. It looks like the Tachyon
>>>> filesystem is hard-coded and there is no way to switch to another in-memory
>>>> filesystem. I think it would be great if the implementation of the
>>>> BlockManager and BlockStore were able to plug in another filesystem.
>>>>
>>>> For example, Apache Ignite also has an implementation of an in-memory
>>>> filesystem that can store data in on-heap and off-heap formats. It would
>>>> be great if it could integrate with Spark.
>>>>
>>>> I have filed a ticket in Jira:
>>>> https://issues.apache.org/jira/browse/SPARK-9203
>>>>
>>>> If it makes sense, I will be happy to contribute to it.
>>>>
>>>> Thoughts?
>>>>
>>>> -Alexey (Apache Ignite PMC)
>>>>
>>>
>>>
>>
>

--089e0141a0066b12a2051b5d196a--