From: Sean Owen
Date: Tue, 21 Jul 2015 07:52:16 +0100
Subject: Re: Make off-heap store pluggable
To: Reynold Xin
Cc: Prashant Sharma, Alexey Goncharuk, dev@spark.apache.org

(Related, not important comment: it would also be nice to separate out the
Tachyon dependency from core, as it's conceptually pluggable but is still
hard-coded into several places in the code, and into a lot of the
comments/docs in the code.)

On Tue, Jul 21, 2015 at 5:40 AM, Reynold Xin <rxin@databricks.com> wrote:

> I sent it prematurely.
>
> They are already pluggable, or at least in the process of becoming more
> pluggable. In 1.4, instead of calling the external system's API directly,
> we added an API for that. There is a patch to add support for HDFS
> in-memory cache.
>
> Somewhat orthogonal to this, longer term, I am not sure whether it makes
> sense to have the current off-heap API, because there is no namespacing and
> the benefit to end users is actually not very substantial (at least I can
> think of much simpler ways to achieve exactly the same gains), and yet it
> introduces quite a bit of complexity to the codebase.
>
> On Mon, Jul 20, 2015 at 9:34 PM, Reynold Xin <rxin@databricks.com> wrote:
>
>> They are already pluggable.
>>
>> On Mon, Jul 20, 2015 at 9:32 PM, Prashant Sharma <scrapcodes@gmail.com>
>> wrote:
>>
>>> +1 Looks like a nice idea (I do not see any harm). Would you like to work
>>> on the patch to support it?
>>>
>>> Prashant Sharma
>>>
>>> On Tue, Jul 21, 2015 at 2:46 AM, Alexey Goncharuk <
>>> alexey.goncharuk@gmail.com> wrote:
>>>
>>>> Hello Spark community,
>>>>
>>>> I was looking through the code in order to understand better how an RDD
>>>> is persisted to the Tachyon off-heap filesystem. It looks like the
>>>> Tachyon filesystem is hard-coded and there is no way to switch to another
>>>> in-memory filesystem. I think it would be great if the BlockManager and
>>>> BlockStore implementation could plug in another filesystem.
>>>>
>>>> For example, Apache Ignite also has an in-memory filesystem
>>>> implementation which can store data in on-heap and off-heap formats. It
>>>> would be great if it could integrate with Spark.
>>>>
>>>> I have filed a ticket in Jira:
>>>> https://issues.apache.org/jira/browse/SPARK-9203
>>>>
>>>> If it makes sense, I will be happy to contribute to it.
>>>>
>>>> Thoughts?
>>>>
>>>> -Alexey (Apache Ignite PMC)
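Alexey's proposal is to have the block manager talk to an abstract off-heap store and choose the concrete store (Tachyon, Ignite, an HDFS in-memory cache) at runtime. The Java sketch below is a minimal illustration of that general plug-in pattern, nothing more. Every name in it is hypothetical and is not Spark's actual API: `OffHeapBlockStore`, `InMemoryBlockStore`, `BlockStores`, and the config key `example.offheap.store.class`. The real interface that Reynold says was added in 1.4 may look quite different.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// HYPOTHETICAL plug-in contract (not Spark's API): the block manager would
// depend only on this interface, never on a specific external store.
interface OffHeapBlockStore {
    void putBytes(String blockId, ByteBuffer bytes);
    Optional<ByteBuffer> getBytes(String blockId);
    boolean contains(String blockId);
    boolean remove(String blockId);
}

// HYPOTHETICAL reference implementation that keeps each block in a direct
// (off-heap) ByteBuffer inside the current JVM process.
class InMemoryBlockStore implements OffHeapBlockStore {
    private final Map<String, ByteBuffer> blocks = new ConcurrentHashMap<>();

    public InMemoryBlockStore() {}

    @Override
    public void putBytes(String blockId, ByteBuffer bytes) {
        ByteBuffer src = bytes.duplicate(); // leaves the caller's position untouched
        ByteBuffer copy = ByteBuffer.allocateDirect(src.remaining());
        copy.put(src);
        copy.flip(); // ready for reading from position 0
        blocks.put(blockId, copy);
    }

    @Override
    public Optional<ByteBuffer> getBytes(String blockId) {
        ByteBuffer buf = blocks.get(blockId);
        // Each reader gets an independent, read-only view of the stored bytes.
        return buf == null ? Optional.empty() : Optional.of(buf.asReadOnlyBuffer());
    }

    @Override
    public boolean contains(String blockId) {
        return blocks.containsKey(blockId);
    }

    @Override
    public boolean remove(String blockId) {
        return blocks.remove(blockId) != null;
    }
}

// HYPOTHETICAL loader: instantiates the store named in configuration through
// reflection, so core code has no compile-time dependency on any one store.
final class BlockStores {
    static final String STORE_CLASS_KEY = "example.offheap.store.class"; // made-up key

    private BlockStores() {}

    static OffHeapBlockStore load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!OffHeapBlockStore.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                        className + " does not implement OffHeapBlockStore");
            }
            return (OffHeapBlockStore) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    // Falls back to the in-process store when no class is configured.
    static OffHeapBlockStore fromConf(Map<String, String> conf) {
        return load(conf.getOrDefault(STORE_CLASS_KEY, InMemoryBlockStore.class.getName()));
    }
}
```

The sketch loads the store by reflection on a no-arg constructor. That keeps the core free of a compile-time dependency on any particular store, which is the kind of decoupling Sean asks for with respect to Tachyon. A hypothetical Ignite-backed class would only need to implement the interface and be named in the configuration.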