ignite-dev mailing list archives

From: Denis Magda <dma...@apache.org>
Subject: Re: Spark data frames integration merged
Date: Fri, 05 Jan 2018 23:32:50 GMT

Excellent, please keep me in the loop and let me know once you reach the next milestone
of being production-ready. Use cases like this help spread the word about Ignite,
which is really helpful!


> On Jan 5, 2018, at 12:27 AM, Revin Chalil <rchalil@expedia.com> wrote:
> Thanks Denis. I watched your two recent webinars and they were very helpful.
> I can definitely create a page explaining how the (currently three) Ignite shared-RDD caches
are shared across multiple Spark streaming apps for data enrichment here at Expedia, once
the solution is stabilized. We are not in production yet. I have enabled native persistence
and had some hiccups during our testing, but it is looking better today.
> We are currently working to optimize the join between the incremental data and the shared-RDD
DataFrame in Spark, as there are several Spark apps and the total memory is limited. This part
does not have much to do with Ignite; it is mostly Spark optimization, I believe. We do load
each entire Ignite cache (~50 GB each) into the Spark executors, and the caches are trimmed daily
based on the business rules.
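> [Editor's note] One memory-conscious option for a large reference DataFrame like the one described above is persisting it in serialized form with disk spill-over, so executors under memory pressure do not recompute or evict it wholesale. This is a hedged sketch, not the Expedia setup; the app name, source path, and data are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object ReferencePersistSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reference-persist-sketch") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // refDf stands in for the large DataFrame built from the Ignite cache;
    // the parquet path is a placeholder for whatever the real source is.
    val refDf = spark.read.parquet("reference/")

    // MEMORY_AND_DISK_SER trades CPU (ser/deser cost) for a smaller memory
    // footprint, spilling partitions to disk instead of evicting them.
    refDf.persist(StorageLevel.MEMORY_AND_DISK_SER)
    refDf.count() // materialize the cached data once, up front
  }
}
```

Whether serialized caching wins over plain `MEMORY_AND_DISK` depends on the data and the join pattern, so it is worth benchmarking both.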
> We will keep in touch; thanks again, everyone, for all the great work and help.
> Revin
> From: Denis Magda <dmagda@apache.org>
> Date: Thursday, January 4, 2018 at 12:34 PM
> To: Revin Chalil <rchalil@expedia.com>
> Cc: "dev@ignite.apache.org" <dev@ignite.apache.org>
> Subject: Re: Spark data frames integration merged
> Revin, 
> As a side note, do you have a published public article or any other relevant material
that explains how Ignite is used at Expedia?
> You would help the community out a lot if such information is referenced from this page:
> https://ignite.apache.org/provenusecases.html
> —
> Denis
> On Jan 3, 2018, at 11:24 AM, Revin Chalil <rchalil@expedia.com> wrote:
> Thank you and this is great news. 
> We currently use the Ignite cache as a reference-dataset RDD in Spark, convert it into
a Spark DataFrame, and then join this DF with the incoming-data DF. I hope we can change this
3-step process to a single step with the Spark DF integration. If so, would indexes / affinity keys
on the join columns help with performance? We currently do not have them defined on the reference
dataset. Are there examples available of joining an Ignite DF with a Spark DF? Also, what is the best
way to get the latest executables with IGNITE-3084 included? Thanks again.
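> [Editor's note] With the IGNITE-3084 data source, the 3-step flow described above (cache → RDD → DataFrame → join) collapses into reading the Ignite table directly as a DataFrame and joining it in one step. A hedged sketch follows; the config-file path, table name, join column, and incoming-data source are all hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.ignite.spark.IgniteDataFrameSettings._

object IgniteJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ignite-df-join-sketch")
      .master("local[*]")
      .getOrCreate()

    // Read the Ignite-backed reference table directly as a DataFrame,
    // replacing the manual RDD -> DataFrame conversion step.
    val refDf = spark.read
      .format(FORMAT_IGNITE)                           // "ignite"
      .option(OPTION_CONFIG_FILE, "ignite-config.xml") // path is hypothetical
      .option(OPTION_TABLE, "reference_data")          // table name is hypothetical
      .load()

    // Incoming data; the JSON source is a stand-in for the real stream.
    val incomingDf = spark.read.json("incoming/")

    // Single-step join on a shared key column (column name is hypothetical).
    val enriched = incomingDf.join(refDf, Seq("entity_id"))
    enriched.show()
  }
}
```

Running this requires the `ignite-spark` module on the classpath and a reachable Ignite cluster described by the config file.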
> On 12/29/17, 10:34 PM, "Nikolay Izhikov" <nizhikov.dev@gmail.com> wrote:
>    Thank you, guys.
>    Val, thanks for all the reviews, advice, and patience.
>    Anton, thanks for the Ignite wisdom you share with me.
>    Looking forward to the next issues :)
>    P.S. Happy New Year to the whole Ignite community!
>    On Fri, 29/12/2017 at 13:22 -0800, Valentin Kulichenko wrote:
> Igniters,
> Great news! We have completed and merged the first part of the integration with
> Spark data frames [1]. It contains an implementation of a Spark data
> source that allows using the DataFrame API to query Ignite data, as
> well as joining it with other data frames originating from different
> sources.
> Next planned steps are the following:
> - Implement a custom execution strategy to avoid transferring data from
> Ignite to Spark when possible [2]. This should give a serious
> performance improvement in cases where only Ignite tables participate
> in a query.
> - Implement the ability to save a data frame into Ignite via the
> DataFrameWriter API [3].
> [1] https://issues.apache.org/jira/browse/IGNITE-3084
> [2] https://issues.apache.org/jira/browse/IGNITE-7077
> [3] https://issues.apache.org/jira/browse/IGNITE-7337
> Nikolay Izhikov, thanks for the contribution and for all the hard
> work!
> -Val
