spark-user mailing list archives

From Dmitry Tolpeko <dmtolp...@gmail.com>
Subject Re: Migrate Relational to Distributed
Date Sat, 23 May 2015 18:54:20 GMT
Hi Brant,

Let me partially answer your concerns: please follow a new open source
project, PL/HQL (www.plhql.org), aimed at letting you reuse existing
logic and leverage existing skills to some extent, so you do not need to
rewrite everything in Scala/Java and can migrate gradually. I hope it can
help.
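
As a rough illustration of the gradual approach (a minimal sketch, not a
drop-in solution - the keyspace, table, and column names here are
hypothetical, and it assumes the spark-cassandra-connector package is on
the classpath):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LegacySqlOnCassandra {
  def main(args: Array[String]): Unit = {
    // Assumes a local Cassandra node; adjust the host for your cluster.
    val conf = new SparkConf()
      .setAppName("LegacySqlOnCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Expose a Cassandra table to Spark SQL via the data sources API.
    // "health" / "claims" are hypothetical keyspace/table names.
    val claims = sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "health", "table" -> "claims"))
      .load()
    claims.registerTempTable("claims")

    // Much existing ANSI SQL can then run largely unchanged,
    // though vendor-specific (e.g. Oracle PL/SQL) constructs
    // still need rewriting or a tool such as PL/HQL.
    val visitsPerPatient = sqlContext.sql(
      "SELECT patient_id, COUNT(*) AS visits FROM claims GROUP BY patient_id")
    visitsPerPatient.show()

    sc.stop()
  }
}
```
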

Thanks,

Dmitry

On Sat, May 23, 2015 at 1:22 AM, Brant Seibert <brantseibert@hotmail.com>
wrote:

> Hi,  The healthcare industry can do wonderful things with Apache Spark.
> But there is already a very large base of data and applications firmly
> rooted in the relational paradigm, and they are resistant to change -
> stuck on Oracle.
>
> **
> QUESTION 1 - Migrate legacy relational data (plus new transactions) to
> distributed storage?
>
> DISCUSSION 1 - The primary advantage I see is not having to engage in the
> lengthy (1+ years) process of creating a relational data warehouse and
> cubes.  Just store the data in a distributed system and "analyze first" in
> memory with Spark.
>
> **
> QUESTION 2 - Will we have to re-write the enormous amount of logic that is
> already built for the old relational system?
>
> DISCUSSION 2 - If we move the data to distributed, can we simply run that
> existing relational logic as SparkSQL queries?  [existing SQL --> Spark
> Context --> Cassandra --> process in SparkSQL --> display in existing UI].
> Can we create an RDD that uses existing SQL?  Or do we need to rewrite all
> our SQL?
>
> **
> DATA SIZE - We are adding many new data sources to a system that already
> manages health care data for over a million people.  The number of rows may
> not be enormous right now compared to the advertising industry, for
> example,
> but the number of dimensions runs well into the thousands.  If we add to
> this, IoT data for each health care patient, that creates billions of
> events
> per day, and the number of rows then grows exponentially.  We would like to
> be prepared to handle that huge data scenario.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Migrate-Relational-to-Distributed-tp22999.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
