spark-issues mailing list archives

From "Xiangrui Meng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25994) SPIP: DataFrame-based graph queries and algorithms
Date Tue, 13 Nov 2018 00:08:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-25994:
----------------------------------
    Description: 
Copied from the SPIP doc:

{quote}
GraphX was one of the foundational pillars of the Spark project, and is the current graph
component. This reflects the importance of the graph data model, which naturally pairs with
an important class of analytic functions, the network or graph algorithms.

However, GraphX is not actively maintained. It is based on RDDs, and cannot exploit Spark
2’s Catalyst query engine. GraphX is only available to Scala users.

GraphFrames is a Spark package that implements DataFrame-based graph algorithms and also
incorporates simple graph pattern matching with fixed-length patterns (called “motifs”).
GraphFrames is based on DataFrames, but has a semantically weak graph data model (based on
untyped edges and vertices). The motif pattern matching facility is very limited compared
with the well-established Cypher language.

The Property Graph data model has become quite widespread in recent years, and is the primary
focus of commercial graph data management and of graph data research, both for on-premises
and cloud data management. Many users of transactional graph databases also wish to work with
immutable graphs in Spark.

The idea is to define a Cypher-compatible Property Graph type based on DataFrames; to replace
GraphFrames querying with Cypher; and to reimplement the GraphX/GraphFrames algorithms on the
Property Graph type.

To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), reusing existing proven
designs and code, will be employed in Spark 3.0. This graph query processor, like CAPS, will
overlay and drive the SparkSQL Catalyst query engine, using the CAPS graph query planner.
{quote}

  was:[placeholder]


> SPIP: DataFrame-based graph queries and algorithms
> --------------------------------------------------
>
>                 Key: SPARK-25994
>                 URL: https://issues.apache.org/jira/browse/SPARK-25994
>             Project: Spark
>          Issue Type: New Feature
>          Components: GraphX
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Martin Junghanns
>            Priority: Major
>


