ignite-issues mailing list archives

From "Stuart Macdonald (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-9108) Spark DataFrames With Cache Key and Value Objects
Date Fri, 27 Jul 2018 13:31:00 GMT
Stuart Macdonald created IGNITE-9108:
----------------------------------------

             Summary: Spark DataFrames With Cache Key and Value Objects
                 Key: IGNITE-9108
                 URL: https://issues.apache.org/jira/browse/IGNITE-9108
             Project: Ignite
          Issue Type: New Feature
          Components: spark
            Reporter: Stuart Macdonald


Add support for _key and _val columns within Ignite-provided Spark DataFrames. These columns
represent the cache key and value objects, with semantics similar to the current _key/_val
columns in Ignite SQL.
 
If the cache key or value objects are standard SQL types (e.g. String, Int, etc.), they will
be represented as such in the DataFrame schema. Otherwise they are represented as Binary types,
encoded as either: 1. Ignite BinaryObjects, in which case we'd need to supply a Spark Encoder
implementation for BinaryObjects, e.g.:
 
{code:java}
IgniteSparkSession session = ...
Dataset<Row> dataFrame = ...
// binaryObjectEncoder(...) is the proposed Encoder factory; it would need to be added
Dataset<MyValClass> valDataSet = dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class));
{code}
Or 2. Kryo-serialised versions of the objects, e.g.:
 
{code:java}
Dataset<Row> dataFrame = ...
// Encoders.kryo is Spark's built-in Kryo-based Encoder factory
Dataset<MyValClass> dataSet = dataFrame.select("_val").as(Encoders.kryo(MyValClass.class));
{code}
Option 1 would probably be more efficient, but option 2 would be more idiomatic Spark.
 
The rationale behind this is the same as the Ignite SQL _key and _val columns: to allow access
to the full cache objects from a SQL context.
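
The schema rule described above (standard SQL types keep their own column type, everything else becomes Binary) could be sketched roughly as follows. This is plain Java for illustration only, not Ignite or Spark code: the `KeyValColumnTypes` class, the `columnTypeFor` method, the type-name strings, and the particular set of classes treated as standard SQL types are all assumptions made for this sketch.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical helper: decides how a cache key/value class would appear
// as the _key or _val column type in the DataFrame schema.
class KeyValColumnTypes {
    // Placeholder for a user-defined cache value class (as in the snippets above).
    static class MyValClass { }

    // Assumed, non-exhaustive set of "standard SQL types"; names are illustrative.
    private static final Map<Class<?>, String> SQL_TYPES = Map.of(
        String.class, "string",
        Integer.class, "integer",
        Long.class, "long",
        Double.class, "double",
        Boolean.class, "boolean",
        BigDecimal.class, "decimal");

    // Standard SQL types map to themselves; any other class falls back to
    // "binary" (a BinaryObject or Kryo-serialised payload under the proposal).
    static String columnTypeFor(Class<?> cls) {
        return SQL_TYPES.getOrDefault(cls, "binary");
    }

    public static void main(String[] args) {
        System.out.println("_key: " + columnTypeFor(Long.class));       // prints "_key: long"
        System.out.println("_val: " + columnTypeFor(MyValClass.class)); // prints "_val: binary"
    }
}
```

Under this sketch, only the "binary" case would need one of the two Encoder options discussed above in order to get back a typed Dataset.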



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
