spark-issues mailing list archives

From "William Benton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-4190) Allow users to provide transformation rules at JSON ingest
Date Sat, 01 Nov 2014 23:29:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193573#comment-14193573 ]

William Benton commented on SPARK-4190:
---------------------------------------

I'll take this, since I'm interested in working on it and it seems like a quick fix. [~yhuai], would you be willing to review a WIP PR sometime soon?

> Allow users to provide transformation rules at JSON ingest
> ----------------------------------------------------------
>
>                 Key: SPARK-4190
>                 URL: https://issues.apache.org/jira/browse/SPARK-4190
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: William Benton
>
> It would be great if it were possible to provide transformation rules (to be executed within jsonRDD or jsonFile) so that users could
>    (1) deal with JSON files that confound schema inference or are otherwise insufficiently disciplined, or
>    (2) simply perform arbitrary object transformations at ingest before a schema is inferred.
>
> json4s, which Spark already uses, has nice interfaces for specifying transformations as partial functions on objects and accessing nested structures via path expressions. (We might want to introduce an abstraction atop json4s for a public API, but the json4s API seems like a good first step.) There are some examples of these transformations at https://github.com/json4s/json4s and at http://chapeau.freevariable.com/2014/10/fedmsg-and-spark.html
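The following minimal sketch, written in plain Python with only the standard library, shows what ingest-time transformation rules might look like. It does not use json4s or Spark, and it does not reproduce either library's API. Each rule stands in for a partial function over a single object field. A rule returns a replacement (name, value) pair, or None when it does not apply. Rules run on nested objects before their parents, and the first matching rule wins. The names `apply_rules`, `ingest`, `stringify_id`, and `rename_ts`, and the sample records, are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Callable, List, Optional, Tuple

# A rule inspects one (field name, value) pair. It returns a replacement pair,
# or None when it "is not defined" at that input (mimicking a partial function).
Field = Tuple[str, Any]
Rule = Callable[[str, Any], Optional[Field]]


def apply_rules(value: Any, rules: List[Rule]) -> Any:
    """Rewrite every object field in `value`, transforming children before parents."""
    if isinstance(value, dict):
        out = {}
        for name, child in value.items():
            field = (name, apply_rules(child, rules))
            for rule in rules:
                result = rule(*field)
                if result is not None:
                    field = result
                    break  # first matching rule wins
            out[field[0]] = field[1]  # a rename that collides overwrites the earlier key
        return out
    if isinstance(value, list):
        return [apply_rules(item, rules) for item in value]
    return value


def ingest(lines: List[str], rules: List[Rule]) -> List[Any]:
    """Parse JSON-lines text and transform each record before any schema inference."""
    return [apply_rules(json.loads(line), rules) for line in lines if line.strip()]


# Hypothetical rules for "insufficiently disciplined" input.
def stringify_id(name: str, value: Any) -> Optional[Field]:
    # Some records store "id" as a number, others as a string; unify on strings.
    # bool is excluded explicitly because it is a subclass of int in Python.
    if name == "id" and isinstance(value, (int, float)) and not isinstance(value, bool):
        return (name, str(value))
    return None


def rename_ts(name: str, value: Any) -> Optional[Field]:
    # Normalize an inconsistently spelled field name.
    if name == "timeStamp":
        return ("timestamp", value)
    return None


records = ingest(
    ['{"id": 7, "timeStamp": 1414884573}', '{"id": "8", "msg": {"id": 9}}'],
    [stringify_id, rename_ts],
)
# records == [{"id": "7", "timestamp": 1414884573}, {"id": "8", "msg": {"id": "9"}}]
```

`ingest` operates on raw JSON strings, so in principle the same approach could run over input lines before they reach schema inference. The issue asks for a way to pass such rules to jsonRDD or jsonFile directly.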



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


