crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-450) Adding ORC file format support in Crunch
Date Mon, 04 Aug 2014 19:18:13 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085138#comment-14085138 ]

Josh Wills commented on CRUNCH-450:
-----------------------------------

So I read over this a bit more, and I don't think that supporting ORC files requires adding
the OrcTypeFamily. As I read it, the ORC serialization primarily relies on the type class
of the PType instance and delegates the actual deserialization logic to the ObjectInspector
(which is the right thing to do, I believe). But then it seems to me that it would be possible
to take in _any_ PType instance (Avro or Writable), extract its type class and the type classes
of its sub-types, and then construct code that could read or write that data to an ORC file.
At the lowest level, OrcTypes are WritableTypes with a custom serialization/deserialization
protocol.
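A rough sketch of the "walk the type class and its sub-types" idea, written in plain Java so it runs without Crunch or Hive on the classpath. TypeDescriptor is a hypothetical stand-in modeled on PType's getTypeClass() and getSubTypes() accessors; SimpleType, the Tuple marker class, the _colN field names, and the small set of mapped types are all illustrative assumptions, not part of the patch. The sketch only derives a Hive-style type string; the assumption is that a description like this is what the ORC side would turn into an ObjectInspector for the actual reading and writing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrcTypeStrings {

  // Hypothetical stand-in for Crunch's PType: only the two accessors the
  // comment relies on (the type class and the sub-types).
  interface TypeDescriptor {
    Class<?> getTypeClass();
    List<TypeDescriptor> getSubTypes();
  }

  // Minimal immutable implementation of the stand-in interface.
  static final class SimpleType implements TypeDescriptor {
    private final Class<?> typeClass;
    private final List<TypeDescriptor> subTypes;

    SimpleType(Class<?> typeClass, TypeDescriptor... subTypes) {
      this.typeClass = typeClass;
      this.subTypes = Arrays.asList(subTypes);
    }

    @Override public Class<?> getTypeClass() { return typeClass; }
    @Override public List<TypeDescriptor> getSubTypes() { return subTypes; }
  }

  // Hypothetical marker class for a tuple/record whose fields are its sub-types.
  static final class Tuple {}

  // Recursively maps a type class (plus its sub-types) to a Hive-style
  // type string such as "struct<_col0:int,_col1:string>".
  static String toTypeString(TypeDescriptor t) {
    Class<?> c = t.getTypeClass();
    if (c == Integer.class) return "int";
    if (c == Long.class) return "bigint";
    if (c == Double.class) return "double";
    if (c == Boolean.class) return "boolean";
    if (c == String.class) return "string";
    if (List.class.isAssignableFrom(c)) {
      // A collection has exactly one sub-type: its element type.
      return "array<" + toTypeString(t.getSubTypes().get(0)) + ">";
    }
    if (c == Tuple.class) {
      // Each sub-type becomes one positional struct field.
      List<String> fields = new ArrayList<>();
      for (int i = 0; i < t.getSubTypes().size(); i++) {
        fields.add("_col" + i + ":" + toTypeString(t.getSubTypes().get(i)));
      }
      return "struct<" + String.join(",", fields) + ">";
    }
    throw new IllegalArgumentException("No ORC mapping for " + c.getName());
  }

  public static void main(String[] args) {
    TypeDescriptor t = new SimpleType(Tuple.class,
        new SimpleType(String.class),
        new SimpleType(List.class, new SimpleType(Long.class)));
    System.out.println(toTypeString(t)); // struct<_col0:string,_col1:array<bigint>>
  }
}
```

The point of the design is that nothing here depends on whether the original type came from the Avro or Writable family: only the type classes are consulted, which is why a separate type family may not be needed.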

If that's not clear, I can whip up a version of the patch w/my preferred impl tomorrow.



> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Zhong Wang
>            Assignee: Josh Wills
>             Fix For: 0.11.0
>
>         Attachments: CRUNCH-450-submodule.1.patch, CRUNCH-450-submodule.2.patch, CRUNCH-450-submodule.patch, CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize objects into OrcStruct
> 3. Supporting column pruning optimization



--
This message was sent by Atlassian JIRA
(v6.2#6252)
