crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-450) Adding ORC file format support in Crunch
Date Fri, 01 Aug 2014 22:01:39 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083025#comment-14083025 ]

Josh Wills commented on CRUNCH-450:
-----------------------------------

So a couple of questions as I peruse this:

1) How exactly does the tupleDerived stuff in the OrcTypeFamily work? Especially for collections
and maps?
2) Is there any sense in which I could (or would want to) execute a MapReduce job purely in
terms of OrcTypes for serialization? If so, could we add an integration test to that effect?
Or is the intent that the TypeFamily primarily exists for expressing IO operations to ORC
data files?


> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Zhong Wang
>            Assignee: Josh Wills
>             Fix For: 0.11.0
>
>         Attachments: CRUNCH-450-submodule.1.patch, CRUNCH-450-submodule.2.patch, CRUNCH-450-submodule.patch, CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize objects into OrcStruct
> 3. Supporting column pruning optimization



--
This message was sent by Atlassian JIRA
(v6.2#6252)
