crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-450) Adding ORC file format support in Crunch
Date Mon, 28 Jul 2014 19:20:39 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076640#comment-14076640 ]

Josh Wills commented on CRUNCH-450:
-----------------------------------

Wow, that is a phenomenal amount of work. Thanks for sending it along! A couple of high-level
questions:

1) What does OrcTypeFamily buy me? We've flirted with expanding the set of TypeFamilies beyond
Avro and Writable in the past, but have always been cautious about actually doing it because the
two-typefamily assumption is baked into so many parts of the system. If everything in ORC
is compiled down to some kind of Writable, would it still work as a collection of derived PTypes
on top of the WritableTypeFamily?
2) We also try to avoid large and complex external dependencies in crunch-core. Could we
move this into a new submodule, crunch-hive, which would contain all of our Hive-dependent
code? I think there's more of it that we want to include (e.g., CRUNCH-340), plus a few other
things I wouldn't mind having down the line, but I don't want to introduce the dependency
complexity for pipelines that don't actually make use of Hive.
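
The "derived type" idea raised in question 1 can be illustrated with a minimal, self-contained sketch. It deliberately does not use Crunch's or Hive's classes: SimpleType, Utf8StringType, DerivedType, Person, and DerivedTypeSketch are all hypothetical names invented for this illustration, and the tab-separated encoding is an assumption chosen only to keep the example short. The point is just that a new record type can reuse an existing base type's serialization by supplying a pair of mapping functions, rather than requiring a whole new type family.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical stand-in for a serializable type; NOT Crunch's PType interface.
interface SimpleType<T> {
    byte[] write(T value);
    T read(byte[] bytes);
}

// Base type that already knows how to serialize itself
// (plays the role of a type from an existing type family).
final class Utf8StringType implements SimpleType<String> {
    public byte[] write(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public String read(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Derived type: delegates all serialization to the base type and only
// adds a pair of functions that map between S (base) and T (user type).
final class DerivedType<S, T> implements SimpleType<T> {
    private final SimpleType<S> base;
    private final Function<S, T> inputFn;   // base value -> user value
    private final Function<T, S> outputFn;  // user value -> base value

    DerivedType(SimpleType<S> base, Function<S, T> inputFn, Function<T, S> outputFn) {
        this.base = base;
        this.inputFn = inputFn;
        this.outputFn = outputFn;
    }

    public byte[] write(T value) {
        return base.write(outputFn.apply(value));
    }

    public T read(byte[] bytes) {
        return inputFn.apply(base.read(bytes));
    }
}

// Hypothetical user record that we want to store.
final class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class DerivedTypeSketch {
    // Builds a Person type on top of the string base type.
    // Assumed encoding: "<age>\t<name>"; age comes first so that a tab
    // inside the name cannot break parsing.
    static SimpleType<Person> personType() {
        return new DerivedType<String, Person>(
            new Utf8StringType(),
            s -> {
                String[] parts = s.split("\t", 2);
                return new Person(parts[1], Integer.parseInt(parts[0]));
            },
            p -> p.age + "\t" + p.name);
    }
}
```

In this sketch, code that only understands the base type never needs to know that Person exists, which is roughly the property that would let ORC-backed types live inside an existing type family instead of adding a third one.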

> Adding ORC file format support in Crunch
> ----------------------------------------
>
>                 Key: CRUNCH-450
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-450
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Wang Zhong
>            Assignee: Josh Wills
>         Attachments: CRUNCH-450.patch
>
>
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family, OrcTypeFamily, to serialize / deserialize objects into OrcStruct
> 3. Supporting column pruning optimization



--
This message was sent by Atlassian JIRA
(v6.2#6252)
