crunch-dev mailing list archives

From "Wang Zhong (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-450) Adding ORC file format support in Crunch
Date Mon, 28 Jul 2014 20:09:42 GMT


Wang Zhong commented on CRUNCH-450:

Thanks for your review, Josh! For your questions:
1) I implemented OrcTypeFamily because the low-level file layout of ORC is distinct enough
to warrant its own type family. OrcStruct is also an unusual Writable implementation: it
doesn't actually support write()/readFields(). To keep ORC from being mixed with other
Writable formats, I created a standalone type family for it.
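
To illustrate point 1), here is a minimal self-contained sketch of why a value whose
write()/readFields() are unsupported cannot go through a generic Writable code path.
It does not use Hadoop, Hive, or Crunch: SimpleWritable, IntValue, StructLike, and
canSerializeGenerically are hypothetical stand-ins that only mirror the shape of the
real classes, so treat it as an illustration rather than the code in the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {
    // Hypothetical stand-in that mirrors the shape of Hadoop's Writable interface.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // An ordinary writable: it knows how to serialize and deserialize itself.
    static class IntValue implements SimpleWritable {
        int value;
        IntValue(int value) { this.value = value; }
        public void write(DataOutput out) throws IOException { out.writeInt(value); }
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    // Stand-in for a struct-like writable (assumption: modeled on the behavior
    // described above, where write()/readFields() are not supported).
    static class StructLike implements SimpleWritable {
        final Object[] fields;
        StructLike(Object... fields) { this.fields = fields; }
        public void write(DataOutput out) {
            throw new UnsupportedOperationException("write unsupported");
        }
        public void readFields(DataInput in) {
            throw new UnsupportedOperationException("readFields unsupported");
        }
    }

    // A generic serialization path that relies only on write().
    // Returns false when the value refuses to serialize itself.
    static boolean canSerializeGenerically(SimpleWritable w) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            w.write(out);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerializeGenerically(new IntValue(42)));
        System.out.println(canSerializeGenerically(new StructLike("a", 1)));
    }
}
```

A type family that assumes every value can round-trip through write()/readFields()
would fail on the second kind of value, which is the reason for keeping the two apart.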

2) I think it is a good idea to have a crunch-hive submodule for now. The Hive team is also
working on refactoring the Hive dependencies to make them leaner and more modular (HIVE-7423).
I hope we can then move the ORC support into Crunch trunk once a modularized dependency
is available for this component.

> Adding ORC file format support in Crunch
> ----------------------------------------
>                 Key: CRUNCH-450
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core, IO
>            Reporter: Wang Zhong
>            Assignee: Josh Wills
>         Attachments: CRUNCH-450.patch
> This JIRA adds ORC file format support in Crunch by:
> --
> 1. Adding input source and output target for ORC
> 2. Adding a new type family - OrcTypeFamily to serialize / deserialize objects into OrcStruct
> 3. Supporting column pruning optimization

This message was sent by Atlassian JIRA
