hive-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (HIVE-14007) Replace ORC module with ORC release
Date Mon, 12 Dec 2016 19:35:58 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Owen O'Malley updated HIVE-14007:
---------------------------------
    Comment: was deleted

(was: bq. The other thing I think we need community wide clarity on before you rip out orc is how we’re
going to keep developing hive afterwards. Right now there’s a cyclic dependency. Hive ->
ORC -> Hive - because of a shared storage api.

There is agreement within Hive to release the storage-api independently from Hive. That would
break the cycle and allow a non-cyclic release process. I'll file a Hive jira to do that work.
Avoiding having two copies of the code makes the whole ecosystem stronger by making sure that fixes
get applied everywhere. I'd suggest leaving storage-api in the Hive source tree rather than
moving it into its own git repository.

bq. There are features that touch all three. And it turns out these are more frequent than expected.

They come in waves. In the last three months, there have been 2 changes to storage-api.
Most of the patches are in either storage-api or ORC.  For example, HIVE-14453 only touches
ORC.

bq. How do you propose to handle development and release of these features given the cyclic dependency?
How do you work out feature branches/ snapshots?

For changes that touch one or the other, you'd commit the relevant change and release either
storage-api or ORC and have a jira that updates the version in Hive. In the worst case, where
the change spreads among the three artifacts, you would:

* commit to storage-api & ORC
* release them
* upgrade the pom in Hive
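
As a rough illustration of why releasing storage-api on its own gives a workable release order, here is a minimal Python sketch. The artifact names and dependency edges are simplified assumptions based on the description above, not read from the actual poms, and `release_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def release_order(deps):
    """Return an order in which artifacts can be released.

    deps maps each artifact to the set of artifacts it depends on.
    Returns None when the dependencies form a cycle.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Assumed current state: storage-api ships inside Hive, so ORC
# effectively depends on Hive while Hive depends on ORC.
bundled = {"hive": {"orc"}, "orc": {"hive"}}

# Assumed proposed state: storage-api is released independently.
split = {
    "hive": {"orc", "storage-api"},
    "orc": {"storage-api"},
    "storage-api": set(),
}

print(release_order(bundled))  # None (cyclic)
print(release_order(split))    # ['storage-api', 'orc', 'hive']
```

The second result matches the worst-case sequence above: storage-api and ORC are released first, and Hive then picks up the new versions in its pom.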

bq. If a successful feature commit requires sequential hive and orc releases, then that means
minimum several months before commit and that's not great. How will this be done?

No, ORC releases typically take 3 days. Storage API is much simpler and should also take 3
days. By being much smaller and more focused, they are much more nimble. Furthermore, the
two votes could completely overlap, so the total time to get the change into Hive would be
roughly 3 days. 
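
The arithmetic behind that estimate can be sketched as follows. This assumes each vote runs for the 3 days quoted above and that overlapping votes start at the same time; `days_until_in_hive` is a hypothetical helper.

```python
def days_until_in_hive(vote_days, overlap):
    """Elapsed days before Hive can pick up the new versions.

    vote_days: release-vote length in days for each upstream artifact.
    overlap:   True if all votes start at the same time.
    """
    vote_days = list(vote_days)
    # Overlapping votes finish when the longest one does;
    # sequential votes add up.
    return max(vote_days) if overlap else sum(vote_days)


votes = {"storage-api": 3, "orc": 3}  # 3-day figures from the text above
print(days_until_in_hive(votes.values(), overlap=True))   # 3
print(days_until_in_hive(votes.values(), overlap=False))  # 6
```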

bq. Looking over the PMC and committer lists in ORC it looks like many people working on ACID,
vectorization or llap will lose the ability to do what they are doing today with this change.

When we set up the ORC project, we were pretty inclusive in the committer list and we continue
to add new committers and PMC members. I'll take a look at the contributors to the Hive ORC module
to look for new committers.)

> Replace ORC module with ORC release
> -----------------------------------
>
>                 Key: HIVE-14007
>                 URL: https://issues.apache.org/jira/browse/HIVE-14007
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC
>    Affects Versions: 2.2.0
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 2.2.0
>
>         Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
