hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-14870) OracleStore: RawStore implementation optimized for Oracle
Date Wed, 05 Oct 2016 21:03:21 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549918#comment-15549918 ]

Sergey Shelukhin edited comment on HIVE-14870 at 10/5/16 9:03 PM:
------------------------------------------------------------------

We only need very limited functionality compared to DN. A layer like this already exists
in ACID, so I don't see why it cannot be reused and augmented. The only change needed would
be the ability to replace some parts to optimize for Oracle (or other DBs), via some sort
of plugin option (or even a switch statement), which will not be pretty but is imho preferable
to the alternatives.

As I see it, I would be merely -0 on the thing in itself - it's bad enough to have 2.5 SQL
"engines" (ORM, the one in ACID, and directsql) without adding a third and then another federation
thing that is not hidden on a lower level like the direct SQL one. The direct SQL one caused
(and will probably cause ;)) a few problems and special cases, simple as it is... plus the
confusion with failures-that-are-not-really-failures, failure to fall back, sudden unexplained
slowdowns when the fallback is successful, etc.
There are probably all kinds of other issues; e.g. off the top of my head, how does this work
with upgrade scripts - would we need to create and maintain another set? Would scripts to
switch the schema between the old and the new always be the same, or would there need to be
a back and forth script for every version eventually (I don't think one would ever need that
but it is a possibility)? Etc.

However, my main meta concern is about the approach - what do we do if someone wants to have
an optimized MySqlEngine, or MsSqlEngine, AzureEngine, etc? They would totally copy-paste the Oracle
one, rewrite a few critical SQL queries, and submit a patch. That can quickly turn into a
maintenance nightmare.

It appears to me that the existing custom-SQL layer in ACID could be reused, if desired (or
used as inspiration) to make this store ANSI-ish (does it have any significant limitations
currently?). That way we can keep query optimizations in a plugin (or even a switch statement
if need be).
This also has an additional advantage of being able to deprecate and then ditch ORM altogether,
which would simplify things instead of making them more complex.
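
To illustrate the "switch statement" option, here is a minimal sketch of one ANSI-ish query path with per-database overrides only where a dialect needs them. The DbProduct enum and the limitQuery helper are hypothetical names made up for this example, not existing Hive classes, and the TBLS/TBL_ID query is only a placeholder.

```java
public class DialectSketch {
    // Hypothetical set of backing databases; not an existing Hive type.
    enum DbProduct { ORACLE, MYSQL, POSTGRES, OTHER }

    // Builds a row-limited query, branching only where the dialect differs.
    static String limitQuery(DbProduct db, String baseSelect, int limit) {
        switch (db) {
            case ORACLE:
                // Older Oracle releases have no LIMIT clause; filter on ROWNUM.
                return "SELECT * FROM (" + baseSelect + ") WHERE ROWNUM <= " + limit;
            case MYSQL:
            case POSTGRES:
                return baseSelect + " LIMIT " + limit;
            default:
                // Standard SQL form as the generic fallback.
                return baseSelect + " FETCH FIRST " + limit + " ROWS ONLY";
        }
    }

    public static void main(String[] args) {
        String q = "SELECT TBL_ID FROM TBLS ORDER BY TBL_ID";
        for (DbProduct db : DbProduct.values()) {
            System.out.println(db + ": " + limitQuery(db, q, 10));
        }
    }
}
```

Everything outside the switch stays shared, so a new database would add a case (or a plugin entry) rather than a full copy of the store.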

Another alternative path (that could be pursued in parallel) is making RawStore pluggable
so that such specific implementations could be used, while not being a supported part of the Hive
codebase.
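
A rough sketch of what "pluggable" could mean here, assuming the implementation is picked by class name from configuration and loaded reflectively. MiniStore, both implementations, and the sketch.rawstore.impl key are hypothetical stand-ins for this example; the real RawStore interface is far larger.

```java
import java.util.Properties;

public class PluggableStoreSketch {
    // Hypothetical, heavily reduced stand-in for the RawStore interface.
    interface MiniStore {
        String describe();
    }

    // Default implementation shipped with the codebase.
    public static class DefaultStore implements MiniStore {
        public String describe() { return "default"; }
    }

    // A vendor-specific implementation that could live outside the codebase.
    public static class OracleTunedStore implements MiniStore {
        public String describe() { return "oracle-tuned"; }
    }

    // Hypothetical configuration key naming the implementation class.
    static final String IMPL_KEY = "sketch.rawstore.impl";

    // Instantiates the configured class, falling back to DefaultStore.
    static MiniStore load(Properties conf) {
        String cls = conf.getProperty(IMPL_KEY, DefaultStore.class.getName());
        try {
            return Class.forName(cls).asSubclass(MiniStore.class)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load store: " + cls, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(load(conf).describe());
        conf.setProperty(IMPL_KEY, OracleTunedStore.class.getName());
        System.out.println(load(conf).describe());
    }
}
```

With this shape the core only depends on the interface, and an Oracle-specific store can be maintained by whoever needs it.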



> OracleStore: RawStore implementation optimized for Oracle
> ---------------------------------------------------------
>
>                 Key: HIVE-14870
>                 URL: https://issues.apache.org/jira/browse/HIVE-14870
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: OracleStoreDesignProposal.pdf
>
>
> The attached document is a proposal for a RawStore implementation which is optimized
> for Oracle and replaces DataNucleus. The document outlines schema changes, OracleStore implementation
> details, and performance tests against ObjectStore, ObjectStore+DirectSQL, and OracleStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
