hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3752) Add a non-sql API in hive to access data.
Date Fri, 30 Nov 2012 04:07:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507078#comment-13507078 ]

Namit Jain commented on HIVE-3752:
----------------------------------

[~appodictic], yes we did. 

While HCatalog is a neat project, there are several reasons why an input/output format packaged
with Hive is a better fit for Apache Giraph:
* HCatalog (trunk) is unfortunately not compatible with Hadoop-0.20.
* HCatalog is much more complex than a simple API for using Hive. We only require a
small part of HCatalog's functionality, so having only that portion will
be easier to fix, update, and maintain going forward.
* Having an input/output format that is part of Hive will guarantee its compatibility with
Hive going forward.

As an aside, HCatalog could also use this new input/output format to interface with Hive,
potentially allowing a portion of its code to be simplified.

In a nutshell, HCatalog is overkill for our simple use case, and we want to avoid depending
on more systems than necessary.
For a simple use case like ours, enhancing Hive seems like a much simpler option and easier
to maintain in the longer term.

ccing [~alangates], [~cwsteinbach]
                
> Add a non-sql API in hive to access data.
> -----------------------------------------
>
>                 Key: HIVE-3752
>                 URL: https://issues.apache.org/jira/browse/HIVE-3752
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Nitay Joffe
>
> We would like to add an input/output format for accessing Hive data in Hadoop directly
> without having to use e.g. a transform. Using a transform means having to do a whole
> map-reduce step with its own disk accesses and its imposed structure. It also means
> needing to have Hive be the base infrastructure for the entire system being developed
> which is not the right fit as we only need a small part of it (access to the data).
> So we propose adding an API level InputFormat and OutputFormat to Hive that will make
> it trivially easy to select a table with partition spec and read from / write to it.
> We chose this design to make it compatible with Hadoop so that existing systems that
> work with Hadoop's IO API will just work out of the box.
> We need this system for the Giraph graph processing system (http://giraph.apache.org/)
> as running graph jobs which read/write from Hive is a common use case.
> [~namitjain] [~aching] [~kevinwilfong] [~apresta]
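
To make "select a table with partition spec" concrete, here is a minimal sketch of how a caller might describe the data it wants to read. It is an illustration under stated assumptions, not the API proposed in HIVE-3752: the class HiveTableSpec, its methods, and the table name graph_edges are hypothetical. The only Hive behavior it relies on is the standard key=value directory layout of partitions under a table's location. It does not escape special characters in partition values, which Hive itself does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical value class for illustration only; it is not part of Hive
// or of the HIVE-3752 patch.
final class HiveTableSpec {
    private final String database;
    private final String table;
    // LinkedHashMap preserves the order in which partition keys were added,
    // which must match the order of the table's partition columns.
    private final Map<String, String> partitionSpec;

    HiveTableSpec(String database, String table, Map<String, String> partitionSpec) {
        this.database = database;
        this.table = table;
        this.partitionSpec = new LinkedHashMap<>(partitionSpec);
    }

    // Database-qualified table name, e.g. "default.graph_edges".
    String qualifiedName() {
        return database + "." + table;
    }

    // Relative directory of the partition, using Hive's key=value layout.
    // Assumption: values contain no characters that Hive would escape.
    String partitionPath() {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2012-11-29");
        spec.put("country", "us");
        HiveTableSpec edges = new HiveTableSpec("default", "graph_edges", spec);

        // Assumed table location under the default warehouse directory.
        String tableLocation = "/user/hive/warehouse/graph_edges";
        System.out.println(edges.qualifiedName() + " -> "
                + tableLocation + "/" + edges.partitionPath());
        // prints: default.graph_edges -> /user/hive/warehouse/graph_edges/ds=2012-11-29/country=us
    }
}
```

In a real job, a description like this would presumably be stored in the Hadoop job configuration and resolved against the metastore by the InputFormat, so that the caller never builds paths by hand. That wiring is left out here because the message does not specify it.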

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
