cassandra-commits mailing list archives

From Nicolas Lalevée (Updated) (JIRA) <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-913) Add Hive support
Date Tue, 08 Nov 2011 12:50:53 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Lalevée updated CASSANDRA-913:
--------------------------------------

    Attachment: CASSANDRA-913-r1199213.patch

I cannot reopen this issue, so I'll just comment.

As Jonathan suggested in HIVE-1434, a Hive/Cassandra bridge may fit better here.

I finally found the source of Brisk's implementation (https://github.com/riptano/hive).
The patch I am submitting here (CASSANDRA-913-r1199213.patch) is based on their work, so I
cannot grant any license for it.

What I changed in the original source:
* changed the package names (some classes needed package access)
* added ASL2 headers for the ASF
* formatted the code according to the Cassandra standard
* switched some loggers from log4j and commons-logging to slf4j
* fixed the handling of nulls in Hive tables, which did not work well, at least for the few tests I ran

About the build: it needs the Hive jars in contrib/hive/lib. I don't know of a better way
to set this up, since those jars are not available in the Maven repo.

About runtime: I had a lot of trouble with a conflict between the Thrift library used by
Hive and the one used by Cassandra. Hive 0.7 uses Thrift 0.5, Cassandra uses 0.6. A Cassandra
external table could not be declared in Hive because of some NoSuchMethodException.
As far as I understand Hive, it needs Thrift at job runtime only for handling dynamic column
serialization. My use case didn't need that, so I did a hack: I removed every org.apache.thrift
class from hive-exec.jar. After that it works nicely (for my use case).
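
For anyone who wants to reproduce the hack, here is a minimal sketch of stripping the
org.apache.thrift classes from a jar. It is not part of the patch; the function name
strip_package and the output file name used below are just illustrative, and it assumes the
jar is a plain zip archive that needs no re-signing.

```python
import zipfile

# Jar entries use '/' separators, so org.apache.thrift becomes this prefix.
THRIFT_PREFIX = "org/apache/thrift/"


def strip_package(src_jar, dst_jar, prefix=THRIFT_PREFIX):
    """Copy src_jar to dst_jar, skipping every entry under `prefix`.

    Returns the names of the entries that were left out.
    """
    removed = []
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefix):
                removed.append(info.filename)
                continue
            # Reuse the original ZipInfo so names and timestamps are preserved.
            dst.writestr(info, src.read(info.filename))
    return removed
```

Something like strip_package("hive-exec.jar", "hive-exec-nothrift.jar") writes a copy without
the Thrift classes and leaves the original jar untouched, so the hack is easy to undo.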

There were some tests in the GitHub repo. They are Hive oriented, and I'm too lazy to try to
make them work in Cassandra's source tree.

Hive 0.8 will use Thrift 0.7 (hopefully backward compatible with 0.6), and the Hive artifacts
will be published to the Maven repository (HIVE-1095). So it is probably best to wait for
that, for an easier integration into Cassandra?

                
> Add Hive support
> ----------------
>
>                 Key: CASSANDRA-913
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-913
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Contrib
>            Reporter: Jonathan Ellis
>              Labels: gsoc, gsoc2010
>         Attachments: CASSANDRA-913-r1199213.patch
>
>
> http://hadoop.apache.org/hive/ is a project that runs SQL queries against Hadoop map/reduce
> clusters.  (For analytics; it is too high-latency to run applications against Hive directly).
> HIVE-705 added support for backends other than HDFS, with HBase as the first.  Cassandra
> support should be doable too now.
> The Hive storage backends are described in http://wiki.apache.org/hadoop/Hive/StorageHandlers
> and the HBase backend specifically in http://wiki.apache.org/hadoop/Hive/HBaseIntegration.
> I also note that John Sichi, author of the HBase backend, seems like a helpful guy and
> I imagine would be totally cool with answering questions about implementation details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
